Methodology

Every page on GLPwatch consists of structured data and direct quotes from official sources, each linked back to its source. We never write long-form medical content or interpret study results.

Data sources & refresh cadence

Source	Cadence	What we use it for
PubMed (E-utilities)	Weekly + daily new	Research papers by molecule MeSH terms.
NIH iCite	Daily	Citation counts, relative citation ratio, NIH percentile.
ClinicalTrials.gov API v2	Daily	Trial status, phase, sponsor, conditions, outcomes.
openFDA	Daily (FAERS: quarterly, ~3-month lag)	Labels, adverse events, shortages, recalls, NDC, Drugs@FDA.
DailyMed	Daily	Current SPL labels and label-change dates.
Semantic Scholar	Daily new	Pre-built paper TLDR summaries, recommendations, citation graph.
SEC EDGAR	Quarterly	Manufacturer revenue, R&D spend, guidance.
FDA press & warning letters	Weekly	Enforcement actions, compounding-pharmacy letters.
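As a rough illustration of how the cadences in the table could drive a refresh scheduler, here is a minimal sketch. The names `CADENCE_DAYS`, `SOURCES`, and `is_due` are hypothetical, the day counts for "weekly" and "quarterly" are assumptions, and only each source's primary cadence is modeled (PubMed's additional daily pass for new papers is left out).

```python
from datetime import datetime, timedelta

# Assumed day counts for each cadence label (not stated in the table).
CADENCE_DAYS = {"daily": 1, "weekly": 7, "quarterly": 91}

# Primary cadence per source, copied from the table above.
# FAERS is listed separately because it refreshes quarterly.
SOURCES = {
    "PubMed": "weekly",
    "NIH iCite": "daily",
    "ClinicalTrials.gov": "daily",
    "openFDA": "daily",
    "openFDA FAERS": "quarterly",
    "DailyMed": "daily",
    "Semantic Scholar": "daily",
    "SEC EDGAR": "quarterly",
    "FDA press & warning letters": "weekly",
}


def is_due(source: str, last_refreshed: datetime, now: datetime) -> bool:
    """Return True when at least one full cadence interval has elapsed."""
    interval = timedelta(days=CADENCE_DAYS[SOURCES[source]])
    return now - last_refreshed >= interval
```

Calling `is_due("PubMed", last_refreshed, now)` with timestamps five days apart returns `False`, because the assumed weekly interval has not yet elapsed.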

Where we use AI

AI (Mistral) is used only for narrow, structured tasks: tagging which conditions a paper covers, classifying on-label vs off-label use, producing short plain-English trial-protocol summaries, normalizing adverse-event terms, and phrasing one-line changelog entries. We use the small model by default and escalate to the larger model whenever the small model’s output fails an automated validity check. Because trust comes first, every AI output is cached and validated against our controlled vocabularies. We do not use AI to write drug or condition overviews, summarize papers (we use Semantic Scholar’s own TLDRs), or make any clinical claim.
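The escalate-on-failure pattern described above can be sketched roughly as follows, using condition tagging as the example task. Everything here is illustrative: `CONDITION_VOCAB`, `is_valid`, and `tag_conditions` are hypothetical names, the vocabulary entries are placeholders, and the two models are passed in as plain callables rather than real Mistral client calls. Returning `None` when both models fail is also an assumption, since the text does not say what happens in that case.

```python
import json

# Placeholder controlled vocabulary (assumption; the real lists are not shown here).
CONDITION_VOCAB = {"type 2 diabetes", "obesity", "cardiovascular disease"}

# In-memory cache keyed by paper ID (a real system would likely persist this).
_cache = {}


def is_valid(raw):
    """Accept only a non-empty JSON list of strings drawn from the vocabulary."""
    try:
        tags = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return False
    return (
        isinstance(tags, list)
        and len(tags) > 0
        and all(isinstance(t, str) and t in CONDITION_VOCAB for t in tags)
    )


def tag_conditions(paper_id, abstract, small_model, large_model):
    """Try the small model first; escalate to the large one if validation fails."""
    if paper_id in _cache:
        return _cache[paper_id]
    for model in (small_model, large_model):
        raw = model(abstract)
        if is_valid(raw):
            tags = json.loads(raw)
            _cache[paper_id] = tags
            return tags
    # Assumption: if neither output validates, nothing is cached or shown.
    return None
```

Passing the models in as callables keeps the sketch independent of any particular client library and makes the escalation path easy to exercise with stubs.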

Ranking

Leaderboards rank by transparent, source-derived metrics: citation count and citation velocity for papers, FAERS report counts for side effects, and enrollment and status changes for trials. No editorial weighting is applied.
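To show how ranking by citation velocity can differ from ranking by raw citation count, here is a small sketch. It assumes velocity means citations per year since publication. That definition is an assumption made for illustration, not one stated on this page, and `citation_velocity` and `rank_by_velocity` are hypothetical names.

```python
def citation_velocity(citations, pub_year, current_year):
    """Citations per year since publication (assumed definition).

    Papers published in the current year are treated as one year old
    to avoid dividing by zero.
    """
    years = max(current_year - pub_year, 1)
    return citations / years


def rank_by_velocity(papers, current_year):
    """Sort papers from highest to lowest citation velocity."""
    return sorted(
        papers,
        key=lambda p: citation_velocity(p["citations"], p["year"], current_year),
        reverse=True,
    )
```

Under this definition, a 2023 paper with 120 citations outranks a 2014 paper with 500 citations when evaluated in 2024, even though the older paper has the higher raw count.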

Known limitations

FAERS reporting bias: adverse-event counts reflect voluntary reports, not incidence rates, and cannot establish causation. Higher counts often track prescription volume, not risk.
FAERS lag: adverse-event data updates quarterly and trails real-world events by three or more months. We label it as latest-quarter data.
Trial selection bias: ClinicalTrials.gov registrations are not a complete census, and registration does not imply results or quality.
Citation metrics favor older papers: recent work is undercited by construction. We surface citation velocity where possible to compensate.
US-first scope: data is primarily US-regulatory (openFDA, ClinicalTrials.gov). International sources are out of scope for v1.