Methodology
Every page on GLPwatch consists of structured data and direct quotes from official sources, each linked back to its source. We never write long-form medical content or interpret study results.
Data sources & refresh cadence
| Source | Cadence | What we use it for |
|---|---|---|
| PubMed (E-utilities) | Weekly, plus daily for new papers | Research papers by molecule MeSH terms. |
| NIH iCite | Daily | Citation counts, relative citation ratio, NIH percentile. |
| ClinicalTrials.gov API v2 | Daily | Trial status, phase, sponsor, conditions, outcomes. |
| openFDA | Daily (FAERS: quarterly, ~3-month lag) | Labels, adverse events, shortages, recalls, NDC, Drugs@FDA. |
| DailyMed | Daily | Current SPL labels and label-change dates. |
| Semantic Scholar | Daily (new papers) | Pre-built paper TLDR summaries, recommendations, citation graph. |
| SEC EDGAR | Quarterly | Manufacturer revenue, R&D spend, guidance. |
| FDA press & warning letters | Weekly | Enforcement actions, compounding-pharmacy letters. |
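
The cadences in the table could be encoded as a simple schedule, sketched below. This is a minimal illustration, not GLPwatch's actual configuration: the names `REFRESH_INTERVALS` and `is_due` are hypothetical, the source keys are invented for the example, the split of PubMed into a daily and a weekly job is one reading of its cadence, and the day counts (7 for weekly, 90 for quarterly) are approximations.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mapping from source key to refresh interval.
# Day counts approximate the cadences listed in the table.
REFRESH_INTERVALS = {
    "pubmed_new": timedelta(days=1),
    "pubmed_full": timedelta(days=7),
    "icite": timedelta(days=1),
    "clinicaltrials": timedelta(days=1),
    "openfda": timedelta(days=1),
    "openfda_faers": timedelta(days=90),
    "dailymed": timedelta(days=1),
    "semantic_scholar_new": timedelta(days=1),
    "sec_edgar": timedelta(days=90),
    "fda_press": timedelta(days=7),
}


def is_due(source: str, last_run: Optional[datetime], now: datetime) -> bool:
    """Return True if the source has never run or its interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= REFRESH_INTERVALS[source]
```

A scheduler built this way would call `is_due` for each source key on every tick and fetch only the sources that return `True`.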
Where we use AI
AI (Mistral) is used only for narrow, structured tasks: tagging which conditions a paper covers, classifying on-label vs off-label use, producing short plain-English trial-protocol summaries, normalizing adverse-event terms, and phrasing one-line changelog entries. We use the small model by default and escalate to the larger model whenever the small model’s output fails an automated validity check, because trust comes first. Every AI output is cached and validated against our controlled vocabularies. We do not use AI to write drug or condition overviews, to summarize papers (we show Semantic Scholar’s own TLDRs instead), or to make any clinical claim.
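The small-model-first flow with escalation on a failed validity check can be sketched as follows. This is an illustration under stated assumptions rather than the production code: `classify`, `is_valid`, and `ALLOWED_LABELS` are hypothetical names, the models are passed in as plain callables instead of real Mistral API calls, and returning `None` when both models fail is an assumed fallback.

```python
from typing import Callable, Dict, Optional

# Hypothetical controlled vocabulary for the on-label/off-label task.
ALLOWED_LABELS = {"on-label", "off-label"}


def is_valid(output: str) -> bool:
    """Automated validity check: the output must be a vocabulary term."""
    return output in ALLOWED_LABELS


def classify(
    text: str,
    small_model: Callable[[str], str],
    large_model: Callable[[str], str],
    cache: Dict[str, str],
) -> Optional[str]:
    """Try the small model first; escalate to the large model on invalid output."""
    if text in cache:
        return cache[text]
    for model in (small_model, large_model):
        candidate = model(text).strip().lower()
        if is_valid(candidate):
            cache[text] = candidate  # only validated outputs are cached
            return candidate
    return None  # assumed fallback: publish nothing if both models fail
```

Passing the models as callables keeps the sketch independent of any particular client library and makes the escalation path easy to exercise with stubs.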
Ranking
Leaderboards rank by transparent, source-derived metrics: citation count and citation velocity for papers, FAERS report counts for side effects, and enrollment and status changes for trials. No editorial weighting is applied.
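As an illustration of ranking papers by citation velocity, the sketch below assumes velocity means citations per year since publication; the page does not define the metric, so that formula, the one-month floor on paper age, and the names `Paper`, `citation_velocity`, and `rank_by_velocity` are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Paper:
    pmid: str
    citations: int
    published: date


def citation_velocity(paper: Paper, today: date) -> float:
    """Citations per year since publication (assumed definition)."""
    # Floor the age at one month so very new papers do not divide by ~0.
    years = max((today - paper.published).days / 365.25, 1 / 12)
    return paper.citations / years


def rank_by_velocity(papers: List[Paper], today: date) -> List[Paper]:
    """Sort papers from highest to lowest citation velocity."""
    return sorted(papers, key=lambda p: citation_velocity(p, today), reverse=True)
```

Under this definition a one-year-old paper with 30 citations outranks a ten-year-old paper with 100, which is the opposite of the order a raw citation count would give.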
Known limitations
- FAERS reporting bias: adverse-event counts reflect voluntary reports, not incidence rates, and cannot establish causation. Higher counts often track prescription volume, not risk.
- FAERS lag: adverse-event data updates quarterly and trails real-world events by three or more months. We label it as latest-quarter data.
- Trial selection bias: ClinicalTrials.gov registrations are not a complete census, and registration does not imply results or quality.
- Citation metrics favor older papers: recent work is undercited by construction. We surface citation velocity where possible to compensate.
- US-first scope: data is primarily US-regulatory (openFDA, ClinicalTrials.gov). International sources are out of scope for v1.