Published on 21/12/2025
Using Real-World Data to Measure Vaccine Effectiveness (VE)
Why Real-World Data for VE—and What Regulators Expect
Randomized trials establish efficacy under controlled conditions; real-world data (RWD) tell us how vaccines perform across ages, comorbidities, variants, and care systems over months or years. Post-authorization, decision makers want to know: Does protection wane? Do boosters restore it? Which subgroups (e.g., adults ≥65 years, the immunocompromised) need earlier re-dosing? RWD—immunization registries, EHR/claims, laboratory systems, and vital records—let us answer these questions at scale. But credibility hinges on methods and documentation: explicit protocols and SAPs; auditable data pipelines; bias diagnostics (propensity scores, negative controls); and transparency about laboratory performance and manufacturing quality context. When lab results define outcomes, include analytical capability—e.g., RT-PCR LOD 25 copies/mL and LOQ 50 copies/mL (illustrative), or ELISA IgG LOD 3 BAU/mL and LOQ 10 BAU/mL—so case adjudication is reproducible. To pre-empt “non-biological” confounders in reviewer discussions, keep a short appendix with representative PDE (e.g., 3 mg/day for a residual solvent) and cleaning MACO limits (e.g., 1.0–1.2 µg/25 cm²) demonstrating stable manufacturing hygiene.
Regulators also expect ALCOA (attributable, legible, contemporaneous, original, accurate) for data transformations and outputs, and computerized-system controls (21 CFR Part 11 and EU GMP Annex 11).
Core VE Designs with RWD: Cohort, Test-Negative, and Case-Control
Cohort designs. Follow vaccinated and comparator groups over time using Cox or Poisson models. Represent time since vaccination (TSV) via restricted cubic splines or pre-specified intervals (0–3, 3–6, 6–9, 9–12 months). Estimate hazard ratios (HR) or incidence-rate ratios (IRR) and convert to VE = (1−HR)×100% or (1−IRR)×100%. Adjust for calendar time, geography, and variant periods; include prior infection and booster status as time-varying covariates. Example (dummy): Adjusted HR for hospitalization 0.35 at 0–3 months → VE 65%; 0.58 at 6–9 months → VE 42%.
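The HR-to-VE conversion above is mechanical but easy to get wrong around the confidence interval (the bounds swap because a higher HR means lower VE). A minimal sketch, with a hypothetical helper name and a dummy standard error:

```python
import math

def ve_from_hr(hr, se_log_hr=None, z=1.96):
    """Convert an adjusted hazard ratio to VE = (1 - HR) x 100%.

    If the standard error of log(HR) is supplied, also return a
    Wald-type 95% CI for VE, computed on the log-HR scale.
    """
    ve = (1.0 - hr) * 100.0
    if se_log_hr is None:
        return ve, None
    log_hr = math.log(hr)
    lo_hr = math.exp(log_hr - z * se_log_hr)  # lower bound for HR
    hi_hr = math.exp(log_hr + z * se_log_hr)  # upper bound for HR
    # Higher HR -> lower VE, so the CI limits swap order.
    return ve, ((1.0 - hi_hr) * 100.0, (1.0 - lo_hr) * 100.0)

# Dummy value from the text: HR 0.35 for hospitalization at 0-3 months
ve, ci = ve_from_hr(0.35, se_log_hr=0.08)
print(round(ve))  # 65
```

The same conversion applies to IRRs from Poisson models; only the estimation step differs.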
Test-Negative Design (TND). Restrict to symptomatic testers; cases are test-positives, controls test-negatives. TND reduces healthcare-seeking bias but assumes similar exposure/testing propensities. Always stratify by symptom criteria and testing policy periods, and run falsification checks (e.g., pre-rollout “VE” ≈ 0%).
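In the TND, the crude estimate reduces to an odds ratio from a 2×2 table of vaccination status by test result (adjusted analyses would use logistic regression, not shown). A sketch with dummy counts:

```python
def tnd_ve(vax_pos, unvax_pos, vax_neg, unvax_neg):
    """VE from a test-negative design 2x2 table.

    Cases are test-positives, controls are test-negatives; the odds
    ratio compares vaccination odds in cases vs controls, and
    VE = (1 - OR) x 100%.
    """
    odds_ratio = (vax_pos * unvax_neg) / (unvax_pos * vax_neg)
    return (1.0 - odds_ratio) * 100.0

# Dummy counts: 120 vaccinated / 300 unvaccinated test-positives,
# 400 vaccinated / 250 unvaccinated test-negatives -> OR 0.25
print(round(tnd_ve(120, 300, 400, 250)))  # 75
```

Running the same function on pre-rollout data is exactly the falsification check described above: the result should sit near 0%.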
Case-control. Useful for rare outcomes (ICU, death). Sample controls densely in time (risk-set sampling) and match on age, sex, geography, and calendar time; analyze with conditional logistic regression. Whatever the design, pre-declare subgroup analyses (≥65, immunocompromised), outcome tiers (ED visit, hospitalization, ICU, death), and decision thresholds that trigger communications or label updates.
| Goal | Best Fit | Strength | Watch-outs |
|---|---|---|---|
| Waning over time | Cohort | TSV modeling, boosters | Immortal time bias |
| Respiratory VE | TND | Reduces care-seeking bias | Testing-policy shifts |
| Severe outcomes | Case-control | Efficiency for rare events | Control selection |
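For the 1:1 matched case-control design, conditional logistic regression with a single binary exposure reduces to the ratio of discordant pair counts, which makes a useful sanity check on the full model. A sketch with dummy pair counts:

```python
def matched_pairs_or(case_vax_ctrl_unvax, case_unvax_ctrl_vax):
    """Conditional (matched-pair) odds ratio for a 1:1 matched
    case-control study: the ratio of discordant pair counts.

    For a single binary exposure this equals the estimate from
    conditional logistic regression; concordant pairs drop out.
    """
    return case_vax_ctrl_unvax / case_unvax_ctrl_vax

# Dummy discordant pairs: 40 pairs where only the case was vaccinated,
# 100 pairs where only the matched control was vaccinated.
or_hat = matched_pairs_or(40, 100)
print(round((1 - or_hat) * 100))  # VE 60
```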
Data Linkage & Quality: Turning Heterogeneous Sources into Analysis-Ready Sets
VE lives or dies on linkage. Combine immunization registries (dose dates, products, lots) with EHR/claims (encounters, comorbidities), laboratories (PCR/antigen/serology), and vital statistics (deaths). Use privacy-preserving linkage (hashing, third-party matching) and log deterministic/probabilistic match keys. Build an ETL with validation gates: impossible intervals (dose 2 before dose 1), duplicate vaccinations, outcome-date sanity checks, and cross-source concordance (admit/discharge vs diagnosis timestamps). Version-lock code and containerize (e.g., Docker) so re-runs reproduce hashes. Maintain a data dictionary and MedDRA/ICD-10 mapping under change control. Archive raw snapshots with checksums to satisfy ALCOA’s “original.”
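The ETL validation gates above can be expressed as simple, auditable checks that run before records enter the analysis set. A minimal sketch, assuming a hypothetical record schema with `person_id`, `dose_number`, and `dose_date`:

```python
from datetime import date

def validate_doses(records):
    """Flag impossible intervals and duplicate vaccinations.

    Returns a list of (person_id, error_type, detail) tuples that an
    ETL gate can log and quarantine before analysis.
    """
    errors = []
    by_person = {}
    for r in records:
        by_person.setdefault(r["person_id"], []).append(r)
    for pid, doses in by_person.items():
        seen = set()
        for r in doses:
            key = (r["dose_number"], r["dose_date"])
            if key in seen:
                errors.append((pid, "duplicate vaccination", key))
            seen.add(key)
        ordered = sorted(doses, key=lambda r: r["dose_number"])
        for a, b in zip(ordered, ordered[1:]):
            if b["dose_date"] <= a["dose_date"]:
                errors.append((pid, "dose out of sequence",
                               (a["dose_number"], b["dose_number"])))
    return errors

recs = [
    {"person_id": 1, "dose_number": 1, "dose_date": date(2024, 1, 5)},
    {"person_id": 1, "dose_number": 2, "dose_date": date(2023, 12, 1)},
]
print(validate_doses(recs))  # dose 2 before dose 1 is flagged
```

In production this logic would run inside the versioned, containerized pipeline described above, with its output archived alongside the data-cut memo.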
Outcome adjudication must be explicit. Define laboratory thresholds and specimen rules (e.g., accept PCR Ct ≤ 35; resolve discordant antigen/PCR with repeat testing). If using biomarkers in severity tiers, declare the assay performance in the SAP: potency or infection assays with LOD/LOQ values. Keep a short “quality context” memo in the TMF with representative PDE and MACO examples to document that manufacturing and cleaning controls stayed in-spec while clinical effectiveness varied.
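Explicit adjudication rules like these are easiest to keep reproducible as code filed with the SAP. A sketch of the illustrative rules from the text (Ct ≤ 35 accepted as PCR-positive; discordant antigen/PCR sent for repeat testing):

```python
def adjudicate(pcr_ct=None, antigen=None):
    """Adjudicate one specimen under illustrative SAP rules:
    PCR positive when Ct <= 35; antigen/PCR discordance resolved by
    repeat testing rather than being forced to a case call.
    """
    pcr = None if pcr_ct is None else (pcr_ct <= 35)
    if pcr is not None and antigen is not None and pcr != antigen:
        return "repeat"  # discordant results: order repeat testing
    if pcr is not None:
        return "case" if pcr else "non-case"
    if antigen is not None:
        return "case" if antigen else "non-case"
    return "unresolved"

print(adjudicate(pcr_ct=33))                # case
print(adjudicate(pcr_ct=38, antigen=True))  # repeat
```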
Governance, KPIs, and Decision Rules
Stand up a monthly Safety/Effectiveness Board to review dashboards and decide actions. Pre-define KPIs: cohort coverage (% registry-linked to EHR), lag from data cut to dashboard, capture of prior infection, VE at key TSV intervals, and subgroup VE. Quality KPIs include ETL error rate, linkage success, audit-trail review completion, and reproducibility checks (code hash). Establish decision rules such as: “If hospitalization VE in ≥65 years drops >10 points over a quarter with overlapping variant periods and no quality confounder, then recommend booster timing update and prepare HCP comms.” File minutes and decisions with supporting outputs in the TMF.
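A pre-declared decision rule of this kind is deliberately mechanical, so it can be encoded and unit-tested rather than re-argued each board cycle. A sketch of the example rule, with hypothetical parameter names:

```python
def booster_action(ve_now, ve_prev, variant_overlap, quality_confounder,
                   threshold=10.0):
    """Apply the pre-declared board rule from the text: a >10-point VE
    drop in adults >=65 over a quarter, with overlapping variant
    periods and no quality confounder, triggers a recommendation.
    """
    drop = ve_prev - ve_now
    if drop > threshold and variant_overlap and not quality_confounder:
        return "recommend booster timing update; prepare HCP comms"
    return "continue monitoring"

# Dummy quarter-over-quarter VE: 68% -> 52% in the >=65 group
print(booster_action(ve_now=52, ve_prev=68,
                     variant_overlap=True, quality_confounder=False))
```

Filing the rule, its inputs, and the returned action with the board minutes makes the decision trail inspectable.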
For hands-on SOP templates covering protocols, ETL validation, and inspection-ready reports, see pharmaValidation.in. Public terminology for post-authorization evidence can be cross-checked on the EMA website.
Modeling Waning & Boosters: Time-Since-Vaccination Done Right
Waning is not a single slope—it varies by age, risk, variant, and outcome. Treat time since vaccination (TSV) as a primary exposure. In Cox models, use restricted cubic splines (3–5 knots) or stepped intervals (0–3, 3–6, 6–9, 9–12 months). Interact TSV with age bands and immunocompromised status. For boosters, apply a biologically plausible grace period (e.g., 7–14 days post-booster) and model booster status as a time-varying covariate. Adjust for calendar time via strata or splines to absorb variant waves and policy changes; include prior infection as a time-varying variable. Report absolute risks (per 100,000 person-months) alongside VE to support policy decisions.
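Before fitting, person-time must be assigned to the pre-specified TSV intervals, with the post-dose grace period excluded so no "immortal" days are counted as exposed. A minimal sketch, assuming a 14-day grace period and the intervals named above:

```python
TSV_INTERVALS = [(0, 90, "0-3 mo"), (90, 180, "3-6 mo"),
                 (180, 270, "6-9 mo"), (270, 365, "9-12 mo")]

def tsv_interval(days_since_dose, grace_days=14):
    """Assign a person-day to a pre-specified time-since-vaccination
    interval. Days inside the post-dose grace period return None and
    are excluded, avoiding immortal-time artefacts.
    """
    if days_since_dose < grace_days:
        return None
    for lo, hi, label in TSV_INTERVALS:
        if lo <= days_since_dose < hi:
            return label
    return ">12 mo"

print(tsv_interval(45))   # 0-3 mo
print(tsv_interval(200))  # 6-9 mo
print(tsv_interval(7))    # None (grace period)
```

The resulting interval label enters the Cox or Poisson model as a categorical exposure; the spline alternative replaces the label with a smooth basis over `days_since_dose`.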
| Interval | Adjusted HR | VE (1−HR) | 95% CI |
|---|---|---|---|
| 0–3 mo (primary) | 0.32 | 68% | 64–71% |
| 3–6 mo (primary) | 0.48 | 52% | 47–56% |
| 6–9 mo (primary) | 0.64 | 36% | 30–42% |
| 0–3 mo (booster) | 0.28 | 72% | 68–75% |
| 3–6 mo (booster) | 0.40 | 60% | 55–64% |
Bias control. Guard against immortal-time bias by aligning person-time precisely around dose dates and grace periods. Use propensity-score weighting/matching with calendar-time strata and geography to reduce confounding by indication. Deploy negative control outcomes (e.g., ankle sprain) and exposures (future vaccination date) to detect residual bias. In TND, vary symptom definitions and exclude occupational screens to test robustness. Where outcomes depend on assays, keep method transparency visible—e.g., RT-PCR LOD 25 copies/mL; LOQ 50 copies/mL—and preserve chain-of-custody. Tie everything back to ALCOA: version-locked code, timestamped cuts, and immutable raw snapshots.
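The negative-control logic above reduces to a simple check: where no vaccine effect is biologically possible, the estimated "VE" should sit near zero (HR near 1), and a large deviation flags residual bias. A sketch with a hypothetical tolerance:

```python
def falsification_check(hr_estimate, tol=0.10):
    """Falsification test for a negative control outcome/exposure or a
    pre-rollout period: the HR should be ~1 (pseudo-VE ~0%).

    Returns (passed, pseudo_ve_percent); a failure signals residual
    confounding or selection bias, not a vaccine effect.
    """
    pseudo_ve = (1.0 - hr_estimate) * 100.0
    return abs(hr_estimate - 1.0) <= tol, pseudo_ve

ok, ve = falsification_check(0.97)    # passes: pseudo-VE ~3%
bad, ve2 = falsification_check(0.70)  # fails: pseudo-VE 30% -> bias
print(ok, bad)
```

The tolerance is an analysis choice that belongs in the SAP, alongside the list of negative controls.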
Case Study (Hypothetical): A National VE Program that Drove a Booster Decision
Context. A country links registries, EHR, labs, and vital stats for 2.5 M adults. Findings (dummy). Hospitalization VE in ≥65 years: 68% at 0–3 months post-primary, 52% at 3–6 months, 36% at 6–9 months. Booster lowers HR to 0.28 (VE 72%) in months 0–3 post-booster, stabilizing at VE 60% by months 3–6. TND sensitivity analyses show VE within ±3 points; cohort and case-control designs converge on similar estimates. Negative controls are null; falsification in pre-rollout months ≈0% VE. Labs document analytical capability; adjudication rules are transparent. Quality appendix shows representative PDE 3 mg/day and MACO 1.0–1.2 µg/25 cm²; no manufacturing or cold-chain anomalies are linked to outcome spikes.
Action. The board applies pre-declared rules: “>10-point drop in ≥65s over a quarter with consistent bias checks → recommend booster at 6 months.” HCP materials are updated; an eCTD supplement compiles protocol/SAP, dashboards, and a reproducibility package (container hash, code, parameter files). Public comms explain denominators, absolute risks, and limits. The system continues monthly, ready to detect further waning or variant-specific changes.
Deliverables & Inspection Readiness: Make ALCOA Obvious
Create a simple crosswalk in the TMF: SOP → data cuts → code → outputs → decisions → labels/comms. For each cycle, file (1) protocol/SAP (and addenda), (2) data-cut memo (sources, versions, date), (3) analysis report with TSV curves and subgroup tables, (4) bias diagnostics (balance plots, negative controls), (5) reproducibility pack (code, containers, hashes), and (6) board minutes with decisions. Keep one internal link handy for your teams’ SOPs and validation templates—practitioners often adapt patterns from PharmaSOP.in—and cite a single external reference for public expectations; the ICH Quality Guidelines page is a concise touchstone to align vocabulary on validation and data integrity across functions.
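The reproducibility pack's hashes can be generated with a small manifest script filed with each cycle, so re-runs can be proven byte-identical. A sketch with a hypothetical helper name; file names in the usage comment are illustrative:

```python
import hashlib
import json
import pathlib

def repro_manifest(paths):
    """Build a SHA-256 checksum manifest for the reproducibility pack
    (code, containers, parameter files), supporting ALCOA's
    'original'/'accurate' for archived analysis inputs.
    """
    manifest = {}
    for p in map(pathlib.Path, paths):
        manifest[p.name] = hashlib.sha256(p.read_bytes()).hexdigest()
    return json.dumps(manifest, indent=2, sort_keys=True)

# Usage (hypothetical files):
#   print(repro_manifest(["analysis.py", "params.yaml"]))
```

Archiving the manifest next to the raw snapshots lets an inspector verify the crosswalk end to end without re-running the pipeline.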
