Published on 09/01/2026
How Regulatory Bodies Accept EHR-Derived Data in Pharma Studies
Electronic Health Records (EHRs) are increasingly used as real-world data (RWD) sources for generating real-world evidence (RWE) in pharmaceutical research. However, not all EHR-derived data is considered fit-for-purpose by regulatory agencies such as the EMA and the FDA. To gain regulatory acceptance, EHR-based data must meet strict criteria for quality, traceability, reliability, and relevance.
This tutorial outlines how pharma professionals can ensure EHR-derived data complies with regulatory expectations, what documentation to prepare, and which standards to follow when planning submissions using RWE generated from electronic medical records.
Understanding Regulatory Expectations for EHR-Derived Data
Agencies such as the FDA and EMA are open to the use of EHR data, provided the following criteria are met:
- Data Integrity: The source data must be complete, accurate, and unaltered.
- Traceability: Each data point must be traceable to its origin, including who entered it and when.
- Relevance: Data must be appropriate for the clinical question or regulatory decision.
- Transparency: Clear documentation of data provenance and transformation is required.
- Governance: Use of the EHR system must be under formal oversight with defined policies.
Regulatory bodies apply scrutiny to EHR-derived data similar to what they apply to data from conventional clinical trials.
Step 1: Ensure EHR System Validity and Compliance
Only validated, regulated EHR systems should be used for data generation. Key checks include:
- 21 CFR Part 11 compliance for electronic records and signatures
- Audit trails that show who accessed or changed data
- System qualification and change control documentation
- Role-based access with permission logs
Systems that generate the data should undergo formal computerized system validation and adhere to ALCOA+ principles (Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, and Available).
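As a rough illustration of how the attributable and contemporaneous principles can be checked against an audit trail, here is a minimal Python sketch. The `AuditEntry` fields and the 24-hour lag threshold are assumptions made for this example. They are not taken from 21 CFR Part 11 or from any particular EHR product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class AuditEntry:
    """One audit-trail row; the field names are hypothetical."""
    record_id: str
    user_id: Optional[str]      # who made the entry (Attributable)
    event_time: datetime        # when the clinical event occurred
    entered_at: datetime        # when it was recorded in the system
    old_value: Optional[str]    # None for an original entry
    new_value: str


def alcoa_findings(entry: AuditEntry,
                   max_lag: timedelta = timedelta(hours=24)) -> List[str]:
    """Return a list of findings; an empty list means no issue was detected.

    max_lag is an assumed threshold for 'contemporaneous' and should be
    replaced by whatever the study's own SOP defines.
    """
    findings = []
    if not entry.user_id:
        findings.append("not attributable: missing user_id")
    if entry.entered_at < entry.event_time:
        findings.append("entry timestamp precedes the event time")
    elif entry.entered_at - entry.event_time > max_lag:
        findings.append("not contemporaneous: entry lag exceeds threshold")
    return findings


ok_entry = AuditEntry("BP-001", "nurse_17",
                      datetime(2024, 1, 5, 9, 0),
                      datetime(2024, 1, 5, 9, 20), None, "128")
late_entry = AuditEntry("BP-002", None,
                        datetime(2024, 1, 5, 9, 0),
                        datetime(2024, 1, 8, 9, 0), "128", "118")

for e in (ok_entry, late_entry):
    print(e.record_id, alcoa_findings(e))
```

A check like this only flags candidates for review. Whether a flagged entry is a real deviation still depends on the site's documented procedures.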
Step 2: Data Source Mapping and Documentation
Agencies expect thorough documentation of where data comes from. Your submission must include:
- List of all data fields used and their clinical significance
- Definitions of each variable (e.g., diagnosis codes, lab values)
- Data transformation or derivation logic applied
- Version control for datasets and extraction protocols
It’s also important to describe any limitations in data capture, such as missing values or inconsistent time intervals.
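One way to keep variable definitions, derivation logic, and versioning together is a machine-readable data dictionary. The Python sketch below assumes a hypothetical `VariableSpec` structure and a body mass index derivation chosen purely as an example. The field names and the version string are illustrative, not a regulatory template.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class VariableSpec:
    """One data-dictionary entry; the structure is an assumption."""
    name: str
    source_fields: str
    definition: str
    unit: str
    derivation: str
    known_limitations: str


def bmi(weight_kg: float, height_cm: float) -> float:
    """Derived variable: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return round(weight_kg / height_m ** 2, 1)


BMI_SPEC = VariableSpec(
    name="bmi",
    source_fields="vitals.weight_kg, vitals.height_cm",  # hypothetical fields
    definition="Body mass index at the index visit",
    unit="kg/m^2",
    derivation="weight_kg / (height_cm / 100) ** 2, rounded to 1 decimal",
    known_limitations="Height is often missing at follow-up visits",
)


def export_dictionary(specs: List[VariableSpec],
                      extraction_protocol_version: str) -> str:
    """Serialize the dictionary with its protocol version for the submission."""
    payload = {
        "extraction_protocol_version": extraction_protocol_version,
        "variables": [asdict(s) for s in specs],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


dictionary_json = export_dictionary([BMI_SPEC], "1.2.0")
print(dictionary_json)
```

Keeping the derivation function next to its written description makes it easier to show reviewers that the documented logic and the executed logic match.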
Step 3: Validate Data Quality and Consistency
Before submitting RWE derived from EHRs, conduct quality checks such as:
- Duplicate entry analysis
- Outlier detection (e.g., unrealistic blood pressure readings)
- Range and consistency checks
- Missing data imputation justifications
Agencies often require submission of the data cleaning steps, query logs, and issue resolution summaries. These records are typically maintained under good documentation practices aligned with GCP.
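The checks listed above can be sketched in a few lines of Python. The row layout, the field names, and the plausible systolic range of 50 to 300 mmHg are assumptions for illustration. A real study would take its limits from the protocol or a clinical reviewer.

```python
from collections import Counter

# Assumed plausibility limits for systolic blood pressure, in mmHg.
PLAUSIBLE_SYSTOLIC_MMHG = (50, 300)


def find_duplicates(rows, key_fields):
    """Return key tuples that occur more than once."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return [key for key, n in counts.items() if n > 1]


def out_of_range(rows, field, low, high):
    """Return rows whose non-missing value falls outside [low, high]."""
    return [r for r in rows
            if r[field] is not None and not (low <= r[field] <= high)]


def missing_rate(rows, field):
    """Fraction of rows where the field is missing (None)."""
    if not rows:
        return 0.0
    return sum(r[field] is None for r in rows) / len(rows)


rows = [
    {"patient_id": "P1", "taken_at": "2024-01-05T09:00", "systolic": 128},
    {"patient_id": "P1", "taken_at": "2024-01-05T09:00", "systolic": 128},
    {"patient_id": "P2", "taken_at": "2024-01-06T10:30", "systolic": 900},
    {"patient_id": "P3", "taken_at": "2024-01-07T08:15", "systolic": None},
]

duplicates = find_duplicates(rows, ["patient_id", "taken_at"])
implausible = out_of_range(rows, "systolic", *PLAUSIBLE_SYSTOLIC_MMHG)
rate = missing_rate(rows, "systolic")

print("duplicate keys:", duplicates)
print("implausible rows:", [r["patient_id"] for r in implausible])
print("missing rate:", rate)
```

The functions report problems without changing the data, so any correction or imputation remains a separate, documented step.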
Step 4: Clarify Patient Selection and Data Linkage Methodology
Patient population definitions must be precise and reproducible. Regulatory reviewers need to know:
- Inclusion and exclusion criteria for the dataset
- ICD/CPT/LOINC codes used for identifying conditions or procedures
- Data linkage rules if combining EHR with claims or registry data
- Patient privacy safeguards, such as de-identification SOPs
State clearly whether linkage used deterministic or probabilistic methods, and provide match accuracy rates.
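To make the linkage description concrete, the following Python sketch shows a simple deterministic linkage and how match accuracy could be reported against a manually reviewed reference set. The linkage key (date of birth, sex, postcode), the record layout, and the reference links are all assumptions for this example. A real linkage would typically run on tokenized or otherwise protected identifiers.

```python
def link_key(rec):
    """Normalize the fields used for exact matching (assumed key)."""
    return (
        rec["dob"],
        rec["sex"].strip().upper(),
        rec["postcode"].replace(" ", "").upper(),
    )


def deterministic_link(ehr_records, claims_records):
    """Link records whose normalized keys match exactly and unambiguously."""
    index = {}
    for c in claims_records:
        index.setdefault(link_key(c), []).append(c["claims_id"])
    links = set()
    for e in ehr_records:
        candidates = index.get(link_key(e), [])
        if len(candidates) == 1:  # skip ambiguous or missing matches
            links.add((e["ehr_id"], candidates[0]))
    return links


def precision_recall(links, true_links):
    """Match accuracy against a reviewed reference set of true links."""
    true_positives = len(links & true_links)
    precision = true_positives / len(links) if links else 0.0
    recall = true_positives / len(true_links) if true_links else 0.0
    return precision, recall


ehr = [
    {"ehr_id": "E1", "dob": "1980-02-01", "sex": "f", "postcode": "ab1 2cd"},
    {"ehr_id": "E2", "dob": "1975-07-12", "sex": "M", "postcode": "EF3 4GH"},
    {"ehr_id": "E3", "dob": "1990-11-30", "sex": "F", "postcode": "ZZ9 9ZZ"},
]
claims = [
    {"claims_id": "C1", "dob": "1980-02-01", "sex": "F", "postcode": "AB12CD"},
    {"claims_id": "C2", "dob": "1975-07-12", "sex": "M", "postcode": "EF34GH"},
    {"claims_id": "C3", "dob": "1975-07-12", "sex": "M", "postcode": "EF34GH"},
]
reviewed_true_links = {("E1", "C1"), ("E2", "C2")}

links = deterministic_link(ehr, claims)
precision, recall = precision_recall(links, reviewed_true_links)
print(links, precision, recall)
```

Here the ambiguous record E2 is left unlinked, which keeps precision high at the cost of recall. Reporting both figures lets a reviewer see that trade-off.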
Step 5: Align with Relevant Regulatory Frameworks
Each regulatory body provides guidance documents for RWD use:
- FDA: Framework for FDA's Real-World Evidence Program (2018); Draft guidance on RWD use in submissions
- EMA: RWE Reflection Paper; Big Data Task Force Recommendations
- Health Canada: Guidance on RWD/RWE submissions
- CDSCO: Emerging interest in RWE for post-marketing studies in India
In all cases, align your submission with the specific agency's definition of fit-for-purpose data.
Step 6: Use Standardized Data Models Where Possible
Adopt harmonized structures such as:
- OMOP CDM: Observational Medical Outcomes Partnership Common Data Model
- HL7 FHIR: Fast Healthcare Interoperability Resources
- Sentinel Common Data Model: Used by FDA for safety surveillance
These models improve traceability, transparency, and cross-system comparison. They are encouraged for studies submitted as RWE.
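As a sketch of what mapping to a common data model involves, the Python example below converts a local EHR vital-sign row into a record with a subset of field names modeled on the OMOP CDM MEASUREMENT table. The source row layout and the local code are hypothetical, and the concept ID in the lookup is a placeholder rather than a real OMOP concept. Check the field list and vocabulary against the CDM version you actually use.

```python
# Hypothetical local-code lookup. 900001 is a placeholder, NOT a real
# OMOP concept_id; a real mapping comes from the standardized vocabularies.
LOCAL_TO_CONCEPT = {"SBP": 900001}

UNMAPPED_CONCEPT_ID = 0  # OMOP convention for "no matching concept"


def to_measurement_row(src):
    """Map one source row to an OMOP-style measurement record (subset)."""
    return {
        "person_id": src["patient_key"],
        "measurement_concept_id": LOCAL_TO_CONCEPT.get(
            src["local_code"], UNMAPPED_CONCEPT_ID),
        "measurement_date": src["taken_at"][:10],  # keep the date part
        "value_as_number": float(src["value"]),
        # Preserve the original code and unit so the row stays traceable.
        "measurement_source_value": src["local_code"],
        "unit_source_value": src["unit"],
    }


mapped = to_measurement_row({
    "patient_key": 1001, "local_code": "SBP",
    "taken_at": "2024-01-05T09:00", "value": "128", "unit": "mmHg",
})
unmapped = to_measurement_row({
    "patient_key": 1002, "local_code": "XYZ",
    "taken_at": "2024-01-06T10:30", "value": "7.5", "unit": "u",
})
print(mapped)
print(unmapped)
```

Keeping the source value alongside the mapped concept supports traceability, since a reviewer can follow each standardized record back to what was originally entered.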
Step 7: Address Statistical and Methodological Rigor
Include a clear statistical analysis plan (SAP) that addresses:
- Confounding and bias mitigation strategies
- Propensity score matching or weighting techniques
- Sensitivity analyses for missing or ambiguous data
- Endpoint definitions using standardized clinical logic
Justify your choice of real-world comparators or external controls. In many cases, regulators evaluate RWE with the same rigor they apply to randomized controlled trials (RCTs).
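The weighting idea can be illustrated with a deliberately small Python sketch of inverse probability of treatment weighting (IPTW). It assumes a single binary confounder, `severe`, and estimates the propensity score as the treated fraction within each stratum. Real analyses usually fit a regression model over many covariates, so treat this as a toy under stated assumptions. The data are invented.

```python
from collections import defaultdict


def stratum_propensity(rows, stratum_field):
    """Estimate P(treated | stratum) as the treated fraction per stratum."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for r in rows:
        total[r[stratum_field]] += 1
        treated[r[stratum_field]] += r["treated"]
    return {s: treated[s] / total[s] for s in total}


def iptw_weights(rows, propensity, stratum_field):
    """Weight treated rows by 1/p and untreated rows by 1/(1 - p)."""
    weights = []
    for r in rows:
        p = propensity[r[stratum_field]]
        if p in (0.0, 1.0):
            raise ValueError("positivity violated in stratum %r"
                             % (r[stratum_field],))
        weights.append(1 / p if r["treated"] else 1 / (1 - p))
    return weights


def weighted_share(rows, weights, treated_flag, field):
    """Weighted proportion of rows with field == 1 in one treatment arm."""
    arm = [(r, w) for r, w in zip(rows, weights)
           if r["treated"] == treated_flag]
    return sum(w * r[field] for r, w in arm) / sum(w for _, w in arm)


# Invented data: severe patients are treated more often (confounding).
rows = (
    [{"severe": 1, "treated": 1}] * 3 + [{"severe": 1, "treated": 0}] * 1 +
    [{"severe": 0, "treated": 1}] * 1 + [{"severe": 0, "treated": 0}] * 3
)

propensity = stratum_propensity(rows, "severe")
weights = iptw_weights(rows, propensity, "severe")
unweighted = weighted_share(rows, [1.0] * len(rows), 1, "severe")
share_treated = weighted_share(rows, weights, 1, "severe")
share_untreated = weighted_share(rows, weights, 0, "severe")
print(unweighted, share_treated, share_untreated)
```

Before weighting, 75% of treated patients in this toy data are severe. After weighting, the severe share is about 50% in both arms. This kind of covariate balance check is what the SAP would normally report.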
Step 8: Submit RWE as Part of Regulatory Filing with Transparent Appendices
Whether used in a New Drug Application (NDA), Marketing Authorization Application (MAA), or post-marketing commitment, EHR-derived data must be submitted in a transparent, structured format:
- Include all data transformation protocols
- Provide audit logs and dataset lineage
- Append SAS or R scripts used for analysis
- Submit de-identified patient-level data as applicable
Consider publishing protocols and methods to boost reviewer confidence and transparency.
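Dataset lineage can be documented by recording a cryptographic hash of each input and output along the processing chain. The Python sketch below uses SHA-256 from the standard library. The step names, script names, and in-memory byte strings are hypothetical stand-ins for real dataset files, and the manifest layout is an assumption, not an agency-prescribed format.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Hex digest identifying the exact content of a dataset."""
    return hashlib.sha256(data).hexdigest()


def lineage_entry(step, script, input_bytes, output_bytes):
    """Record which script turned which input into which output."""
    return {
        "step": step,
        "script": script,
        "input_sha256": sha256_of(input_bytes),
        "output_sha256": sha256_of(output_bytes),
    }


def chain_is_intact(entries):
    """Each step's input must be exactly the previous step's output."""
    return all(prev["output_sha256"] == nxt["input_sha256"]
               for prev, nxt in zip(entries, entries[1:]))


# Stand-ins for file contents at each stage (invented data).
raw = b"patient_id,systolic\nP1,128\nP1,128\n"
cleaned = b"patient_id,systolic\nP1,128\n"
summary = b"n=1,mean_systolic=128\n"

manifest = [
    lineage_entry("deduplicate", "clean.py", raw, cleaned),
    lineage_entry("summarize", "summarize.py", cleaned, summary),
]

print(json.dumps(manifest, indent=2))
print("chain intact:", chain_is_intact(manifest))
```

If any intermediate dataset is altered after the fact, its hash no longer matches the next step's recorded input. This gives reviewers a simple way to confirm that the submitted outputs derive from the stated sources.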
Conclusion: Charting a Path to Regulatory Acceptance
As regulators grow more open to EHR-derived RWE, pharmaceutical companies must meet heightened expectations for data quality, transparency, and methodological soundness. Follow the guidance outlined above to ensure your EHR-based study data is not just real-world, but real-useful for regulators.
Whether analyzing treatment persistence, adverse event patterns, or comparative effectiveness, EHR-derived RWE can accelerate access to therapies and post-market insights—provided it’s regulatory-grade.
