Published on 21/12/2025
Complete Case or Full Dataset? Choosing the Right Analysis Approach for Missing Data
Handling missing data is a critical decision in clinical trial analysis. Two commonly considered approaches are Complete Case Analysis (CCA) and Full Dataset Modeling (e.g., MMRM or Multiple Imputation). Choosing between them requires understanding the underlying assumptions, data structure, regulatory expectations, and impact on validity.
This guide explores when it is appropriate to use complete case analysis versus full dataset methods in biostatistical evaluations. We’ll also discuss the regulatory context from agencies like the FDA and EMA, and offer practical recommendations to guide your decision-making process.
Understanding Complete Case Analysis (CCA)
Complete Case Analysis involves analyzing only those subjects for whom all relevant data are available. Any patient with missing data on the outcome or a key covariate is excluded from the analysis.
Advantages of CCA:
- Simple to implement and interpret
- Works with standard statistical tools
- No explicit model for the missing data is required (although validity still rests on an implicit MCAR assumption)
Limitations of CCA:
- Leads to loss of sample size and statistical power
- Results may be biased if data are not Missing Completely at Random (MCAR)
- Unreliable when missingness is high or systematic (i.e., related to patient characteristics or outcomes)
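For reference, the three missingness mechanisms mentioned in this guide can be written compactly. The notation is an assumption introduced here, not part of the original text: R is the indicator of which values are missing, and Y_obs and Y_mis are the observed and unobserved parts of the data.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ still depends on } Y_{\text{mis}} \text{ after conditioning on } Y_{\text{obs}}
\end{align*}
```

In words: under MCAR missingness is unrelated to any data, under MAR it depends only on what was observed, and under MNAR it depends on the unobserved values themselves.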
When to Use CCA:
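- When the proportion of missing data is small (e.g., <5%) and MCAR is plausible
- When it serves as a supportive analysis alongside a full dataset primary analysis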
CCA may be acceptable under specific circumstances, but its limitations must be clearly stated in the trial documentation.
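The bias risk of CCA can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not a trial analysis: the function name `simulate_cca_bias`, the 30% MCAR rate, and the logistic dropout model are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_cca_bias(n=20000, seed=0):
    """Compare the complete-case mean of y under MCAR and MAR missingness."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                    # fully observed baseline covariate
    y = 1.0 + 0.8 * x + rng.normal(size=n)    # outcome; true mean is 1.0

    # MCAR: every subject has the same 30% chance of a missing outcome.
    miss_mcar = rng.random(n) < 0.30

    # MAR: the chance of a missing outcome rises with the observed baseline x.
    p_mar = 1.0 / (1.0 + np.exp(-(x - 0.85)))
    miss_mar = rng.random(n) < p_mar

    return {
        "full": y.mean(),                     # mean if nothing were missing
        "cca_mcar": y[~miss_mcar].mean(),     # complete cases under MCAR
        "cca_mar": y[~miss_mar].mean(),       # complete cases under MAR
    }

result = simulate_cca_bias()
for name, value in result.items():
    print(f"{name:9s} mean of y = {value:.3f}")
```

In this setup the complete-case mean stays close to the full-data mean under MCAR, but drifts downward under MAR because subjects with high baseline values (and therefore high outcomes) are more likely to be dropped.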
Understanding Full Dataset Analysis
Full Dataset Analysis refers to techniques that incorporate all available data, including cases with partial information. Examples include:
- MMRM (Mixed Models for Repeated Measures): Accommodates MAR (Missing at Random) data
- Multiple Imputation: Uses observed data to predict and fill in missing values
- Maximum Likelihood Estimation: Accounts for partial data without explicit imputation
Advantages of Full Dataset Methods:
- Preserves statistical power by using all available information
- Yields valid, approximately unbiased estimates under MAR when the analysis model is correctly specified
- Widely accepted by regulatory agencies
Limitations:
- Requires correct specification of the model
- May be computationally intensive
- Assumptions (like MAR) must be justified
These methods are favored in regulatory reviews, especially for primary endpoints. Their inclusion in the Statistical Analysis Plan (SAP) reflects best practice in handling missing data.
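To make the multiple imputation idea concrete, here is a minimal sketch in Python with NumPy. It is an illustrative simplification, not a validated imputation procedure: the function name `multiple_imputation_mean`, the single-covariate linear imputation model, and the bootstrap step used to reflect parameter uncertainty are all assumptions made for this example. Results are pooled with Rubin's rules.

```python
import numpy as np

def multiple_imputation_mean(x, y, n_imputations=20, seed=0):
    """Estimate the mean of y (NaN = missing) with regression-based MI."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(y)
    x_obs, y_obs = x[observed], y[observed]
    n, n_obs = len(y), int(observed.sum())

    estimates, variances = [], []
    for _ in range(n_imputations):
        # Refit y ~ x on a bootstrap resample to reflect parameter uncertainty.
        idx = rng.integers(0, n_obs, size=n_obs)
        slope, intercept = np.polyfit(x_obs[idx], y_obs[idx], deg=1)
        resid = y_obs[idx] - (intercept + slope * x_obs[idx])
        sigma = resid.std(ddof=2)

        # Fill missing outcomes with the prediction plus residual noise.
        y_filled = y.copy()
        y_filled[~observed] = (intercept + slope * x[~observed]
                               + rng.normal(scale=sigma, size=n - n_obs))
        estimates.append(y_filled.mean())
        variances.append(y_filled.var(ddof=1) / n)

    # Rubin's rules: total variance = within + (1 + 1/M) * between.
    estimates = np.array(estimates)
    within = np.mean(variances)
    between = estimates.var(ddof=1)
    total_var = within + (1 + 1 / n_imputations) * between
    return estimates.mean(), np.sqrt(total_var)

# Synthetic MAR example: missingness depends only on the observed covariate x.
rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = 1.0 + 0.8 * x + rng.normal(size=5000)          # true mean of y is 1.0
y_missing = y.copy()
y_missing[rng.random(5000) < 1 / (1 + np.exp(-(x - 0.85)))] = np.nan

mi_mean, mi_se = multiple_imputation_mean(x, y_missing)
cca_mean = np.nanmean(y_missing)                   # complete-case estimate
print(f"MI  mean = {mi_mean:.3f} (SE {mi_se:.3f})")
print(f"CCA mean = {cca_mean:.3f}")
```

Because the imputation model conditions on the variable that drives missingness, the pooled MI estimate should land near the true mean, while the complete-case mean does not. In a real trial the imputation model would include treatment arm, visit, and other relevant covariates.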
Regulatory Guidance: CCA vs Full Dataset
Regulators discourage CCA as a primary analysis method unless MCAR can be assumed and justified. For pivotal trials, agencies like the FDA and EMA recommend full dataset approaches with appropriate sensitivity analyses.
Key Guidelines:
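- National Research Council report, The Prevention and Treatment of Missing Data in Clinical Trials (2010, commissioned by the FDA): Emphasizes pre-specification and discourages CCA as a primary analysis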
- ICH E9(R1): Introduces the estimand framework, which requires stating how intercurrent events such as treatment discontinuation are handled
- EMA Guideline on Missing Data: Encourages model-based analyses with sensitivity checks
Documentation of methods and justification of assumptions is critical for regulatory compliance.
Practical Comparison: When to Choose What
| Scenario | Preferred Method | Rationale |
|---|---|---|
| <5% missing data, MCAR plausible | Complete Case Analysis | Minimal bias risk, simple approach |
| Dropout related to observed variables | MMRM or MI (Full Dataset) | MAR assumption is plausible |
| High dropout (>15%) | Full Dataset + Sensitivity Analysis | Need to preserve power and explore MNAR |
| Regulatory submission | Full Dataset (Primary) + CCA (Supportive) | To demonstrate robustness |
Best Practices for Implementation
- Pre-specify the full dataset method as the primary analysis and CCA as a supportive analysis in the SAP
- Clearly define assumptions about missing data mechanisms
- Perform and report sensitivity analyses (e.g., tipping point, delta adjustment)
- Use statistical software with validated imputation modules
- Document the rationale and results in line with SOPs and in the Clinical Study Report (CSR)
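The delta adjustment and tipping point analyses mentioned above can be sketched as follows. This is a hedged illustration on simulated two-arm data, not a validated procedure: the function names `delta_adjusted_effect` and `tipping_point`, the simple normal imputation within each arm, and the 1.96 cutoff are assumptions made for the example. Imputed treatment-arm values are shifted down by delta to mimic dropouts doing worse than MAR would predict, and the scan stops at the first delta where the 95% confidence interval includes zero.

```python
import numpy as np

def delta_adjusted_effect(treat, control, delta, n_imputations=20, seed=0):
    """Pooled treatment-minus-control difference after delta-adjusted imputation."""
    rng = np.random.default_rng(seed)
    estimates, variances = [], []
    for _ in range(n_imputations):
        arms = []
        for arm, shift in ((treat, -delta), (control, 0.0)):
            missing = np.isnan(arm)
            obs = arm[~missing]
            # Bootstrap the observed values to reflect parameter uncertainty.
            boot = rng.choice(obs, size=obs.size, replace=True)
            filled = arm.copy()
            # MAR-style draw from the arm's distribution, then apply the shift.
            filled[missing] = rng.normal(boot.mean(), boot.std(ddof=1),
                                         int(missing.sum())) + shift
            arms.append(filled)
        t, c = arms
        estimates.append(t.mean() - c.mean())
        variances.append(t.var(ddof=1) / t.size + c.var(ddof=1) / c.size)

    # Rubin's rules for the pooled estimate and its standard error.
    estimates = np.array(estimates)
    total_var = np.mean(variances) + (1 + 1 / n_imputations) * estimates.var(ddof=1)
    return estimates.mean(), np.sqrt(total_var)

def tipping_point(treat, control, deltas):
    """Return the first delta whose 95% CI lower bound is at or below zero."""
    for delta in deltas:
        est, se = delta_adjusted_effect(treat, control, delta)
        if est - 1.96 * se <= 0:
            return delta
    return None  # the conclusion never tipped over the scanned range

# Simulated trial: true effect 0.6, roughly 20% missing outcomes per arm.
rng = np.random.default_rng(42)
control = rng.normal(0.0, 1.0, 200)
treat = rng.normal(0.6, 1.0, 200)
treat[rng.random(200) < 0.2] = np.nan
control[rng.random(200) < 0.2] = np.nan

tp = tipping_point(treat, control, deltas=np.arange(0.0, 6.01, 0.25))
print("Tipping point delta:", tp)
```

The interpretation is a judgment call: if the tipping point delta is implausibly large on clinical grounds, the primary conclusion can be considered robust to departures from MAR.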
Conclusion
The decision to use complete case analysis or full dataset modeling should be driven by data characteristics, missingness mechanisms, and regulatory requirements. While CCA is easy to apply, it is valid mainly under MCAR, which is rarely plausible in practice, so it is generally best reserved for supportive analyses. Full dataset approaches like MMRM and multiple imputation offer robust solutions under MAR and are preferred in regulatory submissions. Incorporating both strategies, alongside transparent assumptions and sensitivity analyses, helps keep your trial results valid and defensible.
