Published on 21/12/2025
ADaM Derivations You Can Defend: Versioning Discipline, Unit Tests That Catch Drift, and Rationale You Can Read in Court
Outcome-first ADaM: derivations that survive questions, re-cuts, and inspection sprints
What “defensible” means in practice
Defensible ADaM derivations are those that a new reviewer can trace, reproduce, and explain without calling the programmer. That requires three things: (1) explicit lineage from SDTM to analysis variables; (2) clear and versioned business rules tied to a SAP/estimand reference; and (3) automated unit tests that fail loudly when inputs, algorithms, or thresholds change. If any of these are missing, re-cuts become fragile and inspection time turns into archaeology.
State one compliance backbone—once
Anchor your analysis environment in a single, portable paragraph and reuse it across shells, SAP, standards, and CSR appendices: inspection expectations reference FDA BIMO; electronic records and signatures follow 21 CFR Part 11 and map to Annex 11; GCP oversight and roles align to ICH E6(R3); safety data exchange and narratives acknowledge ICH E2B(R3); public transparency aligns to ClinicalTrials.gov and EU postings under EU-CTR via CTIS; privacy follows HIPAA. Every change leaves a searchable audit trail; systemic issues route through CAPA; risk is tracked with QTLs and key risk indicators.
Define the outcomes before you write a single line of code
Set three measurable outcomes for your derivation work: (1) Traceability—every analysis variable includes a one-line provenance token (domains, keys, and algorithms) and a link to a test; (2) Reproducibility—a saved parameter file and environment hash can recreate results byte-identically for the same cut; (3) Retrievability—a reviewer can open the derivation spec, program, and associated unit tests in under two clicks from a portfolio tile. If you can demonstrate all three on a stopwatch drill, you are inspection-ready.
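The reproducibility outcome above can be sketched mechanically: fingerprint each run from its saved parameter file plus the pinned environment, so two "identical" re-cuts can be compared before anyone trusts them. This is a minimal illustration; `run_fingerprint` and the parameter/version names are hypothetical, not a prescribed tool.

```python
import hashlib
import json

def run_fingerprint(params: dict, package_versions: dict) -> str:
    """Deterministic fingerprint of a run: canonical JSON of the saved
    parameter file plus the pinned environment, hashed with SHA-256.
    Two runs claiming to be identical should share a fingerprint."""
    canonical = json.dumps(
        {"params": params, "env": package_versions},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Re-cuts compare fingerprints before asserting byte-identical results.
cut1 = run_fingerprint({"baseline_window": [-7, 0]}, {"pandas": "2.2.1"})
cut2 = run_fingerprint({"baseline_window": [-7, 0]}, {"pandas": "2.2.1"})
assert cut1 == cut2  # same inputs, same fingerprint
```

Storing the hex digest next to the run log gives reviewers a one-line equality check instead of a dataset-by-dataset comparison.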
Regulatory mapping: US-first clarity that ports cleanly to EU/UK review styles
US (FDA) angle—event → evidence in minutes
US assessors frequently select an analysis number and drill: where is the rule, what data feed it, what are the intercurrent-event assumptions, and how would the number change if a sensitivity rule applied? Your derivations must surface that story without a scavenger hunt. Titles, footnotes, and derivation notes should name the estimand, identify analysis sets, and point to Define.xml, ADRG, and the unit tests that guard the variable. When a reviewer asks “why is this value here?” you should be able to open the program, show the spec, run the test, and move on in minutes.
EU/UK (EMA/MHRA) angle—identical truths, different wrappers
EMA/MHRA reviewers ask the same questions but often emphasize estimand clarity, protocol deviation handling, and consistency with registry narratives. If US-first derivation notes use literal labels and your lineage is explicit, the same package translates with minimal edits. Keep a label cheat sheet (“IRB → REC/HRA; IND safety alignment → regional CTA safety language”) in your programming standards so everyone speaks the same truth with local words.
| Dimension | US (FDA) | EU/UK (EMA/MHRA) |
|---|---|---|
| Electronic records | Part 11 validation & role attribution | Annex 11 controls; supplier qualification |
| Transparency | Consistency with registry wording | EU-CTR status via CTIS; UK registry alignment |
| Privacy | Minimum necessary & de-identification | GDPR/UK GDPR minimization/residency |
| Traceability set | Define.xml + ADRG/SDRG drill-through | Same, with emphasis on estimand clarity |
| Inspection lens | Event→evidence speed; unit test presence | Completeness & portability of rationale |
Process & evidence: a derivation spec that actually prevents rework
The eight-line derivation template that scales
Use a compact, mandatory block for each analysis variable: (1) Name/Label; (2) Purpose (link to SAP/estimand); (3) Source lineage (SDTM domains, keys); (4) Algorithm (pseudo-code with thresholds and tie-breakers); (5) Missingness (imputation, censoring); (6) Time windows (visits, allowable drift); (7) Sensitivity (alternative rules); (8) Unit tests (inputs/expected outputs). This short form makes rules readable and testable and keeps writers, statisticians, and programmers synchronized.
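The eight-line block is easy to enforce mechanically. A minimal sketch, assuming specs are held as structured records (the field names and the `CHG` example below are illustrative, not a mandated schema):

```python
# The eight mandatory fields of the derivation template, in order.
REQUIRED_FIELDS = (
    "name_label", "purpose", "source_lineage", "algorithm",
    "missingness", "time_windows", "sensitivity", "unit_tests",
)

def validate_spec(spec: dict) -> list:
    """Return the template fields missing or empty in a derivation spec
    block. An empty list means the eight-line block is complete."""
    return [f for f in REQUIRED_FIELDS if not spec.get(f)]

chg_spec = {
    "name_label": "CHG / Change from Baseline",
    "purpose": "SAP section ref; estimand E1 (treatment policy)",
    "source_lineage": "SDTM LB (USUBJID, LBDTC, LBTESTCD) -> ADLB",
    "algorithm": "CHG = AVAL - BASE; BASE = last non-missing pre-dose AVAL",
    "missingness": "CHG missing if BASE missing; impute per SAP",
    "time_windows": "Baseline window [-7, 0] days",
    "sensitivity": "Per-protocol window [-3, 0]",
    "unit_tests": ["test_chg_basic", "test_chg_missing_base"],
}
assert validate_spec(chg_spec) == []  # complete block
```

Running this check in CI keeps incomplete spec blocks from ever reaching a programmer.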
Make lineage explicit and mechanical
List SDTM domains and keys explicitly—e.g., AE (USUBJID, AESTDTC/AETERM) → ADAE (ADY, AESER, AESDTH). If a variable is derived across domains, document the join logic (join keys, timing rules). Ambiguity here is the #1 cause of late-stage rework because different programmers resolve gaps differently. A one-line lineage token in the program header prevents drift.
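A lineage token only prevents drift if it is machine-checkable. A minimal sketch, assuming a hypothetical token grammar `SOURCE(keys)->TARGET(vars):ALGORITHM-ID` (the grammar and `ALG-017` are illustrative):

```python
import re

# Hypothetical grammar: SOURCE(keys)->TARGET(vars):ALGORITHM-ID
TOKEN_RE = re.compile(
    r"^(?P<src>\w+)\((?P<keys>[\w,]+)\)->"
    r"(?P<tgt>\w+)\((?P<vars>[\w,]+)\):(?P<alg>[\w-]+)$"
)

def parse_lineage_token(token: str) -> dict:
    """Parse a header lineage token; raise loudly on malformed tokens
    so drift is caught at build time, not during inspection."""
    m = TOKEN_RE.match(token)
    if m is None:
        raise ValueError(f"malformed lineage token: {token!r}")
    d = m.groupdict()
    d["keys"] = d["keys"].split(",")
    d["vars"] = d["vars"].split(",")
    return d

tok = parse_lineage_token("AE(USUBJID,AESTDTC)->ADAE(ADY,AESER):ALG-017")
assert tok["src"] == "AE" and tok["alg"] == "ALG-017"
```

A pre-commit hook that parses every header token turns "lineage exists" into a verifiable build property.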
- Enforce the eight-line derivation template in specs and program headers.
- Require lineage tokens for every analysis variable (domains, keys, algorithm ID).
- Map each rule to a SAP clause and estimand label (E9(R1) language).
- Declare windowing/visit rules and how partial dates are handled.
- Predefine sensitivity variants; don’t bolt them on later.
- Create unit tests per variable with named edge cases and expected values.
- Save parameters and environment hashes for reproducible reruns.
- Drill from portfolio tiles → shell/spec → code/tests → artifacts in two clicks.
- Version everything; tie changes to governance minutes and change summaries.
- File derivation specs, tests, and run logs to the TMF with cross-references.
Decision Matrix: choose derivation strategies that won’t unravel during review
| Scenario | Option | When to choose | Proof required | Risk if wrong |
|---|---|---|---|---|
| Baseline value missing or out-of-window | Pre-specified hunt rule (last non-missing pre-dose) | SAP allows single pre-dose window | Window spec; unit test with edge cases | Hidden imputation; inconsistent baselines |
| Multiple records per visit (duplicates/partials) | Tie-breaker chain (chronology → quality flag → mean) | When duplicates are common | Algorithm note; reproducible selection | Reviewer suspicion of cherry-picking |
| Time-to-event with heavy censoring | Explicit censoring rules + sensitivity | Dropout/administrative censoring high | Traceable lineage; ADTTE rules; tests | Bias claims; rerun churn late |
| Intercurrent events common (rescue, switch) | Treatment-policy primary + hypothetical sensitivity | E9(R1) estimand strategy declared | SAP excerpt; parallel shells | Estimand drift; mixed interpretations |
| Non-inferiority endpoint | Margin & scale stated in variable metadata | Primary or key secondary NI | Margin source; CI computation unit tests | Ambiguous claims; queries |
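The tie-breaker chain from the duplicates row above can be made reproducible in a few lines. A sketch, assuming each record carries a collection datetime, a quality flag, and a value (the record shape and function name are illustrative):

```python
from statistics import mean

def resolve_duplicates(records):
    """Tie-breaker chain for multiple records at one visit:
    chronology -> quality flag -> mean of remaining.
    Each record: (collection_datetime_iso, quality_flag, value)."""
    if len(records) == 1:
        return records[0][2]
    earliest = min(r[0] for r in records)            # 1) chronology
    at_earliest = [r for r in records if r[0] == earliest]
    if len(at_earliest) == 1:
        return at_earliest[0][2]
    flagged = [r for r in at_earliest if r[1]]       # 2) quality flag
    if len(flagged) == 1:
        return flagged[0][2]
    pool = flagged or at_earliest                    # 3) mean of remaining
    return mean(r[2] for r in pool)

# Earliest record wins outright when timestamps differ.
assert resolve_duplicates([("2025-01-02T08:00", True, 5.0),
                           ("2025-01-02T09:00", True, 7.0)]) == 5.0
```

Because the chain is a pure function, the same duplicates always resolve the same way—the reproducible selection the matrix demands as proof.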
Document the “why” where reviewers will actually look
Maintain a Derivation Decision Log: question → option → rationale → artifacts (SAP clause, spec snippet, unit test ID) → owner → date → effectiveness (e.g., query reduction). File in Sponsor Quality and cross-link from the spec and code so the path from a number to a decision is obvious.
QC / Evidence Pack: the minimum, complete set that proves your derivations are under control
- Derivation specs (versioned) with lineage, rules, sensitivity, and unit tests referenced.
- Define.xml pointers and reviewer guides (ADRG/SDRG) aligned to variable metadata.
- Program headers with lineage tokens, change summaries, and run parameters.
- Automated unit test suite with coverage report and named edge cases.
- Environment lock files/hashes; rerun instructions that reproduce byte-identical results.
- Change-control minutes linking rule edits to SAP amendments and shells.
- Visual diffs of outputs pre/post change; threshold rules for acceptable drift.
- Portfolio drill-through maps (tiles → spec → code/tests → artifact locations).
- Governance minutes tying recurring defects to CAPA with effectiveness checks.
- TMF cross-references so inspectors can open everything without helpdesk tickets.
Vendor oversight & privacy
Qualify external programming teams against your standards; enforce least-privilege access; store interface logs and incident reports near the codebase. Where subject-level listings are tested, apply data minimization and de-identification consistent with privacy and jurisdictional rules.
Versioning discipline: prevent drift with simple, humane rules
Semantic versions plus change summaries
Use semantic versioning for specs and code (MAJOR.MINOR.PATCH). Every change must carry a top-of-file summary that states what changed, why (SAP clause/governance), and how to retest. Small cost now, huge savings later when a reviewer asks why Week 24 changed on a re-cut.
Freeze tokens and naming
Freeze dataset and variable names early. Late renames create invisible fractures across shells, CSR text, and validation macros. If you must rename, deprecate with an alias period and unit tests that fail if both appear simultaneously to avoid shadow variables.
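The alias-period guard above is a one-function test. A sketch, assuming renames are tracked as an old→new map (`AETRTEM` is a hypothetical deprecated name; `TRTEMFL` stands in for its replacement):

```python
def assert_no_shadow_names(columns, renames):
    """Fail loudly if a deprecated name and its replacement coexist in
    one dataset -- the 'shadow variable' failure mode. `renames` maps
    old name -> new name."""
    cols = set(columns)
    clashes = [(old, new) for old, new in renames.items()
               if old in cols and new in cols]
    if clashes:
        raise AssertionError(
            f"old/new name pairs present together: {clashes}")

# During the alias period, only one of the pair may appear per dataset.
assert_no_shadow_names(["USUBJID", "TRTEMFL"], {"AETRTEM": "TRTEMFL"})
try:
    assert_no_shadow_names(["AETRTEM", "TRTEMFL"], {"AETRTEM": "TRTEMFL"})
except AssertionError:
    pass  # the guard fired, as intended
```

Run it over every dataset's column list in CI; the alias period ends when the old name can be deleted from the map without anything failing.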
Parameterize time and windows
Put time windows, censoring rules, and reference dates in a parameters file checked into version control. It prevents “magic numbers” in code and lets re-cuts use the right windows without manual edits. Unit tests should load parameters so a changed window forces test updates, not silent drift.
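A minimal sketch of such a parameters file, assuming JSON under version control (the file name and parameter keys are illustrative):

```python
import json
import pathlib
import tempfile

# Windows, censoring dates, and drift allowances live here,
# not as magic numbers inside derivation code.
PARAMS = {
    "baseline_window_days": [-7, 0],
    "censor_at": "2025-06-30",
    "visit_drift_days": 3,
}

def load_params(path):
    """Load the version-controlled parameters file for a run."""
    with open(path) as fh:
        return json.load(fh)

# Write and reload, as a re-cut would.
tmp = pathlib.Path(tempfile.mkdtemp()) / "derivation_params.json"
tmp.write_text(json.dumps(PARAMS, indent=2))
params = load_params(tmp)
assert params["baseline_window_days"] == [-7, 0]
```

Unit tests that load this same file make a changed window a failing test, not silent drift.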
Unit tests that matter: what to test and how to keep tests ahead of change
Test the rules you argue about
Focus tests on edge cases that trigger debate: partial dates, overlapping visits, duplicate IDs, ties in “first” events, and censoring at lock. Encode one or two examples per edge and assert exact expected values. When an algorithm changes, tests should fail where your conversation would have started anyway.
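Partial dates are the archetypal debated edge. A sketch of one such test, assuming an illustrative imputation convention (missing day → first of month, missing month → January)—the real convention must come from the SAP, not this example:

```python
def impute_partial_date(iso_partial: str) -> str:
    """Complete a partial ISO 8601 date using an illustrative
    convention: missing day -> '01', missing month -> '01'.
    The actual rule belongs in the SAP, not here."""
    parts = iso_partial.split("-")
    year = parts[0]
    month = parts[1] if len(parts) > 1 and parts[1] else "01"
    day = parts[2] if len(parts) > 2 and parts[2] else "01"
    return f"{year}-{month}-{day}"

# Named edge cases with exact expected values.
assert impute_partial_date("2025-03") == "2025-03-01"     # missing day
assert impute_partial_date("2025") == "2025-01-01"        # year only
assert impute_partial_date("2025-03-15") == "2025-03-15"  # complete date
```

If the SAP later switches the convention (say, to mid-month), exactly these assertions fail—the conversation starts where it should.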
Golden records and minimal fixtures
Create tiny, named fixtures that cover each derivation pattern. Avoid giant “real” datasets that hide signal; use synthetic rows with clear intent. Keep golden outputs in version control; diffs show exactly what changed and why, and reviewers can read them like a storyboard.
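Comparing against golden outputs can itself be a tiny function. A sketch, assuming derived values are keyed by (subject, variable); the subject IDs and values are synthetic:

```python
def diff_against_golden(actual: dict, golden: dict) -> dict:
    """Readable diff of derived values vs. version-controlled golden
    outputs, keyed by (subject, variable). Empty dict = no drift.
    Each entry maps key -> (golden_value, actual_value)."""
    keys = set(actual) | set(golden)
    return {k: (golden.get(k), actual.get(k))
            for k in keys if actual.get(k) != golden.get(k)}

golden = {("SUBJ-001", "CHG"): -4.0, ("SUBJ-002", "CHG"): 1.5}
actual = {("SUBJ-001", "CHG"): -4.0, ("SUBJ-002", "CHG"): 2.5}
drift = diff_against_golden(actual, golden)
assert drift == {("SUBJ-002", "CHG"): (1.5, 2.5)}
```

The diff is the storyboard: one subject, one variable, old value, new value—readable without opening any code.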
Coverage that means something
Report code coverage but don’t chase 100%—chase rule coverage. Every business rule in your spec should have at least one test. Include failure-path tests that assert correct error messages when assumptions break (e.g., missing keys, illegal window values).
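A failure-path test looks like this in practice. A sketch using a study-day derivation (day of reference date = day 1, no day 0—a common convention, but confirm against your own spec):

```python
from datetime import date

def derive_ady(refdt: str, adt: str) -> int:
    """Relative study day (no day 0). Fails loudly with a clear
    message when a required date is missing, per the failure-path
    guidance: assumptions break noisily, never silently."""
    if not refdt or not adt:
        raise ValueError("derive_ady: REFDT and ADT are both required")
    delta = (date.fromisoformat(adt) - date.fromisoformat(refdt)).days
    return delta + 1 if delta >= 0 else delta

assert derive_ady("2025-01-01", "2025-01-01") == 1    # reference day
assert derive_ady("2025-01-01", "2024-12-31") == -1   # day before
try:
    derive_ady("2025-01-01", "")
except ValueError as err:
    assert "required" in str(err)  # assert the message, not just the raise
```

Asserting on the error message guarantees the program tells the next reviewer *why* it refused, not merely that it did.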
Templates reviewers appreciate: paste-ready tokens, footnotes, and rationale language
Spec tokens for fast comprehension
Purpose: “Supports estimand E1 (treatment policy) for primary endpoint.”
Lineage: “SDTM LB (USUBJID, LBDTC, LBTESTCD) → ADLB (ADT, AVISIT, AVAL).”
Algorithm: “Baseline = last non-missing pre-dose AVAL within [−7,0]; change = AVAL – baseline; if missing baseline, impute per SAP §[ref].”
Sensitivity: “Per-protocol window [−3,0]; tipping point ±[X] sensitivity.”
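The algorithm token above translates directly into testable code. A minimal sketch, assuming records are (relative study day, AVAL) pairs and that in-window day-0 records are pre-dose; the record shape is illustrative:

```python
def derive_baseline(records, window=(-7, 0)):
    """Baseline = last non-missing pre-dose AVAL within the window
    (days relative to first dose). Returns None when no candidate
    exists, handing off to the SAP's imputation rule."""
    candidates = [(day, aval) for day, aval in records
                  if aval is not None and window[0] <= day <= window[1]]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[0])[1]  # last = latest day

obs = [(-10, 98.0), (-5, 101.0), (-1, None), (0, 99.5), (7, 104.0)]
base = derive_baseline(obs)
assert base == 99.5                            # latest in-window value
assert derive_baseline([(-9, 97.0)]) is None   # outside window -> SAP rule
# Change from baseline follows the same token: AVAL - baseline.
assert round(104.0 - base, 1) == 4.5
```

Swapping `window=(-3, 0)` reproduces the per-protocol sensitivity variant with no code change—exactly the parameterization argued for above.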
CSR-ready footnotes
“Baseline defined as the last non-missing, pre-dose value within the pre-specified window; if multiple candidate records exist, the earliest value within the window is used. Censoring rules are applied per SAP §[ref], with administrative censoring at database lock. Intercurrent events follow the treatment-policy strategy; a hypothetical sensitivity is provided in Table S[ref].”
Rationale sentences that quell queries
“The tie-breaker chain (chronology → quality flag → mean of remaining) minimizes bias when multiple records exist and reflects clinical practice where earlier, higher-quality measurements dominate. Sensitivity analyses demonstrate effect stability across window definitions.”
FAQs
How detailed should an ADaM derivation spec be?
Short and specific. Use an eight-line template covering purpose, lineage, algorithm, missingness, windows, sensitivity, and unit tests. The goal is that a reviewer can forecast the output’s behavior without reading code, and a programmer can implement without guessing.
Where should we store derivation rationale so inspectors can find it?
In three places: the spec (short form), the program header (summary and links), and the decision log (why this rule). Cross-link all three and file to the TMF. During inspection, open the decision log first to show intent, then the spec and code to show execution.
What makes a good unit test for ADaM variables?
Named edge cases with minimal fixtures and explicit expected values. Tests should assert both numeric results and the presence of required flags (e.g., imputation indicators). Include failure-path tests that prove the program rejects illegal inputs with clear messages.
How do we handle multiple registry or public narrative wordings?
Keep derivation text literal and map public wording via a label cheat sheet in your standards. If you change a public narrative, open a change control ticket and verify no estimand or analysis definitions drifted as a side effect.
How do we prevent variable name drift across deliverables?
Freeze names early, use aliases temporarily when renaming, and add tests that fail on simultaneous presence of old/new names. Update shells, CSR templates, and macros from a single dictionary to keep words and numbers synchronized.
What evidence convinces reviewers that our derivations are stable across re-cuts?
Byte-identical rebuilds for the same data cut, environment hashes, parameter files, and visual diffs of outputs pre/post change with thresholds. File stopwatch drills showing you can open spec, code, and tests in under two clicks and reproduce results on demand.
