CDISC compliance – Clinical Research Made Simple (https://www.clinicalstudies.in) – Wed, 05 Nov 2025


ADaM Derivations You Can Defend: Versioning Discipline, Unit Tests That Catch Drift, and Rationale You Can Read in Court

Outcome-first ADaM: derivations that survive questions, re-cuts, and inspection sprints

What “defensible” means in practice

Defensible ADaM derivations are those that a new reviewer can trace, reproduce, and explain without calling the programmer. That requires three things: (1) explicit lineage from SDTM to analysis variables; (2) clear and versioned business rules tied to a SAP/estimand reference; and (3) automated unit tests that fail loudly when inputs, algorithms, or thresholds change. If any of these are missing, re-cuts become fragile and inspection time turns into archaeology.

State one compliance backbone—once

Anchor your analysis environment in a single, portable paragraph and reuse it across shells, SAP, standards, and CSR appendices: inspection expectations reference FDA BIMO; electronic records and signatures follow 21 CFR Part 11 and map to Annex 11; GCP oversight and roles align to ICH E6(R3); safety data exchange and narratives acknowledge ICH E2B(R3); public transparency aligns to ClinicalTrials.gov and EU postings under EU-CTR via CTIS; privacy follows HIPAA. Every change leaves a searchable audit trail; systemic issues route through CAPA; risk is tracked with QTLs and managed via RBM. Patient-reported and remote elements feed validated eCOA pipelines, including decentralized workflows (DCT). All artifacts are filed to the TMF/eTMF. Standards use CDISC conventions with lineage from SDTM to ADaM, and statistical claims avoid ambiguity in non-inferiority or superiority contexts. Anchor this stance one time with compact authority links—FDA, EMA, MHRA, ICH, WHO, PMDA, and TGA—and then get back to derivations.

Define the outcomes before you write a single line of code

Set three measurable outcomes for your derivation work: (1) Traceability—every analysis variable includes a one-line provenance token (domains, keys, and algorithms) and a link to a test; (2) Reproducibility—a saved parameter file and environment hash can recreate results byte-identically for the same cut; (3) Retrievability—a reviewer can open the derivation spec, program, and associated unit tests in under two clicks from a portfolio tile. If you can demonstrate all three on a stopwatch drill, you are inspection-ready.
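The reproducibility outcome above can be made mechanical with a small fingerprinting helper. This is a minimal sketch in Python; the function name, parameter fields, and package list are illustrative assumptions, not a prescribed convention:

```python
# Hedged sketch: compute a reproducibility fingerprint from the saved
# parameter set plus the package environment, so a re-cut can prove it
# ran under the same conditions. Field names are illustrative.
import hashlib
import json

def environment_hash(params: dict, package_versions: dict) -> str:
    """Hash analysis parameters and package versions into one token
    that is recorded in the run log alongside the data-cut ID."""
    payload = json.dumps(
        {"params": params, "env": package_versions},
        sort_keys=True,          # stable key ordering -> stable hash
        separators=(",", ":"),   # no whitespace drift between runs
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

params = {"baseline_window_days": [-7, 0], "cut_date": "2025-06-30"}
env = {"python": "3.11.4", "pandas": "2.1.0"}
print(environment_hash(params, env))  # file this token with the outputs
```

Because the serialization is canonical (sorted keys, fixed separators), two runs with identical parameters and packages produce identical tokens, and any silent change surfaces as a hash mismatch.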

Regulatory mapping: US-first clarity that ports cleanly to EU/UK review styles

US (FDA) angle—event → evidence in minutes

US assessors frequently select an analysis number and drill: where is the rule, what data feed it, what are the intercurrent-event assumptions, and how would the number change if a sensitivity rule applied? Your derivations must surface that story without a scavenger hunt. Titles, footnotes, and derivation notes should name the estimand, identify analysis sets, and point to Define.xml, ADRG, and the unit tests that guard the variable. When a reviewer asks “why is this value here?” you should be able to open the program, show the spec, run the test, and move on in minutes.

EU/UK (EMA/MHRA) angle—identical truths, different wrappers

EMA/MHRA reviewers ask the same questions but often emphasize estimand clarity, protocol deviation handling, and consistency with registry narratives. If US-first derivation notes use literal labels and your lineage is explicit, the same package translates with minimal edits. Keep a label cheat sheet (“IRB → REC/HRA; IND safety alignment → regional CTA safety language”) in your programming standards so everyone speaks the same truth with local words.

| Dimension          | US (FDA)                                 | EU/UK (EMA/MHRA)                              |
|--------------------|------------------------------------------|-----------------------------------------------|
| Electronic records | Part 11 validation & role attribution    | Annex 11 controls; supplier qualification     |
| Transparency       | Consistency with registry wording        | EU-CTR status via CTIS; UK registry alignment |
| Privacy            | Minimum necessary & de-identification    | GDPR/UK GDPR minimization/residency           |
| Traceability set   | Define.xml + ADRG/SDRG drill-through     | Same, with emphasis on estimand clarity       |
| Inspection lens    | Event→evidence speed; unit test presence | Completeness & portability of rationale       |

Process & evidence: a derivation spec that actually prevents rework

The eight-line derivation template that scales

Use a compact, mandatory block for each analysis variable: (1) Name/Label; (2) Purpose (link to SAP/estimand); (3) Source lineage (SDTM domains, keys); (4) Algorithm (pseudo-code with thresholds and tie-breakers); (5) Missingness (imputation, censoring); (6) Time windows (visits, allowable drift); (7) Sensitivity (alternative rules); (8) Unit tests (inputs/expected outputs). This short form makes rules readable and testable and keeps writers, statisticians, and programmers synchronized.
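One way to make the eight-line block enforceable rather than aspirational is to represent it as data and validate it automatically. A minimal sketch, assuming hypothetical field names and an invented example variable; the real template would live in your specs, not in code:

```python
# Illustrative sketch: the eight-line derivation block as a checkable
# structure. Field names mirror the template; values are hypothetical.
REQUIRED_FIELDS = [
    "name_label", "purpose", "source_lineage", "algorithm",
    "missingness", "time_windows", "sensitivity", "unit_tests",
]

chg_baseline_spec = {
    "name_label": "CHG / Change from Baseline",
    "purpose": "SAP section 9.2, estimand E1 (treatment policy)",
    "source_lineage": "SDTM LB (USUBJID, LBDTC, LBTESTCD) -> ADLB (ADT, AVISIT, AVAL)",
    "algorithm": "CHG = AVAL - BASE; BASE = last non-missing pre-dose AVAL in [-7, 0]",
    "missingness": "missing BASE -> CHG missing; no silent imputation at derivation level",
    "time_windows": "baseline window [-7, 0] days relative to first dose",
    "sensitivity": "per-protocol window [-3, 0]",
    "unit_tests": ["test_chg_basic", "test_chg_missing_baseline"],
}

def validate_spec(spec: dict) -> list:
    """Return template fields that are absent or empty (empty list = complete)."""
    return [f for f in REQUIRED_FIELDS if f not in spec or not spec[f]]

print(validate_spec(chg_baseline_spec))  # [] when all eight lines are present
```

Running `validate_spec` over every variable spec in CI turns “mandatory block” from a style rule into a build gate.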

Make lineage explicit and mechanical

List SDTM domains and keys explicitly—e.g., AE (USUBJID, AESTDTC/AETERM) → ADAE (ADY, AESER, AESDTH). If derived across domains, depict the join logic (join keys, timing rules). Ambiguity here is the #1 cause of late-stage rework because different programmers resolve gaps differently. A one-line lineage token in the program header prevents drift.
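A lineage token only prevents drift if it is parseable, so tooling can flag a malformed or missing one. A sketch under an assumed in-house convention of `DOMAIN (keys) -> DATASET (vars)` (ASCII arrow in code in place of “→”):

```python
# Hypothetical one-line lineage token and a parser that keeps it mechanical.
# The token shape is an assumed in-house convention, not a CDISC mandate.
import re

TOKEN_RE = re.compile(
    r"^(?P<src>\w+)\s*\((?P<src_keys>[^)]+)\)\s*->\s*"
    r"(?P<dst>\w+)\s*\((?P<dst_vars>[^)]+)\)$"
)

def _split(s: str) -> list:
    return [p.strip() for p in s.split(",")]

def parse_lineage(token: str) -> dict:
    """Split a lineage token into source domain, keys, target dataset, variables."""
    m = TOKEN_RE.match(token.strip())
    if m is None:
        raise ValueError(f"Malformed lineage token: {token!r}")
    return {
        "source": m.group("src"),
        "source_keys": _split(m.group("src_keys")),
        "target": m.group("dst"),
        "target_vars": _split(m.group("dst_vars")),
    }

print(parse_lineage("AE (USUBJID, AESTDTC) -> ADAE (ADY, AESER)"))
```

A pre-commit hook that runs `parse_lineage` over every program header catches the missing or garbled tokens that otherwise surface as late-stage rework.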

  1. Enforce the eight-line derivation template in specs and program headers.
  2. Require lineage tokens for every analysis variable (domains, keys, algorithm ID).
  3. Map each rule to a SAP clause and estimand label (E9(R1) language).
  4. Declare windowing/visit rules and how partial dates are handled.
  5. Predefine sensitivity variants; don’t bolt them on later.
  6. Create unit tests per variable with named edge cases and expected values.
  7. Save parameters and environment hashes for reproducible reruns.
  8. Drill from portfolio tiles → shell/spec → code/tests → artifacts in two clicks.
  9. Version everything; tie changes to governance minutes and change summaries.
  10. File derivation specs, tests, and run logs to the TMF with cross-references.

Decision Matrix: choose derivation strategies that won’t unravel during review

| Scenario | Option | When to choose | Proof required | Risk if wrong |
|---|---|---|---|---|
| Baseline value missing or out-of-window | Pre-specified hunt rule (last non-missing pre-dose) | SAP allows single pre-dose window | Window spec; unit test with edge cases | Hidden imputation; inconsistent baselines |
| Multiple records per visit (duplicates/partials) | Tie-breaker chain (chronology → quality flag → mean) | When duplicates are common | Algorithm note; reproducible selection | Reviewer suspicion of cherry-picking |
| Time-to-event with heavy censoring | Explicit censoring rules + sensitivity | Dropout/administrative censoring high | Traceable lineage; ADTTE rules; tests | Bias claims; rerun churn late |
| Intercurrent events common (rescue, switch) | Treatment-policy primary + hypothetical sensitivity | E9(R1) estimand strategy declared | SAP excerpt; parallel shells | Estimand drift; mixed interpretations |
| Non-inferiority endpoint | Margin & scale stated in variable metadata | Primary or key secondary NI | Margin source; CI computation unit tests | Ambiguous claims; queries |

Document the “why” where reviewers will actually look

Maintain a Derivation Decision Log: question → option → rationale → artifacts (SAP clause, spec snippet, unit test ID) → owner → date → effectiveness (e.g., query reduction). File in Sponsor Quality and cross-link from the spec and code so the path from a number to a decision is obvious.

QC / Evidence Pack: the minimum, complete set that proves your derivations are under control

  • Derivation specs (versioned) with lineage, rules, sensitivity, and unit tests referenced.
  • Define.xml pointers and reviewer guides (ADRG/SDRG) aligned to variable metadata.
  • Program headers with lineage tokens, change summaries, and run parameters.
  • Automated unit test suite with coverage report and named edge cases.
  • Environment lock files/hashes; rerun instructions that reproduce byte-identical results.
  • Change-control minutes linking rule edits to SAP amendments and shells.
  • Visual diffs of outputs pre/post change; threshold rules for acceptable drift.
  • Portfolio drill-through maps (tiles → spec → code/tests → artifact locations).
  • Governance minutes tying recurring defects to CAPA with effectiveness checks.
  • TMF cross-references so inspectors can open everything without helpdesk tickets.

Vendor oversight & privacy

Qualify external programming teams against your standards; enforce least-privilege access; store interface logs and incident reports near the codebase. Where subject-level listings are tested, apply data minimization and de-identification consistent with privacy and jurisdictional rules.

Versioning discipline: prevent drift with simple, humane rules

Semantic versions plus change summaries

Use semantic versioning for specs and code (MAJOR.MINOR.PATCH). Every change must carry a top-of-file summary that states what changed, why (SAP clause/governance), and how to retest. Small cost now, huge savings later when a reviewer asks why Week 24 changed on a re-cut.
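The retest obligation can itself be derived from the version bump. A minimal sketch; `requires_retest` and the bump-to-retest policy are illustrative assumptions, not a standard:

```python
# Sketch: enforce MAJOR.MINOR.PATCH on spec/code versions and derive
# the retest obligation from the bump. Policy shown here is assumed:
# MAJOR/MINOR = rule change (full retest), PATCH = cosmetic (diff rerun).
import re

SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

def parse_version(v: str) -> tuple:
    m = SEMVER_RE.match(v)
    if m is None:
        raise ValueError(f"Not a semantic version: {v!r}")
    return tuple(int(x) for x in m.groups())

def requires_retest(old: str, new: str) -> bool:
    """Algorithm-affecting bumps (MAJOR or MINOR) force a full retest run."""
    return parse_version(new)[:2] != parse_version(old)[:2]

print(requires_retest("1.2.3", "1.2.4"))  # False: patch only, rerun diffs
print(requires_retest("1.2.3", "1.3.0"))  # True: rule change, retest suite
```

Wiring this into the build means the “how to retest” line in the change summary is checked, not merely promised.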

Freeze tokens and naming

Freeze dataset and variable names early. Late renames create invisible fractures across shells, CSR text, and validation macros. If you must rename, deprecate through an alias period, with unit tests that fail if the old and new names appear simultaneously, so shadow variables cannot creep in.
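The rename guard is a few lines of code. A sketch with invented variable names; the rule is simply that during an alias period either name may exist, but never both in the same dataset:

```python
# Sketch of a rename guard: flag old/new name pairs present simultaneously.
# Variable names below are illustrative, not from any real study.
def check_no_shadow(columns, renames):
    """renames: {old_name: new_name}. Return pairs present at the same time."""
    cols = set(columns)
    return [(old, new) for old, new in renames.items()
            if old in cols and new in cols]

# During deprecation of a hypothetical AVALCHG in favor of CHG:
violations = check_no_shadow(["USUBJID", "AVAL", "CHG"], {"AVALCHG": "CHG"})
print(violations)  # [] -> safe; a non-empty list should fail the build
```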

Parameterize time and windows

Put time windows, censoring rules, and reference dates in a parameters file checked into version control. It prevents “magic numbers” in code and lets re-cuts use the right windows without manual edits. Unit tests should load parameters so a changed window forces test updates, not silent drift.
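A minimal sketch of that pattern, assuming a hypothetical `params.json` checked into version control; the file name and keys are illustrative:

```python
# Minimal sketch: windows and reference rules live in a versioned JSON
# file, not in code, and the loader fails loudly rather than defaulting.
import json
from pathlib import Path

DEFAULT_PARAMS = {
    "baseline_window_days": [-7, 0],
    "visit_drift_days": 3,
    "censor_at": "database_lock",
}

def load_params(path: str) -> dict:
    """Load analysis parameters; missing keys are an error, never a default."""
    params = json.loads(Path(path).read_text())
    missing = [k for k in DEFAULT_PARAMS if k not in params]
    if missing:
        raise KeyError(f"Parameter file missing required keys: {missing}")
    return params

# Writing then reloading the checked-in defaults round-trips exactly:
Path("params.json").write_text(json.dumps(DEFAULT_PARAMS, indent=2))
print(load_params("params.json")["baseline_window_days"])
```

Because derivation programs and unit tests both call `load_params`, a changed window breaks the tests that pinned the old expected values, which is exactly the visible failure you want instead of silent drift.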

Unit tests that matter: what to test and how to keep tests ahead of change

Test the rules you argue about

Focus tests on the edge cases that trigger debate: partial dates, overlapping visits, duplicate IDs, ties in “first” events, and censoring at lock. Encode one or two examples per edge and assert exact expected values. When an algorithm changes, tests should fail where your conversation would have started anyway.
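As a sketch of the shape such tests take, here is a toy “first event” selector with one named test per contested edge; `first_event` and its tie-breaking chain are illustrative stand-ins for your actual derivation logic:

```python
# Pseudo-pytest sketch: one named test per contested edge case.
# first_event and its tie-breakers are illustrative, not a study rule.
def first_event(records):
    """Pick the 'first' event: earliest date, ties broken by higher severity.
    Records are (date_str, severity) tuples; partial dates sort last."""
    def key(rec):
        date, severity = rec
        complete = len(date) == 10              # 'YYYY-MM-DD' is complete
        return (not complete, date, -severity)  # partial dates lose ties
    return min(records, key=key)

def test_tie_on_date_prefers_higher_severity():
    assert first_event([("2025-03-01", 1), ("2025-03-01", 3)]) == ("2025-03-01", 3)

def test_partial_date_sorts_after_complete():
    assert first_event([("2025-03", 3), ("2025-03-01", 1)]) == ("2025-03-01", 1)

test_tie_on_date_prefers_higher_severity()
test_partial_date_sorts_after_complete()
print("edge-case tests passed")
```

The test names double as documentation: a reviewer scanning the suite sees exactly which debates have already been settled and encoded.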

Golden records and minimal fixtures

Create tiny, named fixtures that cover each derivation pattern. Avoid giant “real” datasets that hide signal; use synthetic rows with clear intent. Keep golden outputs in version control; diffs show exactly what changed and why, and reviewers can read them like a storyboard.

Coverage that means something

Report code coverage but don’t chase 100%—chase rule coverage. Every business rule in your spec should have at least one test. Include failure-path tests that assert correct error messages when assumptions break (e.g., missing keys, illegal window values).
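A failure-path test asserts that the program refuses to produce a plausible-looking number from broken inputs. A sketch using an illustrative study-day derivation (SDTM convention: no day 0); the function name and error message are assumptions:

```python
# Sketch of a failure path: reject a broken assumption with a clear
# message rather than deriving from garbage. Names are illustrative.
from datetime import date

def derive_ady(ref_date: str, event_date: str) -> int:
    """Relative study day: day 1 is the reference date, no day 0."""
    if not ref_date:
        raise ValueError("Missing reference date (RFSTDTC): cannot derive ADY")
    delta = (date.fromisoformat(event_date) - date.fromisoformat(ref_date)).days
    return delta + 1 if delta >= 0 else delta

print(derive_ady("2025-01-01", "2025-01-10"))  # 10

try:
    derive_ady("", "2025-01-05")
except ValueError as err:
    print(err)  # the message text itself is part of the contract
```

Asserting on the message text (not just the exception type) keeps the error readable for the person who eventually triggers it on real data.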

Templates reviewers appreciate: paste-ready tokens, footnotes, and rationale language

Spec tokens for fast comprehension

Purpose: “Supports estimand E1 (treatment policy) for primary endpoint.”
Lineage: “SDTM LB (USUBJID, LBDTC, LBTESTCD) → ADLB (ADT, AVISIT, AVAL).”
Algorithm: “Baseline = last non-missing pre-dose AVAL within [−7,0]; change = AVAL – baseline; if missing baseline, impute per SAP §[ref].”
Sensitivity: “Per-protocol window [−3,0]; tipping point ±[X] sensitivity.”
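The algorithm token above can be read back as executable logic. A minimal sketch assuming observations as `(relative_day, value)` tuples with `None` for missing results; window bounds and the “day 0 is pre-dose” reading are illustrative assumptions:

```python
# Executable reading of the baseline algorithm token: last non-missing
# pre-dose value within the window. Data shape and window are assumed.
def baseline(values, window=(-7, 0)):
    """Last non-missing AVAL within [lo, hi] days of dosing, else None."""
    lo, hi = window
    in_window = [(day, v) for day, v in values
                 if lo <= day <= hi and v is not None]
    if not in_window:
        return None                    # impute per SAP, never silently here
    return max(in_window, key=lambda r: r[0])[1]  # 'last' = latest day

def change_from_baseline(aval, values, window=(-7, 0)):
    base = baseline(values, window)
    return None if base is None or aval is None else aval - base

obs = [(-10, 5.0), (-4, 6.0), (-1, None), (0, 6.5)]
print(baseline(obs))                   # 6.5: day 0, last in window
print(change_from_baseline(8.0, obs))  # 1.5
```

Note how the sensitivity token maps directly onto the `window` parameter: rerunning with `window=(-3, 0)` is the per-protocol variant, with no code change.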

CSR-ready footnotes

“Baseline defined as the last non-missing, pre-dose value within the pre-specified window; if multiple candidate records exist, the earliest value within the window is used. Censoring rules are applied per SAP §[ref], with administrative censoring at database lock. Intercurrent events follow the treatment-policy strategy; a hypothetical sensitivity is provided in Table S[ref].”

Rationale sentences that quell queries

“The tie-breaker chain (chronology → quality flag → mean of remaining) minimizes bias when multiple records exist and reflects clinical practice where earlier, higher-quality measurements dominate. Sensitivity analyses demonstrate effect stability across window definitions.”

FAQs

How detailed should an ADaM derivation spec be?

Short and specific. Use the eight-line template covering name/label, purpose, lineage, algorithm, missingness, windows, sensitivity, and unit tests. The goal is that a reviewer can forecast the output’s behavior without reading code, and a programmer can implement without guessing.

Where should we store derivation rationale so inspectors can find it?

In three places: the spec (short form), the program header (summary and links), and the decision log (why this rule). Cross-link all three and file to the TMF. During inspection, open the decision log first to show intent, then the spec and code to show execution.

What makes a good unit test for ADaM variables?

Named edge cases with minimal fixtures and explicit expected values. Tests should assert both numeric results and the presence of required flags (e.g., imputation indicators). Include failure-path tests that prove the program rejects illegal inputs with clear messages.

How do we handle multiple registry or public narrative wordings?

Keep derivation text literal and map public wording via a label cheat sheet in your standards. If you change a public narrative, open a change control ticket and verify no estimand or analysis definitions drifted as a side effect.

How do we prevent variable name drift across deliverables?

Freeze names early, use aliases temporarily when renaming, and add tests that fail on simultaneous presence of old/new names. Update shells, CSR templates, and macros from a single dictionary to keep words and numbers synchronized.

What evidence convinces reviewers that our derivations are stable across re-cuts?

Byte-identical rebuilds for the same data cut, environment hashes, parameter files, and visual diffs of outputs pre/post change with thresholds. File stopwatch drills showing you can open spec, code, and tests in under two clicks and reproduce results on demand.

Top Repositories for Clinical Trial Data Sharing – https://www.clinicalstudies.in/top-repositories-for-clinical-trial-data-sharing/ – Mon, 25 Aug 2025


Best Platforms for Sharing Clinical Trial Data Responsibly and Transparently

Introduction: Why Repository Selection Matters

As open data becomes a regulatory and ethical expectation in clinical research, selecting the right data repository is critical. A good repository ensures data security, metadata integrity, ease of access for researchers, and compliance with global transparency mandates. With numerous platforms available, sponsors and researchers must understand which repositories align with their data type, jurisdiction, and privacy standards.

This tutorial reviews the top global repositories used to share clinical trial data, highlighting features, regulatory alignment, and use cases. The right choice not only fulfills obligations but enhances the visibility, utility, and impact of trial results.

Types of Clinical Trial Repositories

Clinical trial data can be deposited in several types of repositories:

  • Regulatory Registries: Required by authorities (e.g., ClinicalTrials.gov, EU CTR)
  • Open Data Platforms: Allow public access (e.g., Dryad, Figshare)
  • Controlled-Access Repositories: Require request and approval (e.g., Vivli, YODA)
  • Sponsor-Owned Portals: Managed by pharmaceutical companies or CROs

Each category serves different access levels and privacy safeguards, and often a combination is used for broad compliance and discoverability.

Repository Comparison Table

| Repository | Access Level | Target Users | Data Types Accepted | Global Recognition |
|---|---|---|---|---|
| ClinicalTrials.gov | Open | Public, researchers | Registry info, summary results | Yes |
| Vivli | Controlled | Qualified researchers | Patient-level data, protocols | Yes |
| YODA Project | Controlled | Researchers (peer-reviewed) | De-identified participant data | Yes |
| Dryad | Open | General public | Datasets, metadata, tables | Yes |
| EU Clinical Trials Register | Open | Public | Trial summaries, protocols | Yes |

1. ClinicalTrials.gov – The Primary US Registry

Operated by the U.S. National Library of Medicine, ClinicalTrials.gov is a mandatory repository for most interventional studies conducted under FDA jurisdiction. It includes trial registration, summary results, and outcome measures.

Key Features:

  • Accepts summary results in tabular format
  • Structured data entry via PRS (Protocol Registration System)
  • Used to assess compliance under FDAAA 801
  • Global visibility and indexing

Explore ClinicalTrials.gov

2. Vivli – A Global Controlled-Access Platform

Vivli.org is a nonprofit data sharing platform that hosts individual participant-level data (IPD) and supports cross-sponsor collaboration. It enables researchers to access de-identified datasets following a formal proposal and approval process.

Highlights:

  • Secure cloud-based environment for data access
  • Used by industry sponsors, academia, and funders
  • Supports metadata linkage with DOIs and publications
  • Supports compliance with EMA Policy 0070 and ICMJE

Vivli promotes transparency while protecting participant confidentiality through strict governance models.

3. YODA Project – Yale Open Data Access

The YODA Project facilitates access to participant-level clinical trial data, originally launched with Johnson & Johnson trials. Like Vivli, it provides controlled access but with academic stewardship from Yale University.

Benefits:

  • Transparent and independent data review committee
  • Peer-reviewed request process
  • Wide range of therapeutic areas and sponsors
  • Ideal for systematic reviews and re-analyses

YODA ensures ethical, scientific, and secure reuse of trial datasets for non-commercial academic purposes.

4. Dryad – An Open Access Research Repository

Dryad is a general-purpose data repository used by many medical and biological journals to host underlying datasets. It supports FAIR (Findable, Accessible, Interoperable, Reusable) principles.

Attributes:

  • Open access with DOI assignment
  • Simple CSV/Excel upload format
  • Supports data citation in journal publications
  • Useful for protocol-linked data tables

While not trial-specific, Dryad offers wide reach for published datasets supporting transparency and reproducibility.

5. EU Clinical Trials Register (EUCTR)

Managed by the EMA, the EUCTR provides public access to clinical trials conducted in the EU. It includes trial design, sponsor info, and results summaries for trials authorized under the EU Clinical Trials Directive; trials run under the newer Clinical Trials Regulation (CTR) are published in CTIS instead.

Core Capabilities:

  • Automatically populated via national competent authorities
  • Open access portal
  • Supports results posting and EudraCT ID linkage
  • Essential for compliance with EU CTR

While limited in accepting raw datasets, EUCTR plays a critical role in regulatory and public transparency.

Honorable Mentions and Niche Repositories

  • ISRCTN Registry – Offers DOI assignment and metadata enhancement
  • Zenodo – EU-backed repository for all disciplines, including clinical data
  • Figshare – Supports supplemental materials and interactive visualizations
  • OpenTrials.net – Curates trial information from multiple sources

Some funders and journals also maintain their own repositories — always check sponsor-specific data sharing policies.

Choosing the Right Repository: Decision Factors

When selecting a repository, consider the following:

  • Regulatory obligations – Some registries are legally required (e.g., ClinicalTrials.gov)
  • Data type – IPD vs summary data
  • Access model – Open vs controlled
  • Anonymization requirements – Privacy law compliance
  • Discoverability – DOI assignment, indexing, and citation metrics

Multi-platform upload is also common: registration in one platform, datasets in another, and publications linked to both.

Conclusion: Enabling Transparency Through Strategic Repository Use

Repositories are vital infrastructure for global clinical trial transparency. They empower open science, reinforce participant trust, and accelerate therapeutic innovation. By understanding each platform’s strengths, access policies, and submission standards, trial sponsors and investigators can choose the most effective way to disseminate data and meet compliance expectations. Transparency is no longer optional — and these repositories are the gateways to achieving it.
