Published on 21/12/2025
Reproducible Clinical Builds That Withstand Review: Run Logs, Environment Hashes, and Parameterized Scripts
Why run logs and reproducibility are non-negotiable for US/UK/EU submissions
Define “reproducible” the way regulators measure it
Reproducibility is the ability to regenerate an analysis result—on demand, under observation—using the same inputs, the same parameterization, and the same computational stack. That standard is stricter than “we can get close.” It requires a scripted pipeline, evidence-grade run logs, portable parameter files, and an immutable fingerprint of the software environment. In inspection drills, reviewers expect you to traverse output → run log → parameters → program → lineage in seconds and prove the number rebuilds without manual steps.
One compliance backbone—state once, reuse everywhere
Declare the controls that your pipeline satisfies once and reuse them across plans, shells, reviewer guides, and CSR methods: operational expectations map to FDA BIMO; electronic records/signatures follow 21 CFR Part 11 and the EU’s Annex 11; study oversight aligns with ICH E6(R3); analysis and estimand labeling follow ICH E9(R1); safety exchange is consistent with ICH E2B(R3); public narratives are consistent with ClinicalTrials.gov and EU status under EU-CTR via CTIS; privacy follows HIPAA. Every step leaves a searchable audit trail; systemic issues escalate to CAPA with effectiveness checks.
Outcome targets (and how to prove them)
Publish three measurable outcomes: (1) Traceability—from any number, a reviewer reaches the run log, parameter file, and dataset lineage in two clicks; (2) Reproducibility—byte-identical rebuilds for the same inputs/parameters/environment; (3) Retrievability—ten results drilled and justified in ten minutes. File stopwatch evidence quarterly so the “system” is visible as a routine behavior, not a slide.
Regulatory mapping: US-first clarity with EU/UK portability
US (FDA) angle—event → evidence in minutes
US assessors begin with an output value and ask: which script produced it, what parameters controlled windows and populations, which library versions were active, and where the proof of an identical re-run resides. They expect deterministic retrieval, explicit role attribution, and visible provenance in run logs. If your build relies on point-and-click steps, you will lose time proving negatives (“we didn’t change anything”). Scripted execution flips the default—you show what did happen, not what didn’t.
EU/UK (EMA/MHRA) angle—same truth, localized wrappers
EU/UK reviewers pull the same thread, emphasizing accessibility (plain language, non-jargon footnotes), governance (who approved parameter changes and when), and alignment with registered narratives. Keep a label translation sheet (IRB → REC/HRA), but do not fork scripts. The reproducibility engine stays identical; wrappers vary only in labels.
| Dimension | US (FDA) | EU/UK (EMA/MHRA) |
|---|---|---|
| Electronic records | Part 11 validation; role attribution in logs | Annex 11 alignment; supplier qualification |
| Transparency | Coherence with ClinicalTrials.gov narratives | EU-CTR status via CTIS; UK registry alignment |
| Privacy | “Minimum necessary” PHI (HIPAA) | GDPR/UK GDPR minimization & residency |
| Re-run proof | Script + params + env hash → identical outputs | Same, plus change governance minutes |
| Inspection lens | Event→evidence speed; deterministic math | Completeness & portability of rationale |
Process & evidence: build once, run anywhere, prove everything
Scripted builds beat checklists (every time)
Create a single orchestrator per build target (ADaM, listings, TLFs). The orchestrator: loads one parameter file; prints a header with environment fingerprint; runs unit/integration tests; generates artifacts; emits a trailer with row counts and output hashes; and fails fast if preconditions are unmet. Output files get provenance footers carrying the run timestamp, manifest hash, and parameter filename to enable one-click drill-through from the CSR exhibit back to the execution context.
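The shape of such an orchestrator can be sketched in Python; the helper names and parameter keys below are illustrative assumptions, not a prescribed API, and `build_fn` stands in for your site's actual build logic:

```python
import hashlib
import json
from datetime import datetime, timezone

def env_hash(manifest: dict) -> str:
    """Short fingerprint of the environment manifest (sorted keys for determinism)."""
    blob = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:7]

def orchestrate(params: dict, manifest: dict, build_fn) -> dict:
    """One build target: banner, build, trailer with output hashes. Fails fast."""
    required = {"study_id", "cut_date", "analysis_set"}   # illustrative minimum
    missing = required - params.keys()
    if missing:
        raise SystemExit(f"FATAL: missing parameters: {sorted(missing)}")

    header = (f"[START] Study: {params['study_id']} | Cut: {params['cut_date']} "
              f"| env={env_hash(manifest)} | {datetime.now(timezone.utc).isoformat()}")
    outputs = build_fn(params)          # convention here: returns {filename: bytes}
    trailer = {
        "n_outputs": len(outputs),
        "hashes": {name: hashlib.sha256(data).hexdigest()[:8]
                   for name, data in outputs.items()},
    }
    return {"header": header, "trailer": trailer}

run = orchestrate(
    {"study_id": "ABC-123", "cut_date": "2025-11-01", "analysis_set": "ITT"},
    {"interpreter": "Python 3.12"},
    lambda p: {"t_prim_eff.tab": b"dummy output"},   # placeholder build step
)
```

The point of the sketch is the sequencing, not the helpers: parameters are validated before anything runs, the environment fingerprint is printed before the build, and every output leaves with a hash.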
Environment hashing prevents “works on my machine”
Lock the computational stack with a manifest (interpreter/compiler versions, package names/versions, OS details) and compute a short hash. Print the manifest and the hash at the top of the run log and in output footers. When a container or image changes, the hash changes—making environment drift visible. If numbers move, you can quickly attribute the change to a manifest delta rather than chasing spectral bugs in code.
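A minimal hashing recipe, assuming a manifest kept as structured data (the package names echo the run log template; canonical JSON with sorted keys makes the hash independent of key order):

```python
import hashlib
import json

def manifest_hash(manifest: dict) -> str:
    """Canonicalize the manifest (sorted keys, fixed separators) and hash it."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:7]

manifest = {
    "interpreter": "R 4.3.2",
    "os": "Linux 5.15",
    "packages": {"dplyr": "1.1.4", "haven": "2.5.4"},
}
print(f"Manifest: hash={manifest_hash(manifest)}")

# Any drift -- here, a hypothetical dplyr patch -- changes the fingerprint:
patched = {**manifest, "packages": {**manifest["packages"], "dplyr": "1.1.5"}}
assert manifest_hash(patched) != manifest_hash(manifest)
```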
Parameter files externalize human memory
Analysis sets, visit windows, reference dates, censoring rules, dictionary versions, seeds—every human-tunable decision—belong in a version-controlled parameter file, not hard-coded in macros. The orchestrator echoes parameter values verbatim into the run log and output footers, and the change record links each parameter edit to governance minutes. This makes the “why” and “who” auditable without asking around.
- Create an orchestrator script per build target with start/end banners that include study ID and cut date.
- Fingerprint the environment; print manifest + hash into run logs and output footers.
- Load a single parameter file; echo all values; forbid shadow parameters.
- Seed every stochastic process; print PRNG details and seed values.
- Fail fast on missing/illegal parameters and outdated manifests.
- Run unit/integration tests before building; abort on failures with explicit messages.
- Emit row counts, summary stats, and file integrity hashes for all outputs.
- Archive run logs, parameters, and manifests together for two-click retrieval.
- Tag releases semantically (MAJOR.MINOR.PATCH) with human-readable change notes.
- File artifacts to TMF and cross-reference from CTMS portfolio tiles.
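The parameter-loading steps above (load one file, echo all values, forbid shadow parameters, fail fast) can be sketched as follows; a JSON parameter file and this particular key list are assumptions for illustration:

```python
import json

# Illustrative spec: the only keys the build is allowed to read.
ALLOWED_KEYS = {"analysis_set", "baseline_window", "visit_window",
                "censoring_rule", "dictionary_versions", "seeds", "reference_dates"}

def load_params(text: str) -> dict:
    """Parse a JSON parameter file; reject unknown ('shadow') and missing keys."""
    params = json.loads(text)
    unknown = params.keys() - ALLOWED_KEYS
    if unknown:
        raise SystemExit(f"FATAL: shadow parameters not in spec: {sorted(unknown)}")
    missing = ALLOWED_KEYS - params.keys()
    if missing:
        raise SystemExit(f"FATAL: missing parameters: {sorted(missing)}")
    # Echo every value verbatim into the run log so nothing stays implicit.
    for key in sorted(params):
        print(f"PARAM {key}={params[key]!r}")
    return params
```

Failing fast on both unknown and missing keys is what turns the parameter file into the single source of truth: the build cannot quietly consume a value the spec never declared.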
Decision Matrix: choose the right path for reruns, upgrades, and late-breaking changes
| Scenario | Option | When to choose | Proof required | Risk if wrong |
|---|---|---|---|---|
| Minor window tweak (±1 day) | Parameter-only rerun | Analysis logic unchanged; governance approved | Run logs with new params; identical code/env hash | Undetected code edits masquerading as param change |
| Security patch to libraries | Environment refresh + validation rerun | Manifest changed; code/params stable | Before/after output hashes; validation report | Unexplained numerical drift → audit finding |
| Algorithm clarification (baseline hunt) | Code change + targeted tests | Spec amended; impact scoped | New/updated unit tests; diff exhibit | Wider rework if not declared and tested |
| Late database cut | Full rebuild | Inputs changed materially | Fresh manifest/params; new output hashes | Partial rebuild creates mismatched exhibits |
| Macro upgrade across portfolio | Branch, compare, staged rollout | Cross-study impact likely | Golden study comparison; rollout minutes | Inconsistent behavior across submissions |
Document decisions where inspectors will actually look
Maintain a “Reproducibility Decision Log”: scenario → chosen path → rationale → artifacts (run log IDs, parameter files, diff reports) → owner → effective date → measured effect (e.g., outputs impacted, time-to-rerun). File it in Sponsor Quality and cross-link from specs and program headers so the path from a number to the change is obvious.
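One workable shape for that log is an append-only JSON-lines file whose records carry exactly the fields named above; the field names here are a hypothetical rendering, not a mandated schema:

```python
import json
from pathlib import Path

# Fields mirror the decision-log structure described in the text.
REQUIRED = {"scenario", "path_chosen", "rationale", "artifacts",
            "owner", "effective_date", "measured_effect"}

def log_decision(path: Path, entry: dict) -> None:
    """Append one decision record as a JSON line (append-only, diff-friendly)."""
    missing = REQUIRED - entry.keys()
    if missing:
        raise ValueError(f"incomplete decision record: {sorted(missing)}")
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
```

Append-only JSON lines keep the history linear and make each change a one-line diff in version control, which is what lets a reviewer walk from a number to the decision without asking around.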
QC / Evidence Pack: minimum, complete, inspection-ready
- Orchestrator scripts and wrappers with headers describing scope and dependencies.
- Environment manifest and the computed hash printed in run logs and output footers.
- Version-controlled parameter files (sets, windows, dates, seeds, dictionaries).
- Run logs with start/end banners, parameter echoes, seeds, row counts, and output hashes.
- Unit and integration test reports; coverage by business rule, not just code lines.
- Change summaries for scripts/manifests/parameters with governance references.
- Before/after exhibits when numeric drift occurs (with agreed tolerances).
- Dataset/output provenance footers echoing manifest hash and parameter filename.
- Stopwatch drill artifacts (timestamps, screenshots) for retrieval drills.
- TMF filing map with two-click retrieval from CTMS portfolio tiles.
Vendor oversight & privacy (US/EU/UK)
Qualify external programmers against your scripting/logging standards; enforce least-privilege access; keep interface logs and incident reports with build artifacts. For EU/UK subject-level debugging, document minimization, residency, and transfer safeguards; retain sample redactions and privacy review minutes with the evidence pack.
Templates reviewers appreciate: paste-ready headers, footers, and parameter tokens
Run log header (copy/paste)
[START] Build: TLF Bundle 3.1 | Study: ABC-123 | Cut: 2025-11-01T00:00 | User: j.smith | Host: build01
Manifest: env.lock hash=9f7c2a1 | Interpreter=R 4.3.2 | OS=Linux 5.15 | Packages: dplyr=1.1.4, haven=2.5.4
Params: set=ITT; windows=baseline[-7,0],visit±3d; dict=MedDRA 26.1, WHODrug B3 Apr-2025; seeds=TLF=314159, bootstrap=271828
Run log footer (copy/paste)
[END] Duration=00:12:31 | ADaM: 14 datasets (rows=1,242,118) | Listings: 43 | Tables: 57 | Figures: 18
Output hashes: t_prim_eff.tab=4be1…; f_km_os.pdf=77c9…; l_ae_serious.csv=aa21…
Status=SUCCESS | Tests=passed:132 failed:0 skipped:6 | Filed=/tmf/builds/ABC-123/2025-11-01
Parameter file tokens (copy/paste)
analysis_set: ITT
baseline_window: [-7,0]
visit_window: ±3d
censoring_rule: admin_lock
dictionary_versions: meddra:26.1, whodrug:B3-Apr-2025
seeds: tlf:314159, bootstrap:271828
reference_dates: fpfv:2024-03-01, lpfv:2025-06-15, dbl:2025-10-20
Operating cadence: version discipline, CI, and drills that keep you ahead of audits
Semantic versions with human-readable change notes
Apply semantic versioning to scripts, manifests, and parameter files. Every bump requires a short change narrative (what changed, why with governance reference, how to retest). A one-line version bump is invisible debt; a brief narrative prevents archaeology during inspection and speeds “why did this move?” conversations.
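A small sketch of that rule, enforcing the change narrative at bump time (the tuple return and the wording of the guard are illustrative choices):

```python
import re

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

def bump(version: str, part: str, note: str) -> tuple[str, str]:
    """Next semantic version plus its mandatory change narrative."""
    if not note.strip():
        raise ValueError("a version bump requires a human-readable change note")
    m = SEMVER.match(version)
    if not m:
        raise ValueError(f"not a semantic version: {version}")
    major, minor, patch = map(int, m.groups())
    nxt = {"major": f"{major + 1}.0.0",
           "minor": f"{major}.{minor + 1}.0",
           "patch": f"{major}.{minor}.{patch + 1}"}[part]
    return nxt, note
```

Refusing an empty note in code, not in policy, is what keeps the "invisible debt" from accumulating: the tooling will not mint a version without its narrative.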
Continuous integration for statistical builds
Trigger CI on parameter or code changes, run tests, build in an isolated workspace, compute hashes, and publish a signed bundle (artifacts + run log + manifest + parameters). Promote bundles from dev → QA → release using the same scripts and parameters so you test the exact path you will use for submission.
Stopwatch and recovery drills
Quarterly, run three drills: Trace—pick five results and open scripts, parameters, and manifest in under five minutes; Rebuild—rerun a prior cut and compare output hashes; Recover—simulate a corrupted environment and rebuild from the manifest. File timestamps and lessons; convert slow steps into CAPA with effectiveness checks.
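The Rebuild drill's hash comparison is mechanical; a sketch, assuming each run's trailer is kept as a `{filename: hash}` manifest (the label strings are illustrative):

```python
def compare_runs(baseline: dict, rebuild: dict) -> list[str]:
    """Compare output-hash manifests from two runs; return human-readable drift lines."""
    drift = []
    for name in sorted(baseline.keys() | rebuild.keys()):
        a, b = baseline.get(name), rebuild.get(name)
        if a is None:
            drift.append(f"NEW      {name}")
        elif b is None:
            drift.append(f"MISSING  {name}")
        elif a != b:
            drift.append(f"CHANGED  {name}: {a} -> {b}")
    return drift
```

An empty list is the drill's pass condition: byte-identical rebuild. Anything else is a named, ordered worklist for the lessons-learned file.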
Common pitfalls & quick fixes: stop reproducibility leaks before they become findings
Pitfall 1: hidden assumptions in code
Fix: move every human-tunable decision to parameters; lint for undocumented constants; add failing tests when hard-coded values are detected. Echo parameters into logs and footers so reviewers never guess what was in effect.
Pitfall 2: silent environment drift
Fix: forbid ad hoc updates; require manifest changes via pull requests; compute and display environment hashes on every run. When output hashes shift, you now examine the manifest first, not the entire universe.
Pitfall 3: button-driven builds
Fix: replace GUIs with scripts; retain GUIs only as thin launchers that call the same scripts. If a person can click differently, they will—scripted execution ensures consistent steps and inspectable logs.
FAQs
What must every run log include to satisfy reviewers?
Start/end banners; study ID and cut date; user/host; environment manifest and hash; echoed parameters; seed values; unit test results; row counts and summary stats; output filenames with integrity hashes; and the filing path. With those, reviewers can reconstruct the build without summoning engineering.
How do environment hashes help during inspection?
They fingerprint the computational stack. If numbers differ and the hash changed, examine package changes; if the hash is identical, focus on inputs or parameters. Hashes shrink the search space from “everything” to a small, auditable set of suspects.
What’s the best practice for seeds in randomization/bootstrap?
Store seeds in the parameter file; print them into the run log and output footers; use deterministic PRNGs and record algorithm/version. If sensitivities require multiple seeds, iterate through a controlled list and store each run as a distinct bundle with its own hashes.
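A minimal illustration of a seeded, deterministic bootstrap in Python (the data and seed values are made up; CPython's `random.Random` is a Mersenne Twister, which is the kind of algorithm/version detail the run log should record):

```python
import random
import statistics

def bootstrap_mean_ci(data, seed, n_resamples=1000):
    """Deterministic 95% bootstrap CI: the seed comes from the parameter file."""
    rng = random.Random(seed)   # isolated PRNG instance, not the global state
    means = sorted(statistics.fmean(rng.choices(data, k=len(data)))
                   for _ in range(n_resamples))
    return means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples)]

data = [4.1, 5.0, 3.8, 4.6, 5.2, 4.9]
# Same seed, same inputs -> byte-identical result, run after run:
assert bootstrap_mean_ci(data, seed=271828) == bootstrap_mean_ci(data, seed=271828)
```

Using a dedicated `Random` instance rather than the module-level functions matters: it keeps the bootstrap immune to any other code touching the global PRNG state mid-build.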
Do we need different run log formats for US vs EU/UK?
No. Keep one truth. Add a short label translation sheet (e.g., IRB → REC/HRA) to reviewer guides if needed, but maintain identical log structures, parameter files, and manifests across regions to avoid drift.
How do we prove a number changed only due to a parameter tweak?
Show two run logs with identical environment hashes and code versions but different parameter files; display the parameter diff and before/after output hashes; add a governance reference. That chain usually closes the query.
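That comparison can itself be scripted; a sketch, assuming each run bundle records its environment hash, code version, and parameters under these (hypothetical) field names:

```python
def param_only_change(run_a: dict, run_b: dict) -> dict:
    """Verify two run bundles differ only in parameters; return the parameter diff."""
    if run_a["env_hash"] != run_b["env_hash"]:
        raise ValueError("environment changed: not a parameter-only rerun")
    if run_a["code_version"] != run_b["code_version"]:
        raise ValueError("code changed: not a parameter-only rerun")
    keys = run_a["params"].keys() | run_b["params"].keys()
    return {k: (run_a["params"].get(k), run_b["params"].get(k))
            for k in keys
            if run_a["params"].get(k) != run_b["params"].get(k)}
```

The returned diff, printed next to the before/after output hashes and the governance reference, is the whole exhibit: environment and code provably constant, parameters provably the only delta.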
Where should run logs and manifests live?
Next to outputs in a predictable structure, cross-linked from CTMS portfolio tiles and filed to TMF. Store the parameter file and manifest with each log so retrieval is two clicks from the CSR figure/table to the run bundle.
