Published on 21/12/2025
Figure Standards That Stick: Making Labels, Ordering, and Color Rules Reproducible and Reviewer-Friendly
Why “figure standards” are a regulatory deliverable—not just a style preference
Figures drive first impressions and hard questions
For many reviewers, your figures are the first contact with the analysis, so they must answer “what is shown, why it matters, and how it was built” within seconds. Poorly labeled axes, inconsistent ordering of arms or endpoints, or colors that imply significance can create avoidable queries and rework. Consistent figure standards—codified and version-controlled—turn every forest plot, Kaplan–Meier curve, and exposure graph into a defensible artifact whose message survives scrutiny across US, EU, and UK review styles. The goal is speed to comprehension: a reviewer should not need to open the SAP to decode a legend.
Declare one compliance backbone and reuse it across all graphics
State, once, the controls that apply to every figure: conformance to CDISC naming and conventions; source lineage from SDTM into ADaM; machine-readable specs in Define.xml with human-readable aids (ADRG, SDRG); estimand-aligned wording per ICH E9(R1); GCP oversight per ICH E6(R3); inspection expectations influenced by FDA BIMO; and electronic controls consistent with 21 CFR Part 11, mapped to Annex 11 for EU/UK use.
Outcome targets for figure programs
Set three targets and check them at every data cut: (1) comprehension in under 10 seconds (title and subtitle answer “what and who”); (2) reproducibility on demand (open the spec, code, and source in two clicks); (3) visual integrity (no accidental significance cues; color-blind-safe palettes; consistent ordering tokens). When you can demonstrate all three in a stopwatch drill, you have evidence that your figure standards are working.
Regulatory mapping: US-first clarity with EU/UK portability
US (FDA) angle—event → evidence in minutes
US assessors will trace an on-screen number to the dataset, variable derivation, and programming note that produced it. Figure standards must therefore embed: population labels (e.g., ITT, PP), analysis method cues (e.g., MMRM, Cox), confidence interval definitions, and censoring rules in time-to-event graphics. Titles should name the endpoint and population; footnotes should state handling of missing data, ties, or multiplicity. Legends should define all symbols and error bars. This eliminates guesswork and reduces the odds of a “please explain your axis” query that slows the clock.
EU/UK (EMA/MHRA) angle—same truth, localized wrappers
EMA/MHRA reviewers will look for transparency and alignment with public narratives: a clear connection to registry language, avoidance of promotional tone, and accessibility of color choices for color-vision deficiency. They also probe estimand clarity: if the graphic supports a different strategy than the main estimand, a label must say so. Your US-first rules travel well if labels are literal, footnotes cite the SAP, and line styles and markers are chosen for legibility when printed in grayscale.
| Dimension | US (FDA) | EU/UK (EMA/MHRA) |
|---|---|---|
| Electronic records | Part 11 validation & attribution | Annex 11 controls and supplier qualification |
| Transparency | Consistency with ClinicalTrials.gov wording | EU-CTR status via CTIS; UK registry alignment |
| Privacy | HIPAA “minimum necessary” | GDPR/UK GDPR minimization and purpose limits |
| Figure labeling | Population/method in title; CI and censoring in notes | Estimand clarity; grayscale legibility |
| Inspection lens | Event→evidence drill-through speed | Completeness & accessibility of presentation |
Process & evidence: a figure standard that survives inspection
Title, subtitle, and footnote tokens
Create reusable tokens. Title: “Endpoint — Population — Method.” Subtitle: covariates or windows. Footnotes: censoring, handling of ties, imputation, dictionary versions, and multiplicity control with SAP reference. Tokens prevent drift and let medical writing reuse exact phrases in the CSR, keeping words and numbers synchronized.
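One way to keep tokens from drifting is to assemble titles and footnotes in code rather than by hand. The sketch below is a minimal illustration, not a prescribed implementation: the names `build_title`, `build_footnotes`, and `FOOTNOTE_LIBRARY`, and the footnote keys, are hypothetical, and the separator simply mirrors the title pattern quoted above.

```python
# Minimal sketch: assemble figure text from reusable tokens.
# All names here (build_title, build_footnotes, FOOTNOTE_LIBRARY) are hypothetical.

SEP = " \u2014 "  # the dash separator used in the title pattern

# Illustrative footnote library; real wording would come from the style guide.
FOOTNOTE_LIBRARY = {
    "censoring": "Tick marks denote censoring.",
    "multiplicity": "Multiplicity controlled via hierarchical order per SAP §[ref].",
}


def build_title(endpoint: str, population: str, method: str) -> str:
    """Join the three required title tokens; fail loudly if one is blank."""
    tokens = {"endpoint": endpoint, "population": population, "method": method}
    missing = [name for name, value in tokens.items() if not value.strip()]
    if missing:
        raise ValueError("missing title tokens: " + ", ".join(missing))
    return SEP.join(value.strip() for value in tokens.values())


def build_footnotes(keys, library=FOOTNOTE_LIBRARY):
    """Return footnote text in the requested order; unknown keys are an error."""
    unknown = [key for key in keys if key not in library]
    if unknown:
        raise KeyError(f"footnote keys not in library: {unknown}")
    return [library[key] for key in keys]


title = build_title("Time to [Event]", "ITT", "Cox")
notes = build_footnotes(["censoring", "multiplicity"])
```

Because a blank token or an unknown footnote key raises an error, a figure cannot be produced with an incomplete title or an ad hoc footnote, which is the drift the tokens are meant to prevent.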
Ordering and grouping rules
Define treatment-arm order (randomization order unless justified otherwise), endpoint order (primary → secondary → exploratory), and subgroup order (overall → prespecified → exploratory). For forest plots, group by logical themes (demographics, disease burden) and freeze positions across cuts to avoid “moving target” confusion between submissions.
- Publish a figure style guide with title/subtitle/footnote tokens and examples.
- Fix arm and endpoint ordering rules; include exceptions and required justification.
- Choose a color-blind-safe palette; lock hex codes; specify grayscale equivalents.
- Define line types and markers (KM, mean trends, CIs) and reserve patterns for status.
- Enforce unit and decimal precision rules by variable class; state rounding policy.
- Require legends to define every symbol, bar, and band; prohibit unexplained color.
- Embed provenance: figure ID, data cut, program name, and run timestamp (footer).
- Automate a “visual lint” QC (axis direction, zero baselines, CI whiskers, label overlap).
- Version-control the guide; tie changes to SAP or governance minutes.
- File style guide and examples in TMF; cross-link from CTMS study library.
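The ordering rules in this checklist lend themselves to a small, testable helper. The following is a sketch under stated assumptions: the arm names, the helpers `order_by_rule` and `positions_frozen`, and the choice to treat any reordered or dropped subgroup as a violation are illustrative, not taken from a specific standard.

```python
# Hypothetical randomization order and endpoint hierarchy for illustration.
ARM_ORDER = ["Placebo", "Drug 10 mg", "Drug 20 mg"]
ENDPOINT_ORDER = ["primary", "secondary", "exploratory"]


def order_by_rule(items, rule, key=lambda item: item):
    """Sort items by their position in `rule`; reject anything the rule omits."""
    rank = {name: position for position, name in enumerate(rule)}
    unknown = sorted({key(item) for item in items} - set(rank))
    if unknown:
        raise ValueError(f"not covered by the ordering rule: {unknown}")
    # sorted() is stable, so items sharing a rank keep their input order.
    return sorted(items, key=lambda item: rank[key(item)])


def positions_frozen(previous_cut, current_cut):
    """True if every subgroup from the previous cut keeps its position.

    Assumption: new subgroups may only be appended at the end.
    """
    return list(current_cut[: len(previous_cut)]) == list(previous_cut)


rows = [{"arm": "Drug 20 mg"}, {"arm": "Placebo"}, {"arm": "Drug 10 mg"}]
ordered = order_by_rule(rows, ARM_ORDER, key=lambda row: row["arm"])
```

Rejecting unknown values, rather than sorting them to the end, forces an explicit decision (and a change-log entry) whenever a new arm or subgroup appears.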
Decision Matrix: labels, ordering, and color—what to choose and when
| Scenario | Option | When to choose | Proof required | Risk if wrong |
|---|---|---|---|---|
| Arms with unequal size | Randomization order (default) | Comparability outweighs visual balance | SAP excerpt; arm definitions | Implied ranking; reader confusion |
| Subgroup forest plot | Prespecified order with frozen positions | Multiple cuts or rolling submissions | Prespec list; change log if re-ordered | Misinterpretation across timepoints |
| Color constraints (accessibility) | Color-blind safe palette + grayscale viable | Mixed digital/print review | Palette spec; grayscale tests | Signals lost; accessibility findings |
| Time-to-event graphics | Solid for KM curves; dashed for CIs | Multiple strata or arms | Legend map; censoring symbol note | Ambiguous curves; misread CI |
| Non-inferiority display | Margin line with label & direction | Primary or key secondary NI endpoint | Margin value, scale, and SAP ref | Wrong side inference; query storm |
Document choices so inspectors can follow the thread
Maintain a “Figure Decision Log”: question → option → rationale → artifacts (style page, SAP clause, example figure) → owner → effective date → effectiveness (e.g., reduced figure queries). File under Sponsor Quality and cross-link from the programming standards wiki so the path from a pixel to a principle is visible.
QC / Evidence Pack: the minimum, complete set reviewers expect
- Figure style guide (versioned): titles, subtitles, footnote tokens, ordering, units.
- Color spec: hex codes, luminance contrast checks, grayscale previews, printer tests.
- Shape/line library for curves, bands, and markers; reserved patterns and meanings.
- Axis and scale policy (zero baseline rules, log scale triggers, dual-axis prohibitions).
- Rounding/precision policy with examples and CSR alignment notes.
- Automated QC scripts (“visual lint”) and sample outputs with pass/fail criteria.
- Provenance footer standard (figure ID, data cut date, program path, timestamp).
- Cross-references to SAP and Define/Reviewer Guides for traceability.
- Change control with side-by-side “before/after” for material updates.
- Drill-through map from portfolio tiles → figure family → artifact locations in TMF.
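The provenance footer standard in this list can be generated rather than typed. The sketch below assumes a pipe-delimited, single-line layout and a UTC timestamp; neither is specified in the text, and the function name `provenance_footer`, the figure ID, and the program path are hypothetical.

```python
from datetime import date, datetime, timezone


def provenance_footer(figure_id, data_cut, program_path, run_time=None):
    """Build a one-line footer: figure ID, data cut date, program path, timestamp."""
    if run_time is None:
        run_time = datetime.now(timezone.utc)
    if run_time.tzinfo is None:
        # A naive timestamp is ambiguous across sites, so refuse it.
        raise ValueError("run_time must be timezone-aware")
    stamp = run_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"Figure {figure_id} | Data cut: {data_cut.isoformat()} | "
        f"Program: {program_path} | Run: {stamp}"
    )


# Hypothetical values for illustration only.
footer = provenance_footer(
    figure_id="F-14.2.1",
    data_cut=date(2025, 6, 30),
    program_path="programs/f_km_os.py",
    run_time=datetime(2025, 7, 1, 9, 30, tzinfo=timezone.utc),
)
```

Passing `run_time` explicitly keeps the footer deterministic during rebuild checks; letting it default to the current time suits routine production runs.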
Vendor oversight & privacy (US/EU/UK)
Qualify any visualization vendors or external teams to your standards, enforce least-privilege access, and require that generated graphics embed provenance and follow the palette and ordering rules. Where listings or subject-level figures risk exposing individuals, apply minimization and de-identification consistent with HIPAA, GDPR/UK GDPR, and local rules; store interface logs and incident reports next to the figure library.
Templates reviewers appreciate: paste-ready labels, footnotes, and palette tokens
Title and subtitle tokens
“Primary Endpoint — ITT — Change from Baseline in [Endpoint] at Week 24 — MMRM (Unstructured) Adjusted for [Covariates].”
“Time-to-Event — ITT — Time to [Event] — Kaplan–Meier with 95% CI; Cox Model HR (95% CI).”
“Subgroup Forest — ITT — Treatment Effect (Odds Ratio, 95% CI); Prespecified Subgroups, Frozen Order.”
Footnote library (excerpt)
F1: “Bars show mean with 95% CI; whiskers denote confidence limits.”
F2: “KM curves show time from randomization; tick marks denote censoring; CI as shaded band.”
F3: “Non-inferiority margin = [X] on [Scale]; line indicates the direction in which control is favored.”
F4: “Multiplicity controlled via hierarchical order per SAP §[ref].”
F5: “Dictionary versions: MedDRA [ver]; WHODrug [ver], applied per SAP.”
Palette tokens and accessibility
Define 6–8 colors with hex codes and reserved meanings (e.g., Arm A, Arm B, CI bands, reference lines). Require luminance contrast ≥4.5:1 for text/lines and a grayscale proof for print. Prohibit red/green pairings without pattern differences; pair color with shape (marker type) for redundancy.
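A contrast threshold is only useful if it is checked mechanically. The sketch below assumes the 4.5:1 figure refers to the WCAG 2.x contrast-ratio formula (the text does not name a formula) and that the background is white; the palette names and hex codes are hypothetical examples, not a recommended palette.

```python
def _linear_channel(value_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = value_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_code):
    """Relative luminance of a #RRGGBB color, from 0 (black) to 1 (white)."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_code!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (
        0.2126 * _linear_channel(r)
        + 0.7152 * _linear_channel(g)
        + 0.0722 * _linear_channel(b)
    )


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors; ranges from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def failing_colors(palette, background="#FFFFFF", minimum=4.5):
    """Return the palette entries whose contrast against the background is too low."""
    ratios = {name: contrast_ratio(code, background) for name, code in palette.items()}
    return {name: round(ratio, 2) for name, ratio in ratios.items() if ratio < minimum}


# Hypothetical palette tokens for illustration only.
palette = {"arm_a": "#0072B2", "reference_line": "#F0E442"}
problems = failing_colors(palette)
```

In this example the dark blue passes against white while the yellow does not, which is the kind of finding worth catching before a palette is locked.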
Figure families: consistent rules for the plots reviewers see most
Forest plots
Use fixed column ordering (subgroup name → N per arm → effect size with CI → p-value if applicable). Freeze subgroup order and use the same x-axis range across cuts where feasible. Show the reference line clearly and label the effect direction to avoid accidental inversions.
Kaplan–Meier curves
Use solid lines for arm curves and distinct shapes for censoring ticks; display at-risk tables aligned beneath the plot with synchronized time grids. Explain administrative censoring and competing risks in the footnote if relevant. Do not run legends over the plot area; place them outside it for clarity.
Exposure and shift plots
For exposure over time, use stacked bars with consistent category order and a footnote defining exposure thresholds. For lab shift plots, include quadrant labels, axes with clinical threshold lines, and footnotes that define baseline and worst on-treatment values to keep interpretation identical across reviewers.
Operating cadence: version, test, and release graphics so first builds converge
Dry runs and “figure days”
Hold cross-functional “figure days” where statisticians, programmers, writers, and QA review draft plots against the style guide and SAP. Read titles and footnotes aloud; confirm ordering, scales, and tokens; and approve palette compliance. Catching issues here prevents mass re-layouts at CSR time.
Automation and reproducibility
Automate header/footer provenance, apply a visual lint tool (axis direction, zero baseline, label overlap), and store seeds, environment hashes, and parameter files with the run logs. Any figure should rebuild byte-identical given the same inputs and environment—an expectation you should prove during a stopwatch drill.
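A byte-identical rebuild claim can be tested by hashing what went in and what came out. The following is a minimal sketch, assuming the figure is available as bytes and the parameters as a JSON-serializable dictionary; the function names, manifest fields, and the package pin shown are hypothetical.

```python
import hashlib
import json
import platform


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def environment_hash(packages):
    """Hash the interpreter version, platform string, and pinned package versions."""
    payload = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
    }
    return sha256_hex(json.dumps(payload, sort_keys=True).encode("utf-8"))


def run_manifest(figure_bytes, params, seed, packages):
    """Record everything needed to compare two builds of the same figure."""
    return {
        "figure_sha256": sha256_hex(figure_bytes),
        "params_sha256": sha256_hex(json.dumps(params, sort_keys=True).encode("utf-8")),
        "seed": seed,
        "environment_sha256": environment_hash(packages),
    }


def differing_fields(manifest_a, manifest_b):
    """List the manifest fields that differ; an empty list means the builds match."""
    keys = set(manifest_a) | set(manifest_b)
    return sorted(k for k in keys if manifest_a.get(k) != manifest_b.get(k))


# Hypothetical inputs: in practice figure_bytes would be read from the output file.
first = run_manifest(b"figure-bytes", {"data_cut": "2025-06-30"}, 1234, {"matplotlib": "3.9.0"})
second = run_manifest(b"figure-bytes", {"data_cut": "2025-06-30"}, 1234, {"matplotlib": "3.9.0"})
mismatches = differing_fields(first, second)
```

Storing the manifest beside the run log lets a drill show which input changed when a rebuild does not match, instead of only reporting that it failed. Note that some output formats embed creation timestamps, which would have to be fixed or stripped before a byte-level comparison can pass.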
Governance and change control
All material edits to tokens, colors, or ordering require a change summary and a one-page “before/after” exhibit filed with governance minutes. Communicate changes to vendors the same day and require acknowledgment. During inspection, open this packet first—it shows you run figures as a controlled system.
FAQs
How detailed should figure titles be?
Titles must name the endpoint, population, and method. Subtitles carry covariates or windowing; footnotes carry censoring, imputation, and multiplicity notes. This triad lets a reviewer place the figure in the SAP without opening another document and reduces clarification queries.
What is the safest default for arm ordering?
Randomization order is the least misleading and most defensible default. Alphabetical ordering can imply favoritism or change between submissions. If you deviate, state why in the footnote and freeze the new order for subsequent cuts to prevent confusion.
How do we make colors both accessible and printable?
Start with a color-blind-safe palette, lock hex codes, and verify luminance contrast. Produce grayscale proofs and require pattern redundancy (line type or marker shape) so meaning survives monochrome printing. Reserve saturated colors for reference lines and warnings only.
Where do figure standards live for inspectors?
They live in a version-controlled style guide filed in the TMF alongside example figures, the decision log, and automated QC outputs. Cross-link from the CTMS so monitors and inspectors can drill from a figure on a slide to the policy that governs it in two clicks.
How do we avoid implying statistical significance visually?
Use neutral palettes for arms, avoid “traffic light” colors, and never color p-values by threshold. Keep reference lines and margins labeled and subtle. State explicitly in the footnote when a line denotes a non-inferiority margin or clinically meaningful threshold to prevent misinterpretation.
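Some of these rules can be enforced by an automated check on the figure specification before anything is rendered. The sketch below is a rough heuristic under stated assumptions: the spec layout (`arm_colors`, `p_value_color_by_threshold`, `reference_lines`) is hypothetical, and the hue and saturation cut-offs used to detect "traffic light" colors are illustrative values, not established thresholds.

```python
import colorsys


def hue_and_saturation(hex_code):
    """Return (hue in degrees, saturation from 0 to 1) for a #RRGGBB color."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_code!r}")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    hue, _lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    return hue * 360, saturation


def lint_figure(spec):
    """Return a list of findings; an empty list means no issue was detected."""
    findings = []
    for name, code in spec.get("arm_colors", {}).items():
        hue, saturation = hue_and_saturation(code)
        is_red = hue <= 20 or hue >= 340      # assumed cut-offs
        is_green = 90 <= hue <= 150           # assumed cut-offs
        if saturation >= 0.5 and (is_red or is_green):
            findings.append(f"arm color {name} ({code}) reads as a traffic-light color")
    if spec.get("p_value_color_by_threshold", False):
        findings.append("p-values are colored by threshold")
    unlabeled = [line for line in spec.get("reference_lines", []) if not line.get("label")]
    if unlabeled:
        findings.append(f"{len(unlabeled)} reference line(s) lack a label")
    return findings


# Hypothetical figure spec for illustration only.
spec = {
    "arm_colors": {"Arm A": "#FF0000", "Arm B": "#0072B2"},
    "p_value_color_by_threshold": True,
    "reference_lines": [{"value": 1.0}],
}
findings = lint_figure(spec)
```

A check like this cannot judge whether a figure is misleading; it only catches the mechanical patterns named above so that human review can focus on interpretation.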
Do we need separate rules for KM, forest, and exposure plots?
Yes—shared tokens plus family-specific rules. Common tokens standardize titles, subtitles, and footnotes; family rules handle axis scales, markers, and ordering. This balance keeps outputs consistent without forcing awkward compromises across very different visual grammars.
