Figure Standards That Stick: Labels, Ordering, Color Rules

Published on 21/12/2025

Figure Standards That Stick: Making Labels, Ordering, and Color Rules Reproducible and Reviewer-Friendly

Table of Contents

Why “figure standards” are a regulatory deliverable—not just a style preference

Figures drive first impressions and hard questions

For many reviewers, your figures are the first contact with the analysis, so they must answer “what is shown, why it matters, and how it was built” within seconds. Poorly labeled axes, inconsistent ordering of arms or endpoints, or colors that imply significance can create avoidable queries and rework. Consistent figure standards—codified and version-controlled—turn every forest plot, Kaplan–Meier curve, and exposure graph into a defensible artifact whose message survives scrutiny across US, EU, and UK review styles. The goal is speed to comprehension: a reviewer should not need to open the SAP to decode a legend.

Declare one compliance backbone and reuse it across all graphics

State, once, the controls that apply to every figure: conformance to CDISC naming and conventions; source lineage from SDTM into ADaM; machine-readable specs in Define.xml with human-readable aids (ADRG, SDRG); estimand-aligned wording per ICH E9(R1); GCP oversight per ICH E6(R3); inspection expectations influenced by FDA BIMO; electronic controls consistent with 21 CFR Part 11 and Annex mapping to Annex

11; public narrative alignment with ClinicalTrials.gov, EU-CTR in CTIS; privacy principles per HIPAA; every graphic generation leaves a searchable audit trail; defects route through CAPA; risk is monitored against QTLs and governed by RBM; and designs must not mislead especially in non-inferiority contexts. Anchor authority once with compact in-line links—FDA, EMA, MHRA, ICH, WHO, PMDA, and TGA—then apply the same truth across outputs.

Outcome targets for figure programs

Set three targets and check them at every data cut: (1) comprehension in under 10 seconds (title and subtitle answer “what and who”); (2) reproducibility on demand (open the spec, code, and source in two clicks); (3) visual integrity (no accidental significance cues; color-blind safe palettes; consistent ordering tokens). When you can demonstrate these at a stopwatch drill, you have evidence that your figure standards are working.

Regulatory mapping: US-first clarity with EU/UK portability

US (FDA) angle—event → evidence in minutes

US assessors will trace an on-screen number to the dataset, variable derivation, and programming note that produced it. Figure standards must therefore embed: population labels (e.g., ITT, PP), analysis method cues (e.g., MMRM, Cox), confidence interval definitions, and censoring rules in time-to-event graphics. Titles should name the endpoint and population; footnotes should state handling of missing data, ties, or multiplicity. Legends should define all symbols and error bars. This eliminates guesswork and reduces the odds of a “please explain your axis” query that slows the clock.

EU/UK (EMA/MHRA) angle—same truth, localized wrappers

EMA/MHRA reviewers will look for transparency and alignment with public narratives: a clear connection to registry language, avoidance of promotional tone, and accessibility of color choices for color-vision deficiency. They also probe estimand clarity: if the graphic supports a different strategy than the main estimand, a label must say so. Your US-first rules travel well if labels are literal, footnotes cite the SAP, and line styles and markers are chosen for legibility when printed in grayscale.

Dimension	US (FDA)	EU/UK (EMA/MHRA)
Electronic records	Part 11 validation & attribution	Annex 11 controls and supplier qualification
Transparency	Consistency with ClinicalTrials.gov wording	EU-CTR status via CTIS; UK registry alignment
Privacy	HIPAA “minimum necessary”	GDPR/UK GDPR minimization and purpose limits
Figure labeling	Population/method in title; CI and censoring in notes	Estimand clarity; grayscale legibility
Inspection lens	Event→evidence drill-through speed	Completeness & accessibility of presentation

Process & evidence: a figure standard that survives inspection

Title, subtitle, and footnote tokens

Create reusable tokens. Title: “Endpoint — Population — Method.” Subtitle: covariates or windows. Footnotes: censoring, handling of ties, imputation, dictionary versions, and multiplicity control with SAP reference. Tokens prevent drift and let medical writing reuse exact phrases in the CSR, keeping words and numbers synchronized.

Ordering and grouping rules

Define treatment-arm order (randomization order unless justified otherwise), endpoint order (primary → secondary → exploratory), and subgroup order (overall → prespecified → exploratory). For forest plots, group by logical themes (demographics, disease burden) and freeze positions across cuts to avoid “moving target” confusion between submissions.

Publish a figure style guide with title/subtitle/footnote tokens and examples.
Fix arm and endpoint ordering rules; include exceptions and required justification.
Choose a color-blind-safe palette; lock hex codes; specify grayscale equivalents.
Define line types and markers (KM, mean trends, CIs) and reserve patterns for status.
Enforce unit and decimal precision rules by variable class; state rounding policy.
Require legends to define every symbol, bar, and band; prohibit unexplained color.
Embed provenance: figure ID, data cut, program name, and run timestamp (footer).
Automate a “visual lint” QC (axis direction, zero baselines, CI whiskers, label overlap).
Version-control the guide; tie changes to SAP or governance minutes.
File style guide and examples in TMF; cross-link from CTMS study library.

Decision Matrix: labels, ordering, and color—what to choose and when

Scenario	Option	When to choose	Proof required	Risk if wrong
Arms with unequal size	Randomization order (default)	Comparability outweighs visual balance	SAP excerpt; arm definitions	Implied ranking; reader confusion
Subgroup forest plot	Prespecified order with frozen positions	Multiple cuts or rolling submissions	Prespec list; change log if re-ordered	Misinterpretation across timepoints
Color constraints (accessibility)	Color-blind safe palette + grayscale viable	Mixed digital/print review	Palette spec; grayscale tests	Signals lost; accessibility findings
Time-to-event graphics	Solid for KM curves; dashed for CIs	Multiple strata or arms	Legend map; censoring symbol note	Ambiguous curves; misread CI
Non-inferiority display	Margin line with label & direction	Primary or key secondary NI endpoint	Margin value, scale, and SAP ref	Wrong side inference; query storm

Document choices so inspectors can follow the thread

Maintain a “Figure Decision Log”: question → option → rationale → artifacts (style page, SAP clause, example figure) → owner → effective date → effectiveness (e.g., reduced figure queries). File under Sponsor Quality and cross-link from the programming standards wiki so the path from a pixel to a principle is visible.

QC / Evidence Pack: the minimum, complete set reviewers expect

Figure style guide (versioned): titles, subtitles, footnote tokens, ordering, units.
Color spec: hex codes, luminance contrast checks, grayscale previews, printer tests.
Shape/line library for curves, bands, and markers; reserved patterns and meanings.
Axis and scale policy (zero baseline rules, log scale triggers, dual-axis prohibitions).
Rounding/precision policy with examples and CSR alignment notes.
Automated QC scripts (“visual lint”) and sample outputs with pass/fail criteria.
Provenance footer standard (figure ID, data cut date, program path, timestamp).
Cross-references to SAP and Define/Reviewer Guides for traceability.
Change control with side-by-side “before/after” for material updates.
Drill-through map from portfolio tiles → figure family → artifact locations in TMF.

Vendor oversight & privacy (US/EU/UK)

Qualify any visualization vendors or external teams to your standards, enforce least-privilege access, and demand that generated graphics embed provenance and follow the palette/ordering rules. Where listings or subject-level figures risk exposure, apply minimization and de-identification consistent with privacy and local rules; store interface logs and incident reports next to the figure library.

Templates reviewers appreciate: paste-ready labels, footnotes, and palette tokens

Title and subtitle tokens

“Primary Endpoint — ITT — Change from Baseline in [Endpoint] at Week 24 — MMRM (Unstructured) Adjusted for [Covariates].”
“Time-to-Event — ITT — Time to [Event] — Kaplan–Meier with 95% CI; Cox Model HR (95% CI).”
“Subgroup Forest — ITT — Treatment Effect (Odds Ratio, 95% CI); Prespecified Subgroups, Frozen Order.”

Footnote library (excerpt)

F1: “Bars show mean with 95% CI; whiskers denote confidence limits.”
F2: “KM curves show time from randomization; tick marks denote censoring; CI as shaded band.”
F3: “Non-inferiority margin = [X] on [Scale]; line indicates direction where control favored.”
F4: “Multiplicity controlled via hierarchical order per SAP §[ref].”
F5: “Dictionary versions: MedDRA [ver]; WHODrug [ver], applied per SAP.”

Palette tokens and accessibility

Define 6–8 colors with hex codes and reserved meanings (e.g., Arm A, Arm B, CI bands, reference lines). Require luminance contrast ≥4.5:1 for text/lines and a grayscale proof for print. Prohibit red/green pairings without pattern differences; pair color with shape (marker type) for redundancy.

Figure families: consistent rules for the plots reviewers see most

Forest plots

Use fixed column ordering (subgroup name → N per arm → effect size with CI → p-value if applicable). Freeze subgroup order and use the same x-axis range across cuts where feasible. Show the reference line clearly and label the effect direction to avoid accidental inversions.

Kaplan–Meier curves

Use solid lines for arm curves and distinct shapes for censoring ticks; display at-risk tables aligned beneath with synchronized time grids. Explain administrative censoring and competing risks in the footnote if relevant. Avoid running legends over the plot area; place outside for clarity.

Exposure and shift plots

For exposure over time, use stacked bars with consistent category order and a footnote defining exposure thresholds. For lab shift plots, include quadrant labels, axes with clinical threshold lines, and footnotes that define baseline and worst on-treatment values to keep interpretation identical across reviewers.

Operating cadence: version, test, and release graphics so first builds converge

Dry runs and “figure days”

Hold cross-functional “figure days” where statisticians, programmers, writers, and QA review draft plots against the style guide and SAP. Read titles and footnotes aloud; confirm ordering, scales, and tokens; and approve palette compliance. Catching issues here prevents mass re-layouts at CSR time.

Automation and reproducibility

Automate header/footer provenance, apply a visual lint tool (axis direction, zero baseline, label overlap), and store seeds, environment hashes, and parameter files with the run logs. Any figure should rebuild byte-identical given the same inputs and environment—an expectation you should prove during a stopwatch drill.

Governance and change control

All material edits to tokens, colors, or ordering require a change summary and a one-page “before/after” exhibit filed with governance minutes. Communicate changes to vendors the same day and require acknowledgment. During inspection, open this packet first—it shows you run figures as a controlled system.

FAQs

How detailed should figure titles be?

Titles must name the endpoint, population, and method. Subtitles carry covariates or windowing; footnotes carry censoring, imputation, and multiplicity notes. This triad lets a reviewer place the figure in the SAP without opening another document and reduces clarification queries.

What is the safest default for arm ordering?

Randomization order is the least misleading and most defensible default. Alphabetical ordering can imply favoritism or change between submissions. If you deviate, state why in the footnote and freeze the new order for subsequent cuts to prevent confusion.

How do we make colors both accessible and printable?

Start with a color-blind-safe palette, lock hex codes, and verify luminance contrast. Produce grayscale proofs and require pattern redundancy (line type or marker shape) so meaning survives monochrome printing. Reserve saturated colors for reference lines and warnings only.

Where do figure standards live for inspectors?

In a version-controlled style guide filed in TMF alongside example figures, the decision log, and automated QC outputs. Cross-link from CTMS so monitors and inspectors can drill from a figure on a slide to the policy that governs it in two clicks.

How do we avoid implying statistical significance visually?

Use neutral palettes for arms, avoid “traffic light” colors, and never color p-values by threshold. Keep reference lines and margins labeled and subtle. State explicitly in the footnote when a line denotes a non-inferiority margin or clinically meaningful threshold to prevent misinterpretation.

Do we need separate rules for KM, forest, and exposure plots?

Yes—shared tokens plus family-specific rules. Common tokens standardize titles, subtitles, and footnotes; family rules handle axis scales, markers, and ordering. This balance keeps outputs consistent without forcing awkward compromises across very different visual grammars.