Published on 23/12/2025
Analyzing Wearable Data in Clinical Trials: A Statistical Toolkit
Introduction: The Complexity of Wearable Data Streams
Wearables produce rich and continuous streams of physiological data—from heart rate and sleep to movement and electrodermal activity. While this opens new frontiers for endpoint development and real-time monitoring, it also presents significant statistical challenges: noise, missing data, inter-patient variability, and complex time structures.
This article outlines essential statistical techniques used in the analysis of wearable data in regulated clinical trials. These techniques are critical for CROs, sponsors, and data scientists seeking to transform raw digital signals into actionable insights, especially when submitting exploratory endpoints to health authorities.
Data Cleaning: Dealing with Noise and Artifact Removal
Wearable signals often contain artifacts caused by skin contact loss, motion, or environmental interference. Key preprocessing steps include:
- Signal Smoothing: Moving average, low-pass filters, or Savitzky–Golay filters are used to reduce jitter
- Outlier Detection: Z-score or interquartile range (IQR) methods flag physiologically implausible readings
- Artifact Rejection: Segments affected by known interference (e.g., motion or contact loss) are flagged and excluded
For example, in a sleep trial using wrist PPG, data during periods of high arm motion may be excluded due to signal contamination.
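The smoothing and IQR steps above can be sketched in a few lines. This is a minimal illustration on a made-up heart-rate trace (the spike stands in for a motion artifact); window sizes and the IQR multiplier would be tuned per signal in practice.

```python
import numpy as np

def moving_average(signal, window=5):
    """Smooth a 1-D signal with a centered moving average."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def iqr_outlier_mask(signal, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as implausible."""
    q1, q3 = np.percentile(signal, [25, 75])
    iqr = q3 - q1
    return (signal < q1 - k * iqr) | (signal > q3 + k * iqr)

# Hypothetical heart-rate trace (bpm) with one motion-artifact spike
hr = np.array([62, 63, 61, 64, 180, 62, 63, 61, 62, 64], dtype=float)
mask = iqr_outlier_mask(hr)       # flags only the 180 bpm reading
clean = hr[~mask]                 # drop it before smoothing
smoothed = moving_average(clean, window=3)
```

A Savitzky–Golay filter (`scipy.signal.savgol_filter`) would slot in the same way where preserving waveform shape matters more than simple jitter reduction.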
Handling Missing Data
Missingness in wearable data is common—due to device removal, battery depletion, or sync failures. Imputation must consider temporal dependencies:
- Last Observation Carried Forward (LOCF): Simple but may bias results in dynamic outcomes
- Kalman Filtering: Predicts missing values in time-series using probabilistic modeling
- Multiple Imputation: Accounts for uncertainty in imputed values, especially in exploratory endpoints
It is crucial to define thresholds for acceptable missing data per patient—e.g., ≥70% daily completeness for inclusion in primary analysis.
Normalization and Baseline Anchoring
Due to high inter-subject variability in wearable data (e.g., resting HR or skin temperature), normalization is essential:
- Z-Scoring: Transforms data into standardized units for between-subject comparisons
- Baseline Anchoring: Normalize changes relative to a run-in period average
- Percent Change: Useful when raw units are not normally distributed
These strategies help isolate treatment effects and improve statistical power.
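The three normalization strategies above are each a one-liner; a minimal sketch on an invented resting-HR series with a 3-day run-in period:

```python
import numpy as np

def baseline_anchor(values, baseline):
    """Change from the run-in baseline average, in raw units."""
    return values - np.mean(baseline)

def percent_change(values, baseline):
    """Change relative to the run-in baseline average, in percent."""
    b = np.mean(baseline)
    return 100.0 * (values - b) / b

def z_score(values):
    """Standardize to zero mean, unit variance for between-subject comparison."""
    return (values - np.mean(values)) / np.std(values)

# Hypothetical resting HR (bpm): 3-day run-in, then 4 on-treatment days
run_in = np.array([70.0, 72.0, 71.0])        # baseline mean = 71
on_trt = np.array([68.0, 67.0, 66.0, 67.0])

anchored = baseline_anchor(on_trt, run_in)   # raw-unit change from baseline
pct = percent_change(on_trt, run_in)         # scale-free change
```

Anchoring to a run-in mean rather than a single baseline day also dampens day-to-day noise in the reference value.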
Example Dataset: Wearable Activity Summary
| Subject ID | Day | Step Count | HRV (ms) | Sleep Hours |
|---|---|---|---|---|
| 101 | Baseline | 5600 | 56 | 6.8 |
| 101 | Day 7 | 6300 | 62 | 7.2 |
| 101 | Day 14 | 5900 | 61 | 7.0 |
Here, step count trends can be analyzed using mixed-effect models accounting for repeated measures over time.
Time-Series Modeling and Longitudinal Analysis
Wearable outputs like heart rate or activity are inherently time-structured. Common techniques include:
- ARIMA Models: Used for modeling autocorrelation and forecasting trends in HRV, respiration, etc.
- Functional Data Analysis (FDA): Treats time-series as continuous functions to compare shapes
- Generalized Estimating Equations (GEE): Robust to missing data and useful for longitudinal comparisons
- Mixed-Effects Models: Incorporate subject-level random effects for repeated observations
For instance, to model fatigue as a function of step count changes, a linear mixed-effects model with time as a fixed effect and subject ID as a random effect may be used.
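That fatigue-vs-time specification can be sketched with `statsmodels`. Everything here is simulated (subjects, visits, effect sizes) purely to show the model call; a real analysis would read long-format trial data instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated long-format data: 6 subjects x 4 weekly visits (illustrative only)
rng = np.random.default_rng(42)
rows = []
for subj in range(1, 7):
    subj_effect = rng.normal(0, 0.5)            # subject-level random intercept
    for week in range(4):
        steps_delta = rng.normal(500 * week, 200)   # activity rises over time
        fatigue = 5.0 - 0.002 * steps_delta + subj_effect + rng.normal(0, 0.3)
        rows.append({"subject": subj, "week": week, "fatigue": fatigue})
df = pd.DataFrame(rows)

# Linear mixed-effects model: week as fixed effect, subject as random intercept
model = smf.mixedlm("fatigue ~ week", df, groups=df["subject"])
result = model.fit()
```

`result.fe_params` then gives the population-level time trend, while the random-intercept variance captures between-subject heterogeneity.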
Validation of Derived Endpoints
Before using a wearable-derived metric as a trial endpoint, it must be statistically validated:
- Construct Validity: Does the metric correlate with known clinical outcomes?
- Discriminant Validity: Can it distinguish between groups (e.g., treatment vs placebo)?
- Responsiveness: Does the metric change meaningfully over time or in response to intervention?
Statistical assessment includes ROC curve analysis, effect size calculation (e.g., Cohen’s d), and bootstrapping for confidence intervals.
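Two of those assessments, Cohen's d and a percentile-bootstrap confidence interval, fit in a short sketch. The HRV change scores below are invented to illustrate the mechanics:

```python
import numpy as np

def cohens_d(a, b):
    """Effect size: mean difference scaled by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                     / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Cohen's d (resample each group with replacement)."""
    rng = np.random.default_rng(seed)
    stats = [cohens_d(rng.choice(a, len(a)), rng.choice(b, len(b)))
             for _ in range(n_boot)]
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical HRV change scores (ms), active vs. placebo
active = np.array([6.1, 8.3, 5.2, 7.9, 6.6, 9.0, 5.8, 7.1])
placebo = np.array([1.2, 2.5, 0.8, 3.0, 1.9, 2.2, 1.5, 2.8])

d = cohens_d(active, placebo)
ci_lo, ci_hi = bootstrap_ci(active, placebo)
```

ROC analysis for discriminant validity follows the same pattern using `sklearn.metrics.roc_auc_score` on group labels against the metric.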
Machine Learning and Feature Engineering
ML models offer powerful methods to analyze large wearable datasets, including:
- Random Forests: For classification of disease states based on multivariate sensor inputs
- Clustering: To detect symptom-based patient clusters (e.g., flare vs non-flare patterns)
- Principal Component Analysis (PCA): Reduces dimensionality and extracts latent features
Always partition datasets into training and validation sets, and avoid overfitting by applying k-fold cross-validation.
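A random-forest classifier with 5-fold cross-validation can be sketched with scikit-learn. The synthetic features here stand in for engineered per-patient sensor summaries (mean HR, step variance, sleep efficiency, and so on):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for per-patient wearable feature vectors and labels
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation: every sample is held out exactly once,
# guarding against optimism from a single lucky train/test split
scores = cross_val_score(clf, X, y, cv=5)
mean_accuracy = scores.mean()
```

Reporting the spread of fold scores, not just the mean, gives reviewers a sense of how stable the classifier is across partitions.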
Case Study: HRV in a Stress Reduction Trial
A sponsor used HRV (root mean square of successive differences – RMSSD) as a digital biomarker in a Phase II stress trial. Statistical approaches included:
- Z-score normalization against a 7-day baseline
- Outlier rejection using IQR filters
- Mixed-model ANOVA with time × treatment interaction
Statistically significant improvements in HRV were observed in the active group (p = 0.032), supporting further development.
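RMSSD itself is a simple computation over successive beat-to-beat (RR) interval differences; a minimal sketch on invented RR intervals:

```python
import numpy as np

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = np.diff(rr_intervals_ms)
    return np.sqrt(np.mean(diffs ** 2))

# Hypothetical beat-to-beat RR intervals in milliseconds
rr = np.array([812.0, 830.0, 805.0, 845.0, 820.0])
value = rmssd(rr)
```

In a pipeline like the case study's, this would be computed per window after artifact rejection, then z-scored against each subject's 7-day baseline before entering the mixed model.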
Regulatory Expectations and Reporting
Agencies like the EMA and FDA expect transparency and rigor in digital biomarker analysis:
- Clearly define algorithms and transformations applied to raw signals
- Disclose handling of missing data and imputation techniques
- Include sensitivity analyses in your Statistical Analysis Plan (SAP)
For novel endpoints, submit qualification packages or request pre-IND/pre-CTA advice to align statistical strategies early.
Conclusion: Turning Wearable Data into Evidence
The promise of wearables lies not just in data capture, but in the robust analysis of that data. With appropriate statistical frameworks—ranging from smoothing and imputation to machine learning and longitudinal modeling—wearable data can yield validated, regulatory-acceptable clinical evidence.
As wearable endpoints expand from exploratory to primary outcomes, statistical literacy in digital signal analysis will become a core competency for modern clinical trial teams.
