← Leadership Papers

Real-World Evidence Depends on Data Construction, Not Just Analysis

A structural perspective on reproducibility and dataset design

AI & Regulated Data November 2025

Across healthcare and research environments, real-world evidence is increasingly used to inform decision-making. However, generating reliable and reproducible evidence remains a challenge. The core issue is not analytical capability, but inconsistency in how datasets are constructed. Addressing this requires controlled, consistent, and traceable data construction processes. This has direct implications for regulators, researchers, and healthcare systems.

The Challenge / Context

The challenge is often described as an analytical limitation.

In practice, the issue emerges from inconsistencies in data construction.

This results in:

  • inconsistent cohort definitions
  • variation in transformation logic
  • lack of reproducibility across studies

System-Level Diagnosis

These challenges reflect misalignment between:

  • data construction
  • analytical processing

When data construction is inconsistent, analytical outputs cannot be reliably compared or reproduced.

Framework

The Data Construction Pipeline

Raw Data
Operational datasets

Data Construction
Structuring and transformation

Analysis
Statistical modelling

Failure in the construction layer compromises the reliability of the entire pipeline.

Real-World Application

These patterns are observed across:

Regulatory environments
Difficulty validating consistency of evidence

Research studies
Variation in cohort definitions across datasets

Healthcare systems
Limited reliability of analytical outputs

Infrastructure Implications

Addressing this requires infrastructure that supports:

  • consistent data models such as OMOP
  • traceable transformation pipelines
  • version-controlled workflows
  • governed analytical environments

Actionable Recommendations

Organisations should prioritise:

  • standardising data construction processes
  • implementing traceable transformation pipelines
  • ensuring reproducibility of analytical workflows
  • aligning governance with analytical environments

Perspective

The constraint is not analytical capability.

It is inconsistency in how datasets are constructed.

Closing

The future of real-world evidence will not be defined by analytical tools.

It will be shaped by the systems that construct consistent and traceable datasets.

Interested in collaborating?

If this perspective resonates and you are exploring collaboration across research, governance, or secure data environments, I welcome the conversation.

Leadership Papers

More papers

View all