deltatrials

Methodology

deltatrials monitors ClinicalTrials.gov daily and records every change to every trial. This page explains how we do it.

Data Source

Our data comes exclusively from the ClinicalTrials.gov API, maintained by the U.S. National Library of Medicine. Each daily sync fetches all trials modified since the previous run.
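As an illustration of the "modified since the previous run" fetch, here is a minimal sketch in Python. It assumes the public v2 endpoint (https://clinicaltrials.gov/api/v2/studies), an AREA[LastUpdatePostDate]RANGE[...] expression passed through the filter.advanced parameter, and pageToken/nextPageToken pagination. Treat those details as assumptions to check against the API documentation, not as our exact production request. The helper names build_url and iter_modified_studies are hypothetical. The HTTP call is injected so the paging logic can run without network access.

```python
from datetime import date
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

# Assumed v2 endpoint; verify against the ClinicalTrials.gov API docs.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_url(since: date, page_token: Optional[str] = None, page_size: int = 1000) -> str:
    """Build a request URL for studies last updated on or after `since` (hypothetical helper)."""
    params: Dict[str, Any] = {
        # Assumed filter syntax: restrict to studies whose last update post date is >= since.
        "filter.advanced": f"AREA[LastUpdatePostDate]RANGE[{since.isoformat()},MAX]",
        "pageSize": page_size,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{BASE_URL}?{urlencode(params)}"


def iter_modified_studies(
    since: date, fetch_json: Callable[[str], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield every study on every page until the response carries no nextPageToken."""
    token: Optional[str] = None
    while True:
        page = fetch_json(build_url(since, token))
        yield from page.get("studies", [])
        token = page.get("nextPageToken")
        if not token:
            return
```

In real use, fetch_json could be as simple as `lambda url: json.load(urllib.request.urlopen(url))`. Injecting it keeps retries and rate limiting outside the paging loop.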

We track a broad set of fields including overall status, enrollment count, start and completion dates, sponsor information, study phase, conditions, interventions, eligibility criteria, contact details, and facility locations.
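One way to pin down "tracked fields" is to map each column to a path inside the study JSON. The sketch below does this for the fields listed above. The dotted paths reflect our understanding of the v2 record layout (protocolSection and its modules) and should be checked against the API's data dictionary. The column names and the TRACKED_FIELDS, get_path, and extract_tracked names are hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical column -> dotted-path mapping into a v2 study record.
# The paths are assumptions about the v2 JSON schema; verify before relying on them.
TRACKED_FIELDS: Dict[str, str] = {
    "overall_status": "protocolSection.statusModule.overallStatus",
    "enrollment_count": "protocolSection.designModule.enrollmentInfo.count",
    "start_date": "protocolSection.statusModule.startDateStruct.date",
    "completion_date": "protocolSection.statusModule.completionDateStruct.date",
    "lead_sponsor": "protocolSection.sponsorCollaboratorsModule.leadSponsor.name",
    "phases": "protocolSection.designModule.phases",
    "conditions": "protocolSection.conditionsModule.conditions",
    "interventions": "protocolSection.armsInterventionsModule.interventions",
    "eligibility_criteria": "protocolSection.eligibilityModule.eligibilityCriteria",
    "central_contacts": "protocolSection.contactsLocationsModule.centralContacts",
    "locations": "protocolSection.contactsLocationsModule.locations",
}


def get_path(record: Mapping[str, Any], dotted: str) -> Any:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node: Any = record
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def extract_tracked(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Flatten a study record into the flat dict of tracked fields used for comparison."""
    return {name: get_path(record, path) for name, path in TRACKED_FIELDS.items()}
```

Missing sections come back as None rather than raising, so a field that appears or disappears between syncs shows up as an ordinary change.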

Location display names are enriched with GeoNames data (the cities1000 and countryInfo dumps), licensed under CC BY 4.0.
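The sketch below shows what enrichment from these two dumps could look like. It assumes the published tab-separated layouts: in countryInfo.txt, column 0 is the ISO code and column 4 the country name, with a commented header; in cities1000.txt, column 1 is the name, column 2 the ASCII name, and column 8 the country code. The matching rule (case-insensitive ASCII city name plus country code) and the function names are hypothetical illustrations, not a description of our exact matching logic.

```python
from typing import Dict, Iterable, Tuple


def load_countries(lines: Iterable[str]) -> Dict[str, str]:
    """Map ISO alpha-2 code -> country name from countryInfo.txt lines."""
    countries: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the commented header block and blank lines
        cols = line.rstrip("\n").split("\t")
        countries[cols[0]] = cols[4]  # column 0: ISO, column 4: Country
    return countries


def load_cities(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Map (lowercased ASCII name, country code) -> canonical city name from cities1000.txt."""
    cities: Dict[Tuple[str, str], str] = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed row
        # Column 1: name, column 2: asciiname, column 8: country code.
        # setdefault keeps the first match when several cities share a name.
        cities.setdefault((cols[2].lower(), cols[8]), cols[1])
    return cities


def display_name(
    city: str,
    country_code: str,
    cities: Dict[Tuple[str, str], str],
    countries: Dict[str, str],
) -> str:
    """Return 'City, Country', falling back to the raw values when no match exists."""
    canonical = cities.get((city.lower(), country_code), city)
    country = countries.get(country_code, country_code)
    return f"{canonical}, {country}"
```

A name-plus-country key is ambiguous for common place names, so a production matcher might also use coordinates or admin codes. That refinement is outside this sketch.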

SCD2 Versioning

The core of our approach is SCD2 (Slowly Changing Dimension Type 2). In plain language: we never overwrite old records. Instead, we keep every version of every trial, each with the dates it was valid.

When a trial's status changes from Recruiting to Completed, we mark the old record as closed (with a valid_to date) and create a new record (with a fresh valid_from date). The result is a complete, timestamped audit trail of every change a trial has ever undergone.
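The close-and-insert step can be sketched in a few lines of Python. This assumes a half-open validity interval: a version is valid from valid_from up to, but not including, valid_to. A valid_to of None marks the current version. The TrialVersion type and apply_scd2 function are hypothetical names for illustration; our storage layer is not necessarily structured this way.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Any, Dict, List, Optional


@dataclass
class TrialVersion:
    nct_id: str
    data: Dict[str, Any]            # tracked fields for this version
    valid_from: date
    valid_to: Optional[date] = None  # None means "current version"


def apply_scd2(
    history: List[TrialVersion], nct_id: str, incoming: Dict[str, Any], as_of: date
) -> List[TrialVersion]:
    """Return a new history with `incoming` applied as an SCD2 change (hypothetical helper)."""
    current = next(
        (v for v in history if v.nct_id == nct_id and v.valid_to is None), None
    )
    if current is not None and current.data == incoming:
        return list(history)  # nothing changed, so no new version is created
    updated: List[TrialVersion] = []
    for v in history:
        # Close the old version by stamping valid_to; every other row is kept untouched.
        updated.append(replace(v, valid_to=as_of) if v is current else v)
    updated.append(TrialVersion(nct_id, dict(incoming), valid_from=as_of))
    return updated
```

Because the closed version's valid_to equals the new version's valid_from, the intervals for a trial never overlap or leave gaps.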

This approach lets us answer questions that a current-snapshot database cannot: When did this trial stop recruiting? How many times has the enrollment count changed? What was the status two years ago?
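Each of those three questions maps to a small query over one trial's versions. The sketch below uses the same half-open interval convention as the versioning sketch. It assumes status strings follow the API's upper-case enum values such as RECRUITING. The Version type and all three function names are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, NamedTuple, Optional


class Version(NamedTuple):
    data: Dict[str, Any]
    valid_from: date
    valid_to: Optional[date]  # None = still current


def value_as_of(versions: List[Version], field: str, when: date) -> Any:
    """'What was the status two years ago?': the field from the version valid on `when`."""
    for v in versions:
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v.data.get(field)
    return None  # the trial had no recorded version on that date


def count_changes(versions: List[Version], field: str) -> int:
    """'How many times has enrollment changed?': differences between consecutive versions."""
    ordered = sorted(versions, key=lambda v: v.valid_from)
    return sum(
        1 for a, b in zip(ordered, ordered[1:]) if a.data.get(field) != b.data.get(field)
    )


def stopped_recruiting_on(versions: List[Version]) -> Optional[date]:
    """'When did this trial stop recruiting?': first transition away from RECRUITING."""
    ordered = sorted(versions, key=lambda v: v.valid_from)
    for a, b in zip(ordered, ordered[1:]):
        if a.data.get("overall_status") == "RECRUITING" and b.data.get("overall_status") != "RECRUITING":
            return b.valid_from
    return None
```

A current-snapshot table only holds the last version, so none of these functions could be answered from it.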

Update Frequency

Our automated pipeline runs daily. Each run:

  1. Fetches all trials modified on ClinicalTrials.gov since the last sync
  2. Compares the incoming data against our stored records field-by-field
  3. Creates new SCD2 versions for any trial where data has changed
  4. Writes a freshness artifact so the site can display the last-updated date
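Steps 2 through 4 can be sketched as follows. The freshness artifact's file name and JSON shape (last_updated, trials_changed) are hypothetical, as are the function names. The write goes to a temporary file that is then renamed over the target, so a reader on a POSIX filesystem never sees a half-written file.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


def diff_fields(stored: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Step 2: field-by-field comparison, returning {field: (old, new)} for each difference."""
    keys = sorted(set(stored) | set(incoming))
    return {
        k: (stored.get(k), incoming.get(k))
        for k in keys
        if stored.get(k) != incoming.get(k)
    }


def run_sync(
    stored: Dict[str, Dict[str, Any]], incoming: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, Tuple[Any, Any]]]:
    """Step 3 input: the trials that need a new SCD2 version, with what changed."""
    changes: Dict[str, Dict[str, Tuple[Any, Any]]] = {}
    for nct_id, fields in incoming.items():
        delta = diff_fields(stored.get(nct_id, {}), fields)  # unseen trials diff against {}
        if delta:
            changes[nct_id] = delta
    return changes


def write_freshness(path: Path, trials_changed: int, now: Optional[datetime] = None) -> Dict[str, Any]:
    """Step 4: write a small JSON artifact the site can read for its last-updated date."""
    now = now or datetime.now(timezone.utc)
    artifact = {"last_updated": now.isoformat(), "trials_changed": trials_changed}
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(artifact, indent=2))
    tmp.replace(path)  # rename over the target in one step
    return artifact
```

Recording the old and new value for every changed field, rather than just a changed/unchanged flag, makes each new version self-describing.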

Data Quality

A few important caveats about our data:

Pipeline Architecture

Our data pipeline is automated and built for reliability at scale: