TTE workflow: from skeleton to results

Target Trial Emulation with swereg

This vignette walks through the full target trial emulation (TTE) workflow in swereg: from skeleton files on disk to ETT-level per-protocol effect estimates. It tries to keep the epidemiological rationale and the technical R6/callr mechanics on the same page so you can read it top-down without losing the plot.

For the methodological mapping to reference papers (Hernán 2008/2016, Danaei 2013, Caniglia 2023, Cashin 2025) see vignette("tte-methodology"). For a one-page glossary of the terms used in this vignette see vignette("tte-nomenclature"). For how the skeleton files are built in the first place see vignette("skeleton-pipeline").

The TTE concept in one page

A randomized trial of an intervention gives you the causal effect for free if the randomization is valid: treatment assignment is independent of potential outcomes, so comparing observed outcomes across arms is an unbiased estimator of the effect. The problem is that most clinically important questions can’t be randomized – usually because the intervention is a drug that is already approved and widely used, or because randomizing would be unethical, or because the outcome is too rare to power a feasible trial.

Target trial emulation is a discipline for doing observational analyses as if they were trials. You write down the protocol of the target trial first (eligibility, treatment arms, assignment, outcome, follow-up, analysis plan), then use observational data to build a dataset that mimics the protocol as closely as possible. The mimic is imperfect in specific, nameable ways, and each imperfection has a specific statistical correction.

The emulation failures

Hernán and Robins (2016) name four specific failure modes that affect observational analyses but not randomized trials:

Treatment assignment is not random – people who choose the intervention differ systematically from people who don’t. Corrected with baseline confounder adjustment, typically via inverse probability of treatment weighting (IPW).
Time zero is not well-defined – in a trial, time zero is the moment of randomization. In observational data there is no such moment, so you have to define one. swereg uses band-based enrollment: any eligible person-week inside the enrollment period becomes a potential time zero, creating a sequence of overlapping trials.
Follow-up is informatively truncated – people drop out of observation (emigration, death, end of data) in ways that depend on their baseline or time-varying characteristics. Corrected with inverse probability of censoring weighting (IPCW).
Treatment adherence is not guaranteed – in a trial, adherence is enforced (or non-adherence is measured and modeled). In observational data, people switch treatments over time. swereg handles this via per-protocol censoring at treatment switch, followed by IPCW-PP to correct for the selection that this censoring creates.
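To make the first correction concrete, here is a minimal base-R sketch of stabilized inverse probability of treatment weights on simulated data. It is not the swereg implementation; the variable names (age, treated, ipw) and the single-confounder model are assumptions chosen for illustration.

```r
# Illustrative only: simulated data, one confounder, plain glm().
set.seed(1)
n <- 5000
age <- rnorm(n, mean = 60, sd = 8)

# Older people are more likely to initiate treatment (confounding by age)
p_treat <- plogis(-1 + 0.05 * (age - 60))
treated <- rbinom(n, size = 1, prob = p_treat)

# Denominator: P(treatment | confounders); numerator: marginal P(treatment)
ps_model   <- glm(treated ~ age, family = binomial())
ps         <- fitted(ps_model)
p_marginal <- mean(treated)

# Stabilized weights: marginal probability of the observed arm divided by
# the conditional probability of the observed arm
ipw <- ifelse(treated == 1, p_marginal / ps, (1 - p_marginal) / (1 - ps))

# Balance check: difference in mean age between arms, before and after weighting
raw_diff <- mean(age[treated == 1]) - mean(age[treated == 0])
weighted_diff <- weighted.mean(age[treated == 1], ipw[treated == 1]) -
  weighted.mean(age[treated == 0], ipw[treated == 0])
```

In this toy setup the weighted difference in mean age should be much closer to zero than the raw difference, which is the sense in which weighting emulates randomization on measured confounders.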

The sequence-of-nested-trials construction

The canonical observational TTE (Hernán et al. 2008; Danaei et al. 2013) builds one trial per eligible enrollment period. Within each period, a new-user design identifies people initiating the treatment (“intervention arm”) and people eligible but not initiating (“comparator arm”). Each person can appear in multiple sequential trials as long as they remain eligible and non-initiated.

This produces a long panel: one row per person per trial per follow-up week. It’s huge but the structure is uniform and the statistical operations (matching, IPW, pooled logistic / Poisson regression) are straightforward once you have it.
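The expansion from person-weeks to a person-trial-week panel can be sketched in base R as follows. This is a toy illustration with hypothetical column names (eligible, initiated, arm_initiated) and with trial_id simply set to the baseline week; it is not how swereg stores or builds the panel internally.

```r
# Toy person-week data: person 1 initiates in week 2, person 2 never initiates
person_week <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2, 2),
  week      = c(1, 2, 3, 4, 1, 2, 3, 4),
  eligible  = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  initiated = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Every eligible person-week is a candidate time zero, i.e. one trial entry
baselines <- person_week[person_week$eligible, c("id", "week", "initiated")]
names(baselines) <- c("id", "trial_id", "arm_initiated")

# Attach each person's follow-up weeks at or after that trial's time zero
panel <- merge(baselines, person_week[, c("id", "week")], by = "id")
panel <- panel[panel$week >= panel$trial_id, ]

# Counting-process time since the trial's time zero
panel$tstart <- panel$week - panel$trial_id
panel$tstop  <- panel$tstart + 1
panel <- panel[order(panel$id, panel$trial_id, panel$week), ]
```

Person 1 contributes to two trials (as a non-initiator in the first and as an initiator in the second), and person 2 contributes to four, which is why the panel grows so quickly at registry scale.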

The swereg TTE model

The swereg implementation maps this concept onto three R6 classes and two loops: Loop 1 is parallelized across enrollments, and Loop 2 runs sequentially.

TTEDesign: column name schema

TTEDesign holds the names of the columns that define the trial schema: person ID, treatment, outcome, confounders, time. It’s constructed once and reused across every enrollment. Think of it as the “schema” of a trial, not the trial itself.

TTEEnrollment: one sequence of trials

TTEEnrollment wraps the data for one group of trials that share an eligibility definition. Its data_level field tracks the lifecycle:

"person_week": raw skeleton subset after applying age-range / isoyear / exclusion filters. One row per person per ISO week. This is the input to enrollment.
"trial": after $enroll() (called internally by TTEEnrollment$new(..., ratio = )), the data has been expanded to the counting-process trial panel – one row per person per trial per time period. tstart / tstop columns appear here. This is the form IPW and IPCW-PP estimation operates on.

Enrollment itself is per-band stratified matching: within each period_width-week band, swereg samples matching_ratio comparators for every observed initiator. This is a computational shortcut compared to full cloning – see vignette("tte-methodology") for the trade-offs.
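As a rough sketch of what per-band sampling means, the base-R snippet below keeps every initiator in a band and draws ratio comparators per initiator from the non-initiators in the same band. The helper sample_band() and the columns band and initiated are hypothetical; the actual swereg sampling logic may differ in details such as tie handling or stratification.

```r
set.seed(4)

# 200 candidate baselines in 4 bands of 50; every 10th candidate is an initiator
candidates <- data.frame(
  id        = 1:200,
  band      = rep(1:4, each = 50),
  initiated = rep(c(TRUE, rep(FALSE, 9)), times = 20)
)
ratio <- 2

# Keep all initiators in the band, sample `ratio` comparators per initiator
sample_band <- function(band_rows, ratio) {
  initiators <- band_rows[band_rows$initiated, ]
  pool       <- band_rows[!band_rows$initiated, ]
  n_keep     <- min(nrow(pool), ratio * nrow(initiators))
  kept       <- pool[sample.int(nrow(pool), n_keep), ]
  rbind(initiators, kept)
}

enrolled <- do.call(
  rbind,
  lapply(split(candidates, candidates$band), sample_band, ratio = ratio)
)
```

With 5 initiators per band and a ratio of 2, each band ends up with 10 sampled comparators, so most of the 45 non-initiators per band are never expanded into the trial panel.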

TTEPlan: the ETT grid builder

TTEPlan holds the ETT grid: the Cartesian product of enrollments × outcomes × follow-up durations. One ETT is one final analysis-ready dataset. The plan knows how to run the two loops that produce those datasets.

# spec is a YAML file; study is a RegistryStudy with built skeletons
plan <- swereg::tteplan_from_spec_and_registrystudy(
  spec  = "my_study/spec.yaml",
  study = study
)
plan$ett
#>     ett_id enrollment_id  outcome_var  follow_up_weeks  file_raw  file_imp  file_analysis
#>  1:  ETT01            01  osd_i21_to_i24             52  ...       ...       ...
#>  2:  ETT02            01  osd_i21_to_i24            156  ...       ...       ...
#>  3:  ETT03            01  osd_i21_to_i24            260  ...       ...       ...
#>  4:  ETT04            01  osd_i60_i61_i63            52  ...       ...       ...
#>  5:  ETT05            01  osd_i60_i61_i63           156  ...       ...       ...
#> ...

Every row in plan$ett points at three files: a file_raw (post-enrollment, pre-imputation), a file_imp (post-imputation + IPW, reused across outcomes within the same enrollment_id), and a file_analysis (one per ETT). Loop 1 produces the first two; Loop 2 produces the third.
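The grid itself is just a Cartesian product. A minimal base-R sketch that reproduces the ett_id ordering shown above (follow-up varying fastest, then outcome, then enrollment) might look like this; the ordering rule is inferred from the printed example, and the file columns are omitted.

```r
enrollments     <- c("01")
outcomes        <- c("osd_i21_to_i24", "osd_i60_i61_i63")
follow_up_weeks <- c(52, 156, 260)

# expand.grid() varies its first argument fastest, so follow-up goes first
grid <- expand.grid(
  follow_up_weeks  = follow_up_weeks,
  outcome_var      = outcomes,
  enrollment_id    = enrollments,
  stringsAsFactors = FALSE
)

# Sequential ETT identifiers in grid order
grid$ett_id <- sprintf("ETT%02d", seq_len(nrow(grid)))
grid <- grid[, c("ett_id", "enrollment_id", "outcome_var", "follow_up_weeks")]
```

One enrollment, two outcomes, and three follow-up durations give six ETTs; adding a second enrollment would double that.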

The spec YAML

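All of the clinical and methodological decisions live in a YAML file that tteplan_read_spec() parses into a nested R list. The top-level sections, each shown in the example below, are study, inclusion_criteria, exclusion_criteria, confounders, outcomes, follow_up, and enrollments.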

The example below uses a classic TTE setup: a new-user comparison of statin initiation vs no statin initiation for primary prevention of myocardial infarction and stroke among adults aged 40-75 with no prior cardiovascular disease. It is loosely modeled on the statin emulation in Danaei et al. (2013), which uses the sequence-of-trials design of Hernán et al. (2008).

study:
  title: "Statins for primary prevention of cardiovascular events"
  principal_investigator: "..."
  description: "..."
  implementation:
    project_prefix: "statins_primary_prevention"

inclusion_criteria:
  isoyears: [2010, 2023]

exclusion_criteria:
  - name: "Prior myocardial infarction (ICD-10 I21-I24)"
    rationale: "Primary prevention cohort: no prior CVD events"
    implementation:
      source_variable: osd_i21_to_i24
      window: "lifetime_before_baseline"
      computed: true

  - name: "Prior stroke (ICD-10 I60-I61, I63)"
    rationale: "Primary prevention cohort: no prior CVD events"
    implementation:
      source_variable: osd_i60_i61_i63
      window: "lifetime_before_baseline"
      computed: true

  - name: "Prior statin use (ATC C10AA)"
    rationale: "New-user design: no prior statin use"
    implementation:
      source_variable: rx_c10aa
      window: "lifetime_before_baseline"
      computed: true

confounders:
  - name: "Age (continuous)"
    implementation:
      variable: rd_age_continuous

  - name: "Sex"
    implementation:
      variable: ri_sex

  - name: "Diabetes in past year (ICD-10 E10-E14)"
    implementation:
      source_variable: osd_e10_to_e14
      window: 52
      computed: true

  - name: "Hypertension in past year (ICD-10 I10-I15)"
    implementation:
      source_variable: osd_hypertension
      window: 52
      computed: true

outcomes:
  - name: "Myocardial infarction (I21-I24)"
    implementation:
      variable: osd_i21_to_i24

  - name: "Stroke (I60, I61, I63)"
    implementation:
      variable: osd_i60_i61_i63

follow_up:
  - { label: "1 year",  weeks: 52 }
  - { label: "3 years", weeks: 156 }
  - { label: "5 years", weeks: 260 }

enrollments:
  - id: "01"
    name: "Statin initiation vs none, age 40-75"
    additional_inclusion:
      - type: age_range
        min: 40
        max: 75
        implementation:
          variable: rd_age_continuous
    treatment:
      arms:
        intervention: "Statin initiation"
        comparator:   "No statin"
      implementation:
        matching_ratio:   5
        variable:         rd_statin_status
        intervention_value: initiated
        comparator_value: not_initiated
        seed:             4

Anatomy of an exclusion criterion

Each item under exclusion_criteria has two halves:

A clinical half (name, rationale) that medical collaborators can review without knowing anything about R.
An implementation half (implementation:) that swereg consumes: the name of the skeleton column used to evaluate the rule (source_variable), the time window during which an event in that column causes exclusion (window), and a computed flag marking rules that swereg should apply automatically at enrollment time via a rolling window.

window can be:

An integer number of weeks (e.g. 104 for “in the past 104 weeks”).
The string "lifetime_before_baseline" (at any time before the candidate time zero).
The string "lifetime_before_and_after_baseline" (ever, past or future).

The distinction between "lifetime_before_baseline" and "lifetime_before_and_after_baseline" matters for prevalent vs incident outcomes: a prior myocardial infarction should exclude a person looking backward only (a future one is the outcome), whereas a condition that fundamentally confounds treatment choice regardless of its timing should exclude them looking in both directions.
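The three window types can be illustrated with a small base-R helper. The function event_in_window() is hypothetical and only sketches the semantics described above; in particular, whether the baseline week itself counts as "before" is an assumption here (it is excluded).

```r
# `event` is a logical vector with one entry per week for a single person;
# `t0` is the index of the candidate time-zero week.
event_in_window <- function(event, t0, window) {
  weeks <- seq_along(event)
  in_scope <- if (is.numeric(window)) {
    weeks < t0 & weeks >= t0 - window          # the last `window` weeks
  } else if (window == "lifetime_before_baseline") {
    weeks < t0                                 # any time before time zero
  } else if (window == "lifetime_before_and_after_baseline") {
    rep(TRUE, length(event))                   # ever, past or future
  } else {
    stop("unknown window: ", window)
  }
  any(event[in_scope])
}

# Person A has events in weeks 3 and 9; person B only in week 9
event_a <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
event_b <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

a_recent   <- event_in_window(event_a, t0 = 6, window = 2)
a_lifetime <- event_in_window(event_a, t0 = 6, window = "lifetime_before_baseline")
b_lifetime <- event_in_window(event_b, t0 = 6, window = "lifetime_before_baseline")
b_ever     <- event_in_window(event_b, t0 = 6, window = "lifetime_before_and_after_baseline")
```

With time zero at week 6, person A is excluded by the lifetime rule but not by a 2-week lookback, and person B is excluded only when the window also looks forward.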

Computed confounders

Confounders with computed: true under implementation are rolling-window indicators that swereg builds automatically from the skeleton via tteplan_apply_derived_confounders(). For example, “diabetes in past year” is computed as “any TRUE in osd_e10_to_e14 within the 52 weeks leading up to the candidate time zero”. Non-computed confounders read a column directly (e.g. rd_age_continuous is already on the skeleton).

The computed flag is the same mechanism swereg uses for computed exclusion criteria – rolling windows over existing skeleton columns, nothing fancier.

Enrollments and matching

Each item under enrollments is one sequence of trials. They share a global inclusion/exclusion spec but add their own additional_inclusion (almost always an age range) and additional_exclusion (if any). The treatment.implementation block names the skeleton column that classifies treatment (rd_statin_status in this example, a string column with values like "initiated", "not_initiated"), which value counts as the intervention arm, which value counts as the comparator, and the per-band sampling ratio.

Loop 1: enrollment + IPW

plan$s1_generate_enrollments_and_ipw() runs one iteration per enrollment_id (NOT per ETT – several ETTs with different outcomes can share the same enrollment). Each iteration spawns a callr subprocess that:

Loads the skeleton files for the batches assigned to this enrollment.
Applies the global inclusion filter (isoyears) and global exclusion criteria via tteplan_apply_exclusions().
Applies the enrollment’s own additional_inclusion / additional_exclusion.
Computes any computed: true confounder columns via tteplan_apply_derived_confounders().
Creates the TTEEnrollment with matching and calls $s1_collapse() to drop empty rows.
Multiply-imputes missing confounder values via $s2_impute_confounders().
Fits the baseline IPW model (stabilized logistic regression of treatment on confounders) via $s3_ipw().
Truncates extreme weights at 1/99 percentiles via $s4_truncate_weights().
Saves the result as file_imp ({prefix}_imp_{enrollment_id}.qs2).
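The weight truncation in the second-to-last step can be sketched in a few lines of base R. The helper truncate_weights() is hypothetical, and it assumes that "truncation" means clamping weights to the 1st and 99th percentiles rather than dropping rows.

```r
# Clamp weights to the [lower, upper] quantile range
truncate_weights <- function(w, lower = 0.01, upper = 0.99) {
  bounds <- quantile(w, probs = c(lower, upper), names = FALSE)
  pmin(pmax(w, bounds[1]), bounds[2])
}

# Simulated right-skewed weights, as IPW weights often are
set.seed(2)
w       <- rlnorm(1000, meanlog = 0, sdlog = 1)
w_trunc <- truncate_weights(w)
```

Clamping keeps every row in the analysis while limiting the influence of a handful of people whose estimated treatment probability was very close to 0 or 1.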

Step 1 can rely on every skeleton consumed by the plan being pipeline-consistent. Near the top of tteplan_from_spec_and_registrystudy(), swereg calls study$assert_skeletons_consistent(), so if any batch’s pipeline_hash differs from the study’s current pipeline hash, plan construction errors loudly. This prevents the failure mode where you edit a code in ICD10_CODES, rebuild skeletons for only some batches, and silently run an analysis on a half-upgraded pipeline.

Steps 5-8 are all methods on TTEEnrollment:

enrollment <- TTEEnrollment$new(
  data   = person_week_data,
  design = my_design,
  ratio  = 2
)
enrollment$enrollment_stage   # "pre_enrollment"

# Inside TTEEnrollment$new() when `ratio` is passed:
# $enroll() is called privately -- this is where person-week
# becomes trial. `data_level` transitions "person_week" -> "trial".

enrollment$s1_collapse()      # drops empty rows
enrollment$s2_impute_confounders()
enrollment$s3_ipw()
enrollment$s4_truncate_weights()

enrollment$enrollment_stage   # "enrolled"

Loop 1 produces two files per enrollment_id: a file_raw intermediate and a file_imp final. The reason for the split is that imputation is stochastic; the file_raw intermediate lets you audit the raw matched trial panel before imputation, which is useful when diagnosing mismatches with the protocol.

Loop 2: per-ETT outcome weighting

plan$s2_generate_analysis_files_and_ipcw_pp() runs one iteration per row of plan$ett (one per enrollment × outcome × follow-up combination). Each iteration:

Loads the file_imp for this row’s enrollment_id. Imputed confounders and baseline IPW weights are already on it.
Prepares the outcome: joins the outcome column, censors at first event or end of follow-up window.
Fits the IPCW-PP model (GAM/GLM of censoring on time-varying covariates) via $s5_prepare_for_analysis(). This includes per-protocol censoring at treatment switch, so the output is NOT an ITT dataset.
Combines weights: analysis_weight_pp = ipw * ipcw_pp.
Truncates the combined weight at the 1st/99th percentiles (stored as analysis_weight_pp_trunc) and drops the intermediate IPCW columns.
Saves the result as file_analysis ({prefix}_analysis_{ett_id}.qs2).
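The per-protocol censoring and weight combination steps can be sketched on a toy panel in base R. The columns baseline_arm, current_treatment, and ever_deviated are hypothetical names, the ipw and ipcw_pp values are made up, and the rule "drop every row from the first deviating week onward" is an assumption about how censoring at switch is applied.

```r
# Person 1 starts on treatment and stops in week 3; person 2 never starts
panel <- data.frame(
  id                = c(1, 1, 1, 1, 2, 2, 2, 2),
  tstop             = c(1, 2, 3, 4, 1, 2, 3, 4),
  baseline_arm      = c(1, 1, 1, 1, 0, 0, 0, 0),
  current_treatment = c(1, 1, 0, 0, 0, 0, 0, 0),
  ipw               = c(1.2, 1.2, 1.2, 1.2, 0.9, 0.9, 0.9, 0.9),
  ipcw_pp           = c(1.0, 1.1, 1.3, 1.3, 1.0, 1.0, 1.05, 1.1)
)

# Flag the first week a person deviates from their baseline arm, and all later weeks
panel$deviated      <- as.integer(panel$current_treatment != panel$baseline_arm)
panel$ever_deviated <- ave(panel$deviated, panel$id, FUN = cumsum) > 0

# Per-protocol panel: follow-up ends at the switch
pp <- panel[!panel$ever_deviated, ]

# Combined weight: baseline IPW times the time-varying IPCW-PP
pp$analysis_weight_pp <- pp$ipw * pp$ipcw_pp
```

Person 1 keeps only weeks 1 and 2, and the IPCW-PP factor upweights the people who remain adherent to stand in for those who were censored.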

Loop 2 runs sequentially in the main process (not parallelized), because each ETT’s cost is usually dominated by I/O and the outcome + IPCW-PP models are fast. If that ever stops being true, parallelizing it is a straightforward extension.

The analysis file: rates, IRRs, KM curves

After Loop 2, each file_analysis is a trial-level panel with truncated combined weights ready for effect estimation. The TTEEnrollment R6 class provides three estimation methods that wrap survey::svyglm and survey::svykm with the right design specification:

enrollment <- swereg::qs2_read(x_file_analysis)

# Weighted rates (per person-year) by arm, trial_id, etc.
enrollment$rates(
  weight_col = "analysis_weight_pp_trunc",
  by = c("treatment")
)

# Weighted incidence rate ratio via quasipoisson
# (pooled logistic / IRR-as-HR approximation)
enrollment$irr(
  weight_col = "analysis_weight_pp_trunc",
  formula    = outcome ~ treatment + splines::ns(tstop, df = 3) + trial_id
)

# Weighted Kaplan-Meier with person-level clustered SEs
enrollment$km(
  weight_col = "analysis_weight_pp_trunc"
)

The IRR approximation to the hazard ratio is used because the alternative, survey::svycoxph(), is computationally prohibitive at registry scale. For rare outcomes (which dominate TTE studies) the quasi-Poisson IRR with splines::ns(tstop, df = 3) is a close approximation to a flexible-baseline Cox model (Thompson 1977).
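To show the shape of the IRR model without the survey machinery, here is a base-R sketch that fits the same kind of formula with plain glm() on a simulated person-week panel. It is an illustration under stated assumptions: the data are simulated with a true rate ratio of 2, the weight column w is a placeholder set to 1, and plain glm() does not give person-level clustered standard errors the way the survey-based methods do.

```r
set.seed(3)
n         <- 4000
max_weeks <- 20

# Simulate a discrete-time event process: weekly hazard 0.01 vs 0.02
treatment     <- rbinom(n, size = 1, prob = 0.5)
weekly_hazard <- ifelse(treatment == 1, 0.02, 0.01)
event_week    <- rgeom(n, prob = weekly_hazard) + 1
last_week     <- pmin(event_week, max_weeks)

# One row per person per week at risk, ending at the event or at max_weeks
panel <- data.frame(
  id    = rep(seq_len(n), times = last_week),
  tstop = sequence(last_week)
)
panel$treatment <- treatment[panel$id]
panel$event     <- as.integer(panel$tstop == event_week[panel$id])
panel$w         <- 1  # placeholder for a truncated analysis weight

# Quasi-Poisson model with a natural spline for the baseline rate over time
fit <- glm(
  event ~ treatment + splines::ns(tstop, df = 3),
  family  = quasipoisson(),
  weights = w,
  data    = panel
)
irr <- exp(coef(fit)[["treatment"]])
```

The exponentiated treatment coefficient is the incidence rate ratio; in this simulation it should land near the true value of 2, with the spline absorbing any change in the baseline rate over follow-up.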

Heterogeneity across trials

enrollment$heterogeneity_test()
#>   Wald test on trial_id x treatment interaction
#>   chisq = 3.21, df = 2, p = 0.20

Tests whether the treatment effect varies across the sequential trials. A significant result suggests calendar-time effect modification or changing selection on treatment initiation.

Effect modification by a baseline subgroup

To ask whether the treatment effect is modified by a categorical baseline covariate, use irr_by_subgroup() for the stratum-specific IRRs and effect_modification_test() for the formal interaction test:

enrollment$irr_by_subgroup(
  weight_col   = "analysis_weight_pp_trunc",
  subgroup_var = "ri_sex"
)
#>   level   IRR  IRR_lower IRR_upper
#>   all    2.9      2.1       4.0
#>   0      4.0      2.6       6.2
#>   1      2.0      1.3       3.1
#>   attr "em_pvalue"     = 0.018
#>   attr "ratio_of_irrs" = 0.50   (IRR level 1 / IRR level 0)

Whether the stratum IRRs differ is decided by the interaction term of the single combined model (the em_pvalue), not by comparing the per-stratum confidence intervals – overlapping CIs do not imply no difference, and vice versa. The subgroup variable must be a confounder so the marginal weights remain valid within each stratum.
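The relationship between the stratum IRRs and the interaction term can be checked with a small worked example. The sketch below uses made-up aggregated counts (not swereg output) and an unweighted Poisson model with an offset, chosen so that the stratum IRRs are 4.0 and 2.0.

```r
# Events and person-years by sex and treatment arm (illustrative numbers)
cells <- data.frame(
  sex          = c(0, 0, 1, 1),
  treatment    = c(0, 1, 0, 1),
  events       = c(50, 200, 100, 200),
  person_years = c(1000, 1000, 1000, 1000)
)

# Single combined model with a treatment x sex interaction
fit <- glm(
  events ~ treatment * sex + offset(log(person_years)),
  family = poisson(),
  data   = cells
)

# IRR in the reference stratum, and the multiplicative interaction
irr_sex0      <- exp(coef(fit)[["treatment"]])
ratio_of_irrs <- exp(coef(fit)[["treatment:sex"]])
irr_sex1      <- irr_sex0 * ratio_of_irrs
```

The exponentiated interaction coefficient is exactly the ratio of the two stratum IRRs (2.0 / 4.0 = 0.5 here), and its Wald test is the effect modification test, which is why it is preferred over eyeballing the two confidence intervals.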

In the spec-driven pipeline, add a top-level subgroups: block; the analysis then runs automatically per ETT for both estimands and appears in the “Effect modification” sheet of the exported workbook:

subgroups:
  - name: "Sex"
    implementation:
      variable: ri_sex      # must also be a confounder

Running the full pipeline

The typical project script has the shape:

# 1. Load the study built by the skeleton pipeline
study <- swereg::registrystudy_load(data_rawbatch_candidates)

# 2. Build the plan from the spec + study
plan <- swereg::tteplan_from_spec_and_registrystudy(
  spec  = "my_study/spec.yaml",
  study = study
)

# 3. Loop 1: enrollment + IPW (parallel across enrollment_ids)
plan$s1_generate_enrollments_and_ipw(n_workers = 4L)

# 4. Loop 2: per-ETT outcome weighting (sequential, fast)
plan$s2_generate_analysis_files_and_ipcw_pp()

# 5. Per-ETT analysis
for (i in seq_len(nrow(plan$ett))) {
  x_ett_id         <- plan$ett$ett_id[i]
  x_file_analysis  <- plan$ett$file_analysis[i]
  x_outcome        <- plan$ett$outcome_var[i]

  enrollment <- swereg::qs2_read(x_file_analysis)

  irr_result <- enrollment$irr(
    weight_col = "analysis_weight_pp_trunc",
    formula    = as.formula(sprintf("%s ~ treatment + splines::ns(tstop, df = 3) + trial_id", x_outcome))
  )
  # save irr_result to results/ ...
}

The x_ prefix on loop-extracted variables is a swereg convention (x_outcome not outcome) so that loop scalars are always visually distinct from dataset columns inside the body of the loop.

The TARGET checklist

Cashin et al. (2025) published a 21-item checklist for transparent reporting of target trial emulations. plan$print_target_checklist() generates a pre-populated version of the checklist, mapping each TARGET item to the swereg configuration that implements it (pulled from the spec YAML) and marking items that need manual narrative reporting.

plan$print_target_checklist()

This is meant to be read, edited into a manuscript supplement, and submitted alongside the paper. It doesn’t guarantee your analysis is correct, just that the reporting is complete.

Reproducibility via provenance

Every plan carries references back to its skeletons. Every skeleton carries a pipeline_hash() summarizing the three-phase pipeline that produced it. Every spec file is versioned in git. Together, these give you a reproducibility chain from final estimates back to the exact code that built the time grid, applied the phase-3 randvars, and registered the code entries.

Before running Loop 1, the plan asserts the skeletons are consistent (study$assert_skeletons_consistent()). If that passes, you’re guaranteed that every batch was built with the same framework + randvars + codes configuration. If your spec_vNNN.yaml file is committed and your pipeline snapshot TSV is committed, git log can answer “what was different between run A and run B” without guesswork.
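Conceptually, the consistency assertion is a comparison of each batch's stored hash against the current one. The base-R sketch below is a hypothetical stand-in, not the body of study$assert_skeletons_consistent(); the function name, the batch names, and the hash strings are all invented for illustration.

```r
# Error loudly if any batch was built with a different pipeline hash
assert_hashes_consistent <- function(batch_hashes, current_hash) {
  stale <- names(batch_hashes)[batch_hashes != current_hash]
  if (length(stale) > 0) {
    stop(
      "Skeletons built with a different pipeline: ",
      paste(stale, collapse = ", ")
    )
  }
  invisible(TRUE)
}

# batch_003 was not rebuilt after the pipeline changed
batch_hashes <- c(batch_001 = "a1b2", batch_002 = "a1b2", batch_003 = "ffff")

result <- tryCatch(
  assert_hashes_consistent(batch_hashes, current_hash = "a1b2"),
  error = function(e) conditionMessage(e)
)
```

Failing at plan construction, before any enrollment work starts, means a half-rebuilt study costs seconds rather than a full Loop 1 run.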

Summary

Three R6 classes: TTEDesign (schema), TTEEnrollment (one sequence of trials), TTEPlan (the ETT grid builder).
Two loops: Loop 1 does enrollment + baseline IPW per enrollment_id in parallel callr workers; Loop 2 does per-protocol censoring + IPCW-PP per ETT sequentially.
Spec YAML captures every clinical and methodological decision in a file that medical collaborators can review and the swereg pipeline can execute.
Provenance – pipeline_hash, assert_skeletons_consistent, per-host pipeline snapshots – makes the reproducibility chain auditable.
Weights: baseline IPW for confounder adjustment (Loop 1), IPCW-PP for per-protocol censoring (Loop 2), combined as analysis_weight_pp = ipw * ipcw_pp, truncated at 1/99 percentiles.
Estimation: quasi-Poisson IRR approximation to the hazard ratio with splines::ns(tstop, df = 3) for flexible baseline and person-level clustered standard errors via survey.

For further reading:

vignette("skeleton-pipeline") – how the skeleton files consumed by a TTEPlan are built and incrementally rebuilt.
vignette("tte-methodology") – mapping to reference papers.
vignette("tte-nomenclature") – one-page glossary.
?TTEPlan, ?TTEEnrollment, ?TTEDesign – full method reference.