TTEEnrollment class for target trial emulation
Details
Holds the enrollment data, design specification, and workflow state. Methods modify in-place and return `invisible(self)` for `$`-chaining. R6 reference semantics mean `trial$data[, := ...]` modifies the data.table in-place without copy-on-write overhead.
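The chaining and reference-semantics behaviour described above can be illustrated with a toy class. This is only a sketch: `ToyEnrollment`, `add_flag()`, and `add_weight()` are hypothetical names invented for illustration and are not part of the package. It assumes the `R6` and `data.table` packages are installed.

```r
library(R6)
library(data.table)

# Hypothetical toy class (NOT the real TTEEnrollment) showing the pattern:
# each mutating method edits self$data by reference and returns invisible(self).
ToyEnrollment <- R6Class("ToyEnrollment",
  public = list(
    data = NULL,
    steps_completed = character(),
    initialize = function(data) {
      # Copy once at construction so the caller's table is left untouched
      self$data <- data.table::copy(data)
    },
    add_flag = function() {
      self$data[, flag := TRUE] # modifies the stored data.table in place
      self$steps_completed <- c(self$steps_completed, "add_flag")
      invisible(self)
    },
    add_weight = function() {
      self$data[, w := 1]
      self$steps_completed <- c(self$steps_completed, "add_weight")
      invisible(self)
    }
  )
)

raw <- data.table(id = 1:3)
toy <- ToyEnrollment$new(raw)
toy$add_flag()$add_weight() # $-chaining works because each step returns self

names(toy$data) # the object's table has gained both columns
names(raw)      # the caller's table is unchanged
```

Because an R6 object is a reference, `toy` does not need to be reassigned after each step.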
The `data_level` property controls which methods are available:
- `"person_week"`: Data has one row per person per time unit. Pass `ratio` to the constructor to enroll and transition to trial level.
- `"trial"`: Data has been expanded to trial panels (band-level). Methods `$s2_ipw()`, `$s3_truncate_weights()`, and `$s4_prepare_for_analysis()` require this level.
Enrollment (matching + panel expansion) transitions data from "person_week" to "trial" level and is triggered by passing `ratio` to the constructor.
Methods
**Mutating (return `invisible(self)` for chaining, step-numbered for execution order):**
- `$s1_impute_confounders(confounder_vars, seed)`
Step 1: Impute missing confounders
- `$s2_ipw(stabilize)`
Step 2: Calculate inverse probability of treatment weights
- `$s3_truncate_weights(weight_cols, lower, upper, suffix)`
Step 3: Truncate extreme weights
- `$s4_prepare_for_analysis(outcome, follow_up, ...)`
Step 4: Prepare outcome data and calculate IPCW-PP in one step
**Non-mutating (return data):**
- `$extract()`
Return the data.table
- `$summary(pretty)`
Return summary statistics
- `$weight_summary()`
Print weight distribution diagnostics
- `$table1(ipw_col)`
Generate baseline characteristics table
- `$rates(weight_col)`
Calculate events, person-years, and rates
- `$irr(weight_col)`
Fit Poisson models and extract IRR
- `$km(ipw_col, save_path, title)`
Fit Kaplan-Meier curves
**Active bindings:**
- `$enrollment_stage`
Derived lifecycle stage: `"pre_enrollment"`, `"enrolled"`, or `"analysis_ready"`
Public fields
`data`: A data.table with trial data.
`design`: A TTEDesign R6 object.
`data_level`: Character, "person_week" or "trial".
`steps_completed`: Character vector of completed workflow steps.
`active_outcome`: Character or NULL, current outcome for IPCW-PP.
`weight_cols`: Character vector of weight column names.
Active bindings
`enrollment_stage`: Derived lifecycle stage (read-only). Returns `"pre_enrollment"` when `data_level == "person_week"`, `"analysis_ready"` when `s5_prepare_outcome` has been run, or `"enrolled"` otherwise.
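The derivation rule for `enrollment_stage` can be written as a small function. This is a sketch only: `derive_enrollment_stage()` is a hypothetical helper, and it assumes that a completed outcome-preparation step is recorded in `steps_completed` under the literal name `"s5_prepare_outcome"`, which the documentation does not confirm.

```r
# Hypothetical helper mirroring the rule described for $enrollment_stage.
# Assumption: steps_completed stores the step name "s5_prepare_outcome".
derive_enrollment_stage <- function(data_level, steps_completed = character()) {
  if (identical(data_level, "person_week")) {
    return("pre_enrollment")
  }
  if ("s5_prepare_outcome" %in% steps_completed) {
    return("analysis_ready")
  }
  "enrolled"
}

derive_enrollment_stage("person_week")
derive_enrollment_stage("trial", c("s2_ipw"))
derive_enrollment_stage("trial", c("s2_ipw", "s5_prepare_outcome"))
```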
Methods
Method new()
Create a new TTEEnrollment object.
Usage
TTEEnrollment$new(
  data,
  design,
  data_level = NULL,
  steps_completed = character(),
  active_outcome = NULL,
  weight_cols = character(),
  ratio = NULL,
  seed = NULL,
  extra_cols = NULL,
  enrolled_ids = NULL,
  own_data = FALSE
)
Arguments
`data`: A data.table containing the trial data. By default a copy is made so that the caller's data is not modified; set `own_data = TRUE` to skip the copy.
`design`: A [TTEDesign] object specifying column mappings.
`data_level`: Character or NULL. If NULL (default), the level is auto-detected from which identifier column exists in the data: "person_week" for pre-panel data (requires person_id_var), "trial" for post-panel data (requires id_var).
`steps_completed`: Character vector of completed workflow steps.
`active_outcome`: Character or NULL, the current outcome for IPCW-PP analysis.
`weight_cols`: Character vector of weight column names created.
`ratio`: Numeric or NULL. If provided, automatically enrolls participants (sampling the comparison group and creating trial panels). Only valid for person_week data.
`seed`: Integer or NULL. Random seed for enrollment reproducibility.
`extra_cols`: Character vector or NULL. Extra columns to include in trial panels during enrollment.
`enrolled_ids`: data.table or NULL. Pre-matched enrollment IDs from the two-pass pipeline. When provided, enrollment skips the matching phase and uses these IDs directly.
`own_data`: Logical. If TRUE, takes ownership of the data.table without copying it. Use only when the caller will not reuse the data.
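The `data_level` auto-detection and the `own_data` copy behaviour described in the arguments above might look roughly like the following. This is a sketch under stated assumptions: `detect_data_level()` and `take_data()` are hypothetical helpers, the column names `"id"` and `"enrollment_id"` are made up, and checking the trial-level identifier first is an assumption (the documentation does not say which column wins when both exist).

```r
library(data.table)

# Hypothetical helper: infer the data level from the identifier column present.
# Assumption: the trial-level id column takes precedence if both are present.
detect_data_level <- function(data, person_id_var, id_var) {
  if (id_var %in% names(data)) {
    return("trial")
  }
  if (person_id_var %in% names(data)) {
    return("person_week")
  }
  stop("Neither '", id_var, "' nor '", person_id_var, "' found in data")
}

# Hypothetical helper: copy unless the caller hands over ownership.
take_data <- function(data, own_data = FALSE) {
  if (own_data) data else data.table::copy(data)
}

skeleton <- data.table(id = c(1L, 1L, 2L), week = c(1L, 2L, 1L))
panel <- data.table(enrollment_id = c("a", "b"), id = c(1L, 2L))

detect_data_level(skeleton, person_id_var = "id", id_var = "enrollment_id")
detect_data_level(panel, person_id_var = "id", id_var = "enrollment_id")

copied <- take_data(skeleton)                 # independent copy
shared <- take_data(skeleton, own_data = TRUE) # same underlying table
```

Skipping the copy saves memory on large registry tables, at the cost that later in-place modifications are visible through the caller's variable as well.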
Method check_version()
Check if this object's schema version matches the current class version. Warns if the object was saved with an older schema version.
Method s2_ipw()
Step 2: Calculates inverse probability of treatment weights.
Estimates the propensity score P(A=1 | L_baseline) via logistic regression on baseline rows only, then computes stabilized (or unstabilized) IPW. This addresses **baseline** confounding for the per-protocol analysis pipeline.
Note: This does NOT estimate time-varying treatment weights for as-treated analysis (Danaei 2013, Section 4.3). As-treated analysis is not currently implemented.
Robust standard errors for within-person correlation are handled downstream by `survey::svydesign(ids = ~person_id_var)` in `$irr()` and `$km()` (Hernan 2008, Danaei 2013).
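The weight calculation described for `$s2_ipw()` can be sketched in base R on simulated data. This is an illustration of stabilized and unstabilized inverse probability of treatment weights in general, not the package's internal code; the data, the column names, and the use of the marginal exposure prevalence as the stabilizing numerator are assumptions. Each simulated row stands in for one baseline row.

```r
set.seed(1)
n <- 500

# Simulated baseline rows: exposure depends on age and sex (confounding)
d <- data.frame(
  age = rnorm(n, mean = 50, sd = 10),
  sex = rbinom(n, 1, 0.5)
)
d$exposed <- rbinom(n, 1, plogis(-0.5 + 0.03 * (d$age - 50) + 0.4 * d$sex))

# Propensity score P(A = 1 | L_baseline) from logistic regression
ps_model <- glm(exposed ~ age + sex, family = binomial(), data = d)
ps <- predict(ps_model, type = "response")

# Unstabilized weights: 1 / P(received own treatment | L)
d$ipw_unstab <- ifelse(d$exposed == 1, 1 / ps, 1 / (1 - ps))

# Stabilized weights: marginal P(received own treatment) in the numerator
p_marginal <- mean(d$exposed)
d$ipw <- ifelse(d$exposed == 1, p_marginal / ps, (1 - p_marginal) / (1 - ps))

summary(d$ipw)
```

Stabilized weights should average close to 1, which is a quick sanity check on the propensity model.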
Method s3_truncate_weights()
Step 3: Truncates extreme weights at specified quantiles.
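Quantile-based truncation can be sketched as follows. `truncate_weights()` here is a hypothetical standalone function, and the 1st/99th percentile defaults are an assumption for illustration; the method's actual defaults are not stated in this documentation.

```r
# Hypothetical helper: clamp weights to the [lower, upper] quantile range.
# Assumption: 0.01 and 0.99 are illustrative defaults only.
truncate_weights <- function(w, lower = 0.01, upper = 0.99) {
  bounds <- quantile(w, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(w, bounds[1]), bounds[2])
}

set.seed(2)
w <- rlnorm(1000)          # right-skewed weights with a few extreme values
w_trunc <- truncate_weights(w)

range(w)
range(w_trunc)
```

Truncation trades a small amount of bias for lower variance, since a handful of very large weights can otherwise dominate the weighted estimate.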
Method s4_prepare_for_analysis()
Step 4: Prepare outcome data and calculate IPCW-PP in one step. Calls `$s5_prepare_outcome()` followed by `$s6_ipcw_pp()`. This is the recommended way to prepare an enrollment for analysis.
Usage
TTEEnrollment$s4_prepare_for_analysis(
  outcome,
  follow_up = NULL,
  estimate_ipcw_pp_separately_by_exposure = TRUE,
  estimate_ipcw_pp_with_gam = TRUE,
  censoring_var = NULL
)
Arguments
`outcome`: Character scalar. Must be one of `design$outcome_vars`.
`follow_up`: Optional integer. Overrides `design$follow_up_time`.
`estimate_ipcw_pp_separately_by_exposure`: Logical, default TRUE.
`estimate_ipcw_pp_with_gam`: Logical, default TRUE.
`censoring_var`: Character or NULL. Defaults to `"censor_this_period"`.
Method summary()
Summarize trial data statistics.
Method table1()
Generate baseline characteristics table. Wraps [tableone::CreateTableOne()] or [tableone::svyCreateTableOne()].
Method irr()
Fit weighted Poisson regression and extract incidence rate ratios.
Uses `survey::svyglm()` with `quasipoisson` family and person-level clustering (`ids = ~person_id_var`) for robust standard errors. This accounts for within-person correlation across repeated trial entries (Hernan 2008, Danaei 2013).
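The modelling approach can be sketched on simulated data. This is not the method's actual code: the column names (`person_id`, `person_weeks`, `w`), the offset term, and the toy data (which allow repeated events per person) are assumptions made for illustration. It assumes the `survey` package is installed.

```r
library(survey)

set.seed(3)
n_person <- 200
n_period <- 8

# Toy long-format data: several follow-up periods per person
d <- data.frame(
  person_id = rep(seq_len(n_person), each = n_period),
  tstop = rep(seq_len(n_period) * 4, times = n_person)
)
exposed_by_person <- rbinom(n_person, 1, 0.4)
weight_by_person <- runif(n_person, 0.5, 2) # stands in for an analysis weight
d$exposed <- exposed_by_person[d$person_id]
d$w <- weight_by_person[d$person_id]
d$person_weeks <- 4
d$event <- rbinom(nrow(d), 1, ifelse(d$exposed == 1, 0.06, 0.03))

# Person-level clustering gives robust (sandwich) standard errors
des <- svydesign(ids = ~person_id, weights = ~w, data = d)

fit <- svyglm(
  event ~ exposed + splines::ns(tstop, df = 3) + offset(log(person_weeks)),
  design = des,
  family = quasipoisson()
)

irr <- unname(exp(coef(fit)["exposed"]))
ci <- unname(exp(confint(fit)["exposed", ]))
c(irr = irr, lower = ci[1], upper = ci[2])
```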
**IRR vs HR**: For rare events (typical in registry-based TTE studies), the incidence rate ratio from Poisson regression approximates the hazard ratio from Cox regression (Thompson 1977). The Poisson model with `splines::ns(tstop, df=3)` flexibly models the baseline event rate over follow-up time. This is analogous to Cox's nonparametric baseline hazard and to Danaei et al.'s "month of follow-up and its squared terms" in pooled logistic regression.
**Computational choice**: `quasipoisson` accounts for overdispersion from survey weights, and `svyglm` scales to large registry datasets (unlike `survey::svycoxph()`). This is computationally equivalent to the pooled logistic approach used by Danaei et al. (2013).
**Calendar-time adjustment**: When `trial_id` is present in the data (from band-based enrollment), it is included in the model to adjust for calendar-time variation in outcome rates across enrollment bands (Caniglia 2023, Danaei 2013). Uses natural splines for >=5 unique trial IDs, linear term for 2-4, omitted for 1.
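The rule for choosing the calendar-time term can be expressed as a small helper. This is a sketch: `trial_id_term()` is a hypothetical function, and the spline degrees of freedom (`df = 3`) are an assumption, since the documentation does not state them.

```r
# Hypothetical helper: pick the trial_id term from the number of unique trials.
# Assumption: df = 3 for the natural spline is illustrative only.
trial_id_term <- function(trial_id) {
  n_unique <- length(unique(trial_id))
  if (n_unique >= 5) {
    "splines::ns(trial_id, df = 3)" # flexible calendar-time trend
  } else if (n_unique >= 2) {
    "trial_id"                      # too few bands for a spline: linear term
  } else {
    NULL                            # single band: nothing to adjust for
  }
}

# Build a model formula; a NULL term simply drops out of c()
build_formula <- function(trial_id) {
  reformulate(c("exposed", trial_id_term(trial_id)), response = "event")
}

build_formula(1:6)
build_formula(c(1, 1, 2))
build_formula(rep(1, 3))
```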
Method heterogeneity_test()
Test for heterogeneity of treatment effects across trials.
Fits a model with a `trial_id x exposure` interaction term and returns the Wald test p-value. This tests whether the treatment effect varies across enrollment bands (Hernan 2008, Danaei 2013).
Method km()
Fit Kaplan-Meier curves and optionally plot. Uses IPW only (not IPCW) because IPCW is time-varying.
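A weighted Kaplan-Meier fit with one baseline weight per person can be sketched as below. Note the swap: this uses `survival::survfit()` with a `weights` argument rather than a survey design object, so it illustrates the weighting idea only and does not reproduce robust standard errors. The simulated data and column names are assumptions.

```r
library(survival)

set.seed(5)
n <- 300

# Toy data: one row per person, administrative censoring at 52 weeks
d <- data.frame(exposed = rbinom(n, 1, 0.5))
event_time <- rexp(n, rate = ifelse(d$exposed == 1, 0.03, 0.02))
d$time <- pmin(event_time, 52)
d$event <- as.integer(event_time < 52)

# A single time-fixed weight per person (stands in for a baseline IPW)
d$ipw <- runif(n, 0.5, 2)

km_fit <- survfit(Surv(time, event) ~ exposed, data = d, weights = ipw)
summary(km_fit, times = c(13, 26, 39, 52))
```

A time-fixed weight fits naturally here because each person contributes one row; a time-varying weight such as IPCW would need a different data layout.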
Examples
if (FALSE) { # \dontrun{
  design <- TTEDesign$new(
    person_id_var = "id",
    exposure_var = "exposed",
    outcome_vars = "death",
    confounder_vars = c("age", "sex"),
    follow_up_time = 52L,
    eligible_var = "eligible"
  )
  # Enroll via constructor (band-based), then $-chain
  enrollment <- TTEEnrollment$new(my_skeleton, design,
    ratio = 2, seed = 4, extra_cols = "isoyearweek"
  )
  enrollment$
    s2_ipw()$
    s4_prepare_for_analysis(outcome = "death", estimate_ipcw_pp_with_gam = TRUE)
} # }
