TTEEnrollment class for target trial emulation

Holds the enrollment data, design specification, and workflow state. Methods modify in-place and return `invisible(self)` for `$`-chaining. R6 reference semantics mean `trial$data[, := ...]` modifies the data.table in-place without copy-on-write overhead.
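The chaining and in-place behaviour described above can be illustrated with a toy R6 class. `Toy`, `add_double()`, and `add_flag()` are hypothetical names and not part of this package; the sketch only assumes that the `R6` and `data.table` packages are installed.

```r
library(R6)
library(data.table)

# Hypothetical toy class, not part of this package.
Toy <- R6Class("Toy",
  public = list(
    data = NULL,
    initialize = function(data) {
      # Copy so that later steps do not modify the caller's table.
      self$data <- copy(data)
    },
    add_double = function() {
      # `:=` modifies the data.table by reference; no copy is made.
      self$data[, x2 := x * 2]
      invisible(self)  # returning self enables `$`-chaining
    },
    add_flag = function() {
      self$data[, big := x2 > 2]
      invisible(self)
    }
  )
)

input <- data.table(x = 1:3)
toy <- Toy$new(input)
toy$add_double()$add_flag()

names(toy$data)  # "x" "x2" "big"
names(input)     # "x": the caller's table is untouched because of the copy
```

Because each method returns `invisible(self)`, the chain operates on one object throughout, and nothing is printed at the end of the chain.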

The `data_level` property controls which methods are available:

- `"person_week"`: Data has one row per person per time unit. Pass `ratio` to the constructor to enroll and transition to trial level.
- `"trial"`: Data has been expanded to trial panels (band-level). Methods `$s2_ipw()`, `$s3_truncate_weights()`, and `$s4_prepare_for_analysis()` require this level.

Enrollment (matching + panel expansion) transitions data from "person_week" to "trial" level and is triggered by passing `ratio` to the constructor.

Methods

**Mutating (return `invisible(self)` for chaining, step-numbered for execution order):**

`$s1_impute_confounders(confounder_vars, seed)`: Step 1: Impute missing confounders
`$s2_ipw(stabilize)`: Step 2: Calculate inverse probability of treatment weights
`$s3_truncate_weights(weight_cols, lower, upper, suffix)`: Step 3: Truncate extreme weights
`$s4_prepare_for_analysis(outcome, follow_up, ...)`: Step 4: Prepare outcome data and calculate IPCW-PP in one step

**Non-mutating (return data):**

`$extract()`: Return the data.table
`$summary(pretty)`: Return summary statistics
`$weight_summary()`: Print weight distribution diagnostics
`$table1(ipw_col)`: Generate baseline characteristics table
`$rates(weight_col)`: Calculate events, person-years, and rates
`$irr(weight_col)`: Fit Poisson models and extract IRR
`$km(ipw_col, save_path, title)`: Fit Kaplan-Meier curves

**Active bindings:**

`$enrollment_stage`: Derived lifecycle stage: `"pre_enrollment"`, `"enrolled"`, or `"analysis_ready"`

Public fields

data: A data.table with trial data.
design: A TTEDesign R6 object.
data_level: Character, "person_week" or "trial".
steps_completed: Character vector of completed workflow steps.
active_outcome: Character or NULL, current outcome for IPCW-PP.
weight_cols: Character vector of weight column names.
estimand: Character or NULL. Set to "pp" or "itt" once an analysis dataset is prepared; governs which weights are valid in `$irr()`. NULL (legacy / unprepared) is treated as per-protocol.

Active bindings

enrollment_stage: Derived lifecycle stage (read-only). Returns `"pre_enrollment"` when `data_level == "person_week"`, `"analysis_ready"` once `$s5_prepare_outcome()` has been run (it is called by `$s4_prepare_for_analysis()`), or `"enrolled"` otherwise.
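The derivation rules for `enrollment_stage` can be written out as a small function. This is a sketch of the documented rules, not the class's actual implementation: `derive_enrollment_stage()` is a hypothetical helper, and it assumes that `steps_completed` records the step under the name `"s5_prepare_outcome"`.

```r
# Hypothetical helper mirroring the documented rules; assumes the step is
# recorded in `steps_completed` as "s5_prepare_outcome".
derive_enrollment_stage <- function(data_level, steps_completed = character()) {
  if (identical(data_level, "person_week")) {
    "pre_enrollment"
  } else if ("s5_prepare_outcome" %in% steps_completed) {
    "analysis_ready"
  } else {
    "enrolled"
  }
}

derive_enrollment_stage("person_week")
derive_enrollment_stage("trial", c("s2_ipw"))
derive_enrollment_stage("trial", c("s2_ipw", "s5_prepare_outcome"))
```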

Methods

Public methods

TTEEnrollment$new()
TTEEnrollment$print()
TTEEnrollment$check_version()
TTEEnrollment$s1_impute_confounders()
TTEEnrollment$s2_ipw()
TTEEnrollment$s3_truncate_weights()
TTEEnrollment$s4_prepare_for_analysis()
TTEEnrollment$extract()
TTEEnrollment$summary()
TTEEnrollment$weight_summary()
TTEEnrollment$table1()
TTEEnrollment$rates()
TTEEnrollment$irr()
TTEEnrollment$heterogeneity_test()
TTEEnrollment$effect_modification_test()
TTEEnrollment$irr_by_subgroup()
TTEEnrollment$km()
TTEEnrollment$clone()

`TTEEnrollment$new()`

Create a new TTEEnrollment object.

Usage

TTEEnrollment$new(
  data,
  design,
  data_level = NULL,
  steps_completed = character(),
  active_outcome = NULL,
  weight_cols = character(),
  ratio = NULL,
  seed = NULL,
  extra_cols = NULL,
  enrolled_ids = NULL,
  own_data = FALSE
)

Arguments

data: A data.table containing the trial data. A copy is made automatically to avoid modifying the caller's data.
design: A [TTEDesign] object specifying column mappings.
data_level: Character or NULL. If NULL (default), auto-detects based on which identifier column exists in data. "person_week" for pre-panel data (requires person_id_var), "trial" for post-panel data (requires id_var).
steps_completed: Character vector of completed workflow steps.
active_outcome: Character or NULL, the current outcome for IPCW-PP analysis.
weight_cols: Character vector of weight column names created.
ratio: Numeric or NULL. If provided, automatically enrolls participants (sampling comparison group and creating trial panels). Only valid for person_week data.
seed: Integer or NULL. Random seed for enrollment reproducibility.
extra_cols: Character vector or NULL. Extra columns to include in trial panels during enrollment.
enrolled_ids: data.table or NULL. Pre-matched enrollment IDs from the two-pass pipeline. When provided, enrollment skips the matching phase and uses these IDs directly.
own_data: Logical. If TRUE, takes ownership of the data.table without copying it. Use only when the caller will not reuse the data.

`TTEEnrollment$print()`

Print the TTEEnrollment object.

Usage

TTEEnrollment$print(...)

Arguments

...: Ignored.

`TTEEnrollment$check_version()`

Check if this object's schema version matches the current class version. Warns if the object was saved with an older schema version.

Usage

TTEEnrollment$check_version()

Returns

`invisible(TRUE)` if versions match, `invisible(FALSE)` otherwise.

`TTEEnrollment$s1_impute_confounders()`

Step 1: Impute missing confounders by sampling from observed values.

Usage

TTEEnrollment$s1_impute_confounders(confounder_vars, seed = 4L)

Arguments

confounder_vars: Character vector of confounder column names to impute.
seed: Integer seed for reproducibility (default: 4L).
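As a rough illustration of imputation by sampling from observed values, the base R sketch below fills each missing entry with a random draw from the non-missing values of the same column. `impute_by_sampling()` and the toy data are hypothetical; the method's actual sampling scheme (for example, whether it samples within strata) is not specified here.

```r
# Hypothetical helper: fill NAs in each column by sampling, with replacement,
# from that column's observed (non-missing) values.
impute_by_sampling <- function(df, vars, seed = 4L) {
  set.seed(seed)
  for (v in vars) {
    is_missing <- is.na(df[[v]])
    observed <- df[[v]][!is_missing]
    if (any(is_missing) && length(observed) > 0) {
      # Sampling indices avoids sample()'s special case for a single number.
      idx <- sample.int(length(observed), sum(is_missing), replace = TRUE)
      df[[v]][is_missing] <- observed[idx]
    }
  }
  df
}

toy <- data.frame(
  age = c(40, NA, 55, 61, NA),
  sex = c("F", "M", NA, "F", "M")
)
imputed <- impute_by_sampling(toy, c("age", "sex"))
anyNA(imputed)  # FALSE
```

Fixing the seed makes the imputed values reproducible across runs, which is the role of the `seed` argument.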

`TTEEnrollment$s2_ipw()`

Step 2: Calculate inverse probability of treatment weights.

Estimates the propensity score P(A=1 | L_baseline) via logistic regression on baseline rows only, then computes stabilized (or unstabilized) IPW. This addresses **baseline** confounding for the per-protocol analysis pipeline.

Note: This does NOT estimate time-varying treatment weights for as-treated analysis (Danaei 2013, Section 4.3). As-treated analysis is not currently implemented.

Robust standard errors for within-person correlation are handled downstream by `survey::svydesign(ids = ~person_id_var)` in `$irr()` and `$km()` (Hernan 2008, Danaei 2013).

Usage

TTEEnrollment$s2_ipw(stabilize = TRUE)

Arguments

stabilize: Logical, default TRUE. If TRUE, computes stabilized weights; if FALSE, unstabilized weights.
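The weight calculation can be sketched in base R on simulated baseline rows. This is a minimal illustration of stabilized versus unstabilized inverse probability of treatment weights, not the method's implementation; the data, the column names (`treated`, `ipw`, `ipw_unstab`), and the model formula are assumptions.

```r
set.seed(4)
n <- 500
baseline <- data.frame(
  age = rnorm(n, mean = 50, sd = 10),
  sex = rbinom(n, 1, 0.5)
)
# Treatment depends on the confounders, so the arms differ at baseline.
baseline$treated <- rbinom(
  n, 1, plogis(-0.5 + 0.03 * (baseline$age - 50) + 0.4 * baseline$sex)
)

# Propensity score P(A = 1 | L_baseline) from a logistic regression.
ps_fit <- glm(treated ~ age + sex, family = binomial(), data = baseline)
ps <- predict(ps_fit, type = "response")

# Probability of the treatment actually received, given the confounders.
p_received <- ifelse(baseline$treated == 1, ps, 1 - ps)

# Unstabilized weights: 1 / P(received treatment | L).
baseline$ipw_unstab <- 1 / p_received

# Stabilized weights: marginal P(received treatment) in the numerator.
p_marginal <- mean(baseline$treated)
numerator <- ifelse(baseline$treated == 1, p_marginal, 1 - p_marginal)
baseline$ipw <- numerator / p_received

mean(baseline$ipw)  # close to 1 for stabilized weights
```

Stabilized weights have a mean near 1 and are typically less variable than unstabilized weights, which is why stabilization is the default.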

`TTEEnrollment$s3_truncate_weights()`

Step 3: Truncate extreme weights at specified quantiles.

Usage

TTEEnrollment$s3_truncate_weights(
  weight_cols = NULL,
  lower = 0.01,
  upper = 0.99,
  suffix = "_trunc"
)

Arguments

weight_cols: Character vector or NULL.
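lower: Numeric, default 0.01. Lower quantile at which weights are truncated.
upper: Numeric, default 0.99. Upper quantile at which weights are truncated.
suffix: Character, default "_trunc". Suffix for the truncated weight columns (for example `ipw_trunc`).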
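Quantile truncation can be sketched as follows, assuming that truncation means setting weights beyond the chosen quantiles to the quantile values (winsorizing) rather than dropping rows. `truncate_weights()` and the toy weights are hypothetical.

```r
# Hypothetical helper: clamp weights to the [lower, upper] quantile range.
truncate_weights <- function(w, lower = 0.01, upper = 0.99) {
  bounds <- quantile(w, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(w, bounds[1]), bounds[2])
}

set.seed(4)
toy <- data.frame(ipw = rlnorm(1000, meanlog = 0, sdlog = 1))

# Store the result in a new column named with the "_trunc" suffix.
toy$ipw_trunc <- truncate_weights(toy$ipw)

range(toy$ipw)        # original range, with a long right tail
range(toy$ipw_trunc)  # bounded by the 1st and 99th percentiles
```

Only the most extreme weights change; the bulk of the distribution is left as is.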

`TTEEnrollment$s4_prepare_for_analysis()`

Step 4: Prepare the outcome/analysis dataset for one estimand. For `estimand = "pp"` (default) this calls `$s5_prepare_outcome()` then `$s6_ipcw_pp()`; for `estimand = "itt"` it calls `$s5_prepare_outcome()` in ITT mode (no censoring at treatment switching) and skips IPCW, since baseline IPW alone is the valid ITT weight. Either way, censoring-event rows are then dropped. This is the recommended way to prepare an enrollment for analysis.

After `$s6_ipcw_pp()` fits the censoring model (which legitimately needs censoring-event rows to learn from), all rows with `censor_this_period = 1` are removed from `self$data`. Those rows represent person-periods at which the individual deviated from the assigned treatment; including them in a downstream outcome regression attributes their outcomes to the baseline treatment when in fact they were observed under the deviated regime, biasing the per-protocol treatment effect. This matches the per-protocol behavior of TrialEmulation on the same inputs.

Event-priority convention: when the first outcome event falls in the same band as the protocol deviation, the band counts as an event, not a censoring – the row is kept and the censoring model does not treat it as censored (since 26.7.3).
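The event-priority convention and the subsequent row removal can be illustrated on toy person-period rows. This is a sketch under stated assumptions: the columns `deviated`, `event`, and `period` are hypothetical names, and deriving `censor_this_period` this way is an illustration rather than the package's actual code.

```r
# Two toy persons followed for three periods each.
# A deviates from protocol in period 3 without an event;
# B deviates in period 3 and has the outcome event in the same period.
rows <- data.frame(
  id       = c("A", "A", "A", "B", "B", "B"),
  period   = c(1, 2, 3, 1, 2, 3),
  deviated = c(0, 0, 1, 0, 0, 1),
  event    = c(0, 0, 0, 0, 0, 1)
)

# Event priority: a deviation counts as censoring only when no event
# occurred in the same period.
rows$censor_this_period <- as.integer(rows$deviated == 1 & rows$event == 0)

# The censoring model would be fitted here, on all rows.

# Afterwards, censoring-event rows are dropped from the analysis data.
analysis <- rows[rows$censor_this_period == 0, ]
analysis
```

Person A's third period is removed as a censoring event, while person B's third period is kept and contributes an outcome event.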

Usage

TTEEnrollment$s4_prepare_for_analysis(
  outcome,
  follow_up = NULL,
  estimand = c("pp", "itt"),
  estimate_ipcw_pp_separately_by_treatment = TRUE,
  estimate_ipcw_pp_with_gam = TRUE,
  censoring_var = NULL
)

Arguments

outcome: Character scalar. Must be one of `design$outcome_vars`.
follow_up: Optional integer. Overrides `design$follow_up_time`.
estimand: Character, `"pp"` (per-protocol, default) or `"itt"` (intention-to-treat). ITT keeps follow-up through treatment switching and uses baseline IPW only (no IPCW); analyse it with `$irr(weight_col = "ipw_trunc")`.
estimate_ipcw_pp_separately_by_treatment: Logical, default TRUE.
estimate_ipcw_pp_with_gam: Logical, default TRUE.
censoring_var: Character or NULL. Defaults to `"censor_this_period"`.

`TTEEnrollment$extract()`

Extract the data.table from the trial object.

Usage

TTEEnrollment$extract()

Returns

A data.table with the processed trial data.

`TTEEnrollment$summary()`

Summarize trial data statistics.

Usage

TTEEnrollment$summary(pretty = FALSE)

Arguments

pretty: Logical, default FALSE. If TRUE, prints formatted output.

Returns

If `pretty = FALSE`, a list with summary stats. If TRUE, prints formatted output and invisibly returns the list.

`TTEEnrollment$weight_summary()`

Print weight distribution diagnostics.

Usage

TTEEnrollment$weight_summary()

`TTEEnrollment$table1()`

Generate baseline characteristics table.

Returns a long-format `data.table` with one row per categorical level plus one row per continuous variable. See [.swereg_table1] for the layout. The result has S3 class `c("swereg_table1", "data.table", "data.frame")`.

Usage

TTEEnrollment$table1(
  ipw_col = NULL,
  arm_labels = NULL,
  include_smd = TRUE,
  show_missing = c("when_present", "always", "none")
)

Arguments

ipw_col: Character or NULL. If specified, the table is weighted by `ipw_col`.
arm_labels: Optional named character vector `c(comparator = "...", intervention = "...")` used as column headers in place of the raw treatment values.
include_smd: Logical, whether to emit an SMD column (default `TRUE`).
show_missing: One of `"when_present"` (default — emit a Missing row only for variables with any missingness), `"always"` (emit a Missing row for every variable, even when zero), or `"none"` (suppress Missing rows entirely).

Returns

A `data.table` with class `swereg_table1`.

`TTEEnrollment$rates()`

Calculate events, person-years, and rates by treatment group.

Usage

TTEEnrollment$rates(weight_col)

Arguments

weight_col: Character, required. Column name for weights.

Returns

A data.table with events, person-years, and rates.
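A weighted rate calculation of this kind can be sketched in base R. The column names (`treated`, `event`, `person_years`, `w`) and the toy values are hypothetical, and the real method may scale or format rates differently.

```r
toy <- data.frame(
  treated      = c(0, 0, 0, 1, 1, 1),
  event        = c(1, 0, 0, 1, 1, 0),
  person_years = c(2, 3, 5, 1, 2, 2),
  w            = c(1, 1.5, 0.5, 2, 1, 1)
)

# Weight each row's events and person-time, then sum within treatment group.
toy$w_events <- toy$w * toy$event
toy$w_py <- toy$w * toy$person_years
out <- aggregate(cbind(w_events, w_py) ~ treated, data = toy, FUN = sum)

# Rate = weighted events / weighted person-years.
out$rate <- out$w_events / out$w_py
out
```

In this toy data the comparator arm has 1 weighted event over 9 weighted person-years, and the intervention arm has 3 weighted events over 6 weighted person-years.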

`TTEEnrollment$irr()`

Fit weighted Poisson regression and extract incidence rate ratios.

Uses `survey::svyglm()` with `quasipoisson` family and person-level clustering (`ids = ~person_id_var`) for robust standard errors. This accounts for within-person correlation across repeated trial entries (Hernan 2008, Danaei 2013).

**IRR vs HR**: For rare events (typical in registry-based TTE studies), the incidence rate ratio from Poisson regression approximates the hazard ratio from Cox regression (Thompson 1977). The Poisson model with `splines::ns(tstop, df=3)` flexibly models the baseline event rate over follow-up time — analogous to Cox's nonparametric baseline hazard and to Danaei et al.'s "month of follow-up and its squared terms" in pooled logistic regression.

**Computational choice**: `quasipoisson` accounts for overdispersion from survey weights, and `svyglm` scales to large registry datasets (unlike `survey::svycoxph()`). This is computationally equivalent to the pooled logistic approach used by Danaei et al. (2013).

**Calendar-time adjustment**: When `trial_id` is present in the data (from band-based enrollment), it is included in the model to adjust for calendar-time variation in outcome rates across enrollment bands (Caniglia 2023, Danaei 2013). Uses natural splines for >=5 unique trial IDs, linear term for 2-4, omitted for 1.
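The rule for how `trial_id` enters the model can be sketched as a small formula builder. `trial_id_term()` and `build_formula()` are hypothetical helpers, `event` and `treatment` are placeholder variable names, and `df = 3` for the `trial_id` spline is an assumption, since the text does not state it.

```r
# Hypothetical helper: pick the trial_id term from the number of unique trials.
trial_id_term <- function(trial_id) {
  n_trials <- length(unique(trial_id))
  if (n_trials >= 5) {
    # df = 3 is an assumption; the text does not give the spline's df.
    "splines::ns(trial_id, df = 3)"
  } else if (n_trials >= 2) {
    "trial_id"  # linear term for 2 to 4 trials
  } else {
    NULL        # a single trial carries no calendar-time contrast
  }
}

# Assemble a model formula; `event` and `treatment` are placeholder names.
build_formula <- function(trial_id) {
  rhs <- c("treatment", "splines::ns(tstop, df = 3)", trial_id_term(trial_id))
  as.formula(paste("event ~", paste(rhs, collapse = " + ")))
}

build_formula(1:8)         # spline term for trial_id
build_formula(c(1, 1, 2))  # linear term
build_formula(rep(1, 3))   # no trial_id term
```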

**Estimand (marginal)**: confounding is removed by the supplied `weights`, not by adjusting for confounders in this model, so the coefficient is a *marginal* (population-average) incidence rate ratio, standardised over the covariate distribution. This contrasts with covariate-adjusted outcome regressions (e.g. `TrialEmulation`'s pooled logistic), which target a *conditional* effect. The two coincide for the (collapsible) rate ratio but differ for the (non-collapsible) odds ratio. See `vignette("tte-methods")`, "Marginal versus conditional estimands".

Usage

TTEEnrollment$irr(weight_col)

Arguments

weight_col: Character, required. Column name for weights.

Returns

A data.table with IRR estimates and confidence intervals.

`TTEEnrollment$heterogeneity_test()`

Test for heterogeneity of treatment effects across trials.

Fits a model with a `trial_id x treatment` interaction term and returns the Wald test p-value. This tests whether the treatment effect varies across enrollment bands (Hernan 2008, Danaei 2013).

Usage

TTEEnrollment$heterogeneity_test(weight_col)

Arguments

weight_col: Character, required. Column name for weights.

Returns

A list with `p_value` (Wald test), `n_trials` (unique trial IDs), and `interaction_coefs` (data.table of interaction coefficients).

`TTEEnrollment$effect_modification_test()`

Test whether the treatment effect is modified by a categorical baseline subgroup variable.

Fits one combined model with a `treatment x factor(subgroup_var)` interaction and runs a Wald test on the interaction terms. This is the correct test for "do the stratum-specific IRRs differ" – NOT comparing the per-stratum confidence intervals. For a binary subgroup the single interaction coefficient satisfies `exp(coef) = IRR(other) / IRR(ref)`, where `ref` is the first factor level.
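The identity `exp(coef) = IRR(other) / IRR(ref)` can be checked on toy aggregated data with an unweighted Poisson model. This is a simplified sketch: the method itself fits a weighted survey model, and the counts, person-years, and variable names below are made up for illustration.

```r
# Toy aggregated data: events and person-years by arm and subgroup.
toy <- data.frame(
  treatment = c(0, 1, 0, 1),
  subgroup  = factor(c("a", "a", "b", "b")),  # "a" is the reference level
  events    = c(20, 30, 10, 40),
  py        = c(1000, 1000, 1000, 1000)
)

# Combined model with a treatment x subgroup interaction.
fit <- glm(
  events ~ treatment * subgroup + offset(log(py)),
  family = poisson(), data = toy
)

# Stratum-specific IRRs computed directly from the rates.
irr_a <- (30 / 1000) / (20 / 1000)  # 1.5 in the reference stratum
irr_b <- (40 / 1000) / (10 / 1000)  # 4 in the other stratum

# The exponentiated interaction coefficient is the ratio of stratum IRRs.
ratio_of_irrs <- exp(coef(fit)[["treatment:subgroupb"]])
all.equal(ratio_of_irrs, irr_b / irr_a, tolerance = 1e-6)
```

Because the model is saturated for this four-cell table, the fitted coefficients reproduce the stratum rates, so the interaction term equals the ratio `4 / 1.5` up to numerical tolerance.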

The subgroup variable should be a confounder (in the PS / IPCW models) so the marginal weights remain valid within each stratum.

Usage

TTEEnrollment$effect_modification_test(weight_col, subgroup_var)

Arguments

weight_col: Character, required. Column name for weights.
subgroup_var: Character, required. A categorical baseline column.

Returns

A list with `p_value` (Wald test), `subgroup_var`, `n_levels`, `interaction_coefs` (data.table), and, for a binary subgroup, `ratio_of_irrs = exp(beta)` with `ratio_lower` / `ratio_upper` (NA for multi-level subgroups).

`TTEEnrollment$irr_by_subgroup()`

Stratified IRRs within each level of a baseline subgroup.

Returns one table with an `"all"` row (= `irr()`) plus one row per subgroup level, each fit on that stratum's rows via the shared estimation core. The effect-modification test p-value (and, for a binary subgroup, the ratio of stratum IRRs) is attached as an attribute. Strata with no events or only one treatment arm degrade to NA with a warning; NA-subgroup rows are dropped (count attached as an attribute).

Usage

TTEEnrollment$irr_by_subgroup(weight_col, subgroup_var)

Arguments

weight_col: Character, required. Column name for weights.
subgroup_var: Character, required. A categorical baseline column.

Returns

A data.table with columns `level, IRR, IRR_lower, IRR_upper, IRR_pvalue, warn`, with attributes `em_pvalue`, `ratio_of_irrs`, and `n_na_subgroup`.

`TTEEnrollment$km()`

Fit Kaplan-Meier curves and optionally plot. Uses IPW only (not IPCW) because IPCW is time-varying.

Usage

TTEEnrollment$km(ipw_col, save_path = NULL, title = NULL)

Arguments

ipw_col: Character, required. Column name for IPW weights.
save_path: Character or NULL. If specified, saves the plot.
title: Character or NULL. Plot title.

Returns

A svykm object (invisibly if save_path is specified).

`TTEEnrollment$clone()`

The objects of this class are cloneable with this method.

Usage

TTEEnrollment$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

if (FALSE) { # \dontrun{
design <- TTEDesign$new(
  person_id_var = "id",
  treatment_var = "intervention",
  outcome_vars = "death",
  confounder_vars = c("age", "sex"),
  follow_up_time = 52L,
  eligible_var = "eligible"
)

# Enroll via constructor (band-based), then $-chain
enrollment <- TTEEnrollment$new(my_skeleton, design,
  ratio = 2, seed = 4, extra_cols = "isoyearweek"
)
enrollment$
  s2_ipw()$
  s4_prepare_for_analysis(outcome = "death", estimate_ipcw_pp_with_gam = TRUE)
} # }
