swereg 26.4.2

Bug Fixes

  • Fix deadlock in callr_pool() when worker results exceed the Unix socket buffer (208KB default). Workers in .s1a_worker and .s1b_worker now write results to tempfiles instead of returning them through the socket. The main process reads and cleans up the tempfiles. This prevents the worker from blocking on send() while the main process waits on the poll connection.
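
The tempfile handoff described above can be sketched in base R. This is an illustration only: worker_write_result() and main_collect_result() are hypothetical names rather than swereg internals, and in the package the worker side runs in a separate callr session instead of the same process.

```r
# Hypothetical worker: instead of returning a large object through the
# socket, write it to a tempfile and return only the (small) path.
worker_write_result <- function(item) {
  result <- data.frame(id = seq_len(item$n), value = item$n * 2)
  path <- tempfile(pattern = "worker_result_", fileext = ".rds")
  saveRDS(result, path)
  path
}

# Hypothetical main-process side: read the result, then remove the file.
main_collect_result <- function(path) {
  on.exit(unlink(path), add = TRUE)
  readRDS(path)
}

path <- worker_write_result(list(n = 5L))
res  <- main_collect_result(path)
```

Only a short path string crosses the connection, so the size of the result no longer depends on the socket buffer.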

Internal

  • Rename .s3_worker() back to .s2_worker() to match the $s2_generate_analysis_files_and_ipcw_pp() method it serves.

swereg 26.3.30

Improvements

  • callr_pool() gains a timeout_minutes parameter (default: 30). If a work item runs longer than the timeout, its worker is killed and the item is retried once. If the retry also times out, callr_pool() calls stop(). Disable with timeout_minutes = NULL.

CRAN compliance

  • Move mgcv from Imports to Suggests (only used conditionally via requireNamespace()).
  • Add @importFrom for progressr and utils::getFromNamespace to satisfy NAMESPACE checks.
  • Replace swereg::: calls with getFromNamespace() in callr worker sessions.
  • Replace assign(..., globalenv()) with a package-level environment (.swereg_env).
  • Add var <- NULL declarations for all data.table NSE variables.
  • Add .vscode to .Rbuildignore.

swereg 26.3.23

Improvements

  • callr_pool() workers now self-terminate if the parent R session dies (e.g. OOM kill). Each worker spawns a lightweight shell watchdog that polls the parent PID every 5 seconds. Previously, orphaned workers ran indefinitely until manually cleaned up via callr_kill_workers().

Bug Fixes

  • Critical: .s1_eligible_tuples() used first(rd_exposed) to classify exposure at each trial period, which only detected MHT initiation if it happened on the first week of a 4-week trial period. With period_width = 4, ~75% of exposed people start MHT mid-period and were silently dropped — their first trial period showed them as unexposed (week 1 was pre-initiation), and the next period excluded them for prior MHT. Fixed by using any(rd_exposed, na.rm = TRUE) instead. The existing no_prior_exposure exclusion correctly handles the new-user restriction. Verified: eligible exposed count on skeleton_001 went from 19 → 84, matching the old per-week pipeline.

  • .s1_compute_attrition(): exposure classification now uses any() per person-trial instead of checking the first eligible row. Aligns attrition reporting with the any() fix in .s1_eligible_tuples() — previously the attrition flow underreported exposed counts by ~4x.

  • tteplan_validate_spec(): missing variables (confounders, outcomes, exclusion criteria, exposure) now stop() instead of warning(). Previously, a misspelled or renamed variable would silently pass validation and break downstream (e.g. IPW model missing a confounder). Category mismatches (values in spec but not data) remain as warnings since they can occur in small batches.
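
The difference between first-week and any-week exposure classification in the .s1_eligible_tuples() fix above can be shown with a toy example. This is a sketch in base R under simplifying assumptions (one person, a hypothetical week counter in place of isoyearweek, trial periods built by integer division); it is not the package code.

```r
# Toy person-week data: one person, eight weeks, two 4-week trial periods.
# MHT starts in week 3, i.e. part-way through the first period.
weeks <- data.frame(
  id         = 1L,
  week       = 1:8,
  rd_exposed = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
)
period_width <- 4L
weeks$trial_id <- (weeks$week - 1L) %/% period_width  # 0 0 0 0 1 1 1 1

# Old behaviour: classify each period by its first week only.
exposed_first <- tapply(weeks$rd_exposed, weeks$trial_id, function(x) x[1])

# Fixed behaviour: a period is exposed if any week in it is exposed.
exposed_any <- tapply(weeks$rd_exposed, weeks$trial_id, any, na.rm = TRUE)
```

With first-week classification the person looks unexposed in period 0 and would then be excluded from period 1 for prior MHT, so they never enter as exposed. With any() they are classified as exposed in period 0.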

swereg 26.3.20

Bug Fixes

  • .s1_compute_attrition(): fix undercounting of person-trials for row-level eligibility criteria (e.g. eligible_valid_exposure). The old code checked only the first row per person-trial, missing cases where exposure onset occurred after the first week. The new approach filters to eligible rows first, then counts — matching the logic used by .s1_eligible_tuples().

  • .s1_compute_attrition(): fix negative exposed/comparator deltas in participant flow. The before_exclusions baseline now classifies exposure from the first row with non-NA exposure per person-trial, rather than the first overall row (which often has rd_exposed = NA). Total person-trial counts remain unfiltered.

Performance

  • TTE s1 pipeline: add data.table::setkey() calls to eliminate redundant hash-based grouping. Skeleton reads in .s1_prepare_skeleton() and .s1b_worker() now set key on (id, isoyearweek) (metadata-only, no re-sort). enroll() Phase B collapse uses keyed grouping on (pid, trial_id), and Phase D panel expansion uses keyed binary join instead of merge().

Bug Fixes

  • callr_pool() PID files now written to /tmp instead of tempdir() so that orphaned workers from crashed R sessions can be discovered and cleaned up by new sessions.

  • callr_kill_workers() simplified to orphan-only cleanup: kills workers whose parent R process is dead and removes stale PID files. Own-session cleanup is already handled by callr_pool()’s on.exit() handler; this function is only needed after hard crashes (SIGKILL, OOM).

Performance

  • callr_pool() now uses persistent callr::r_session workers instead of spawning a fresh callr::r_bg() process per work item. The swereg namespace is loaded once per worker slot rather than once per item, eliminating redundant startup overhead when scaling to large numbers of items.

  • Orphan protection: callr_pool() writes a PID file per invocation and cleans up orphaned worker sessions from previous crashed runs (e.g. OOM kills) on the next invocation.

Bug Fixes

  • Fixed 3 test failures in test-tte_spec.R caused by s1 pipeline changes: added missing rd_exposed column to .s1_compute_attrition test fixtures, added n_exposed/n_unexposed to mock attrition data, and updated matching output expectations.

Performance

  • s1_generate_enrollments_and_ipw() now caches prepared skeletons between s1a (scout) and s1b (enrollment) passes, eliminating redundant file reads and exclusion processing. Expected ~30-40% reduction in per-enrollment wall-clock time.

  • .s1b_worker() now subsets the skeleton to enrolled persons before computing derived confounders, avoiding expensive rolling-window operations on non-enrolled persons.

  • TTEEnrollment$new() accepts own_data = TRUE to skip the defensive data.table::copy() when the caller will not reuse the data. Used in .s1b_worker() where the skeleton is discarded immediately after.

  • enroll() Phase B now aggregates confounders, time-exposure, and outcome columns in a single groupby pass instead of four separate passes with merges.

Improvements

  • “Valid exposure” (eligible_valid_exposure) is now the first exclusion criterion in the TTE attrition flow. Rows where rd_exposed is NA are explicitly accounted for rather than silently disappearing between the before-exclusions total and the first real criterion.

  • TARGET Item 8 (participant flow) now shows a richer flow diagram with before-exclusion counts, per-step exposed/unexposed breakdown, delta (excluded) and remaining counts at each criterion, right-justified aligned columns, and color-coded output (red for exclusions, cyan for remaining). Post-matching line also reformatted with arrow indicator. “Before exclusions” line no longer shows a meaningless exposed/comparator breakdown.

  • enrollment_counts$attrition now includes n_exposed and n_unexposed columns and a "before_exclusions" row.

Bug Fixes

  • Fixed trial_id missing error caused by attr<- breaking data.table’s internal self-reference. Replaced with data.table::setattr() in .s1_prepare_skeleton() and tteplan_apply_exclusions() to preserve in-place modification semantics.

  • Fixed callr worker stale-namespace bug: after devtools::load_all() in a subprocess, worker functions still referenced the old (installed) swereg namespace. Now rebinds the worker function’s environment to the freshly-loaded namespace.

Improvements

  • Reorganized print_spec_summary() header layout: renamed “Study created” → “RegistryStudy”, merged “Skeletons created” + “Skeleton files” into a single nested line with tree connector, renamed “Plan created” → “TTEPlan”, and reordered to follow data pipeline order.

  • Rewrote TARGET checklist items 6c, 6h, and 7a-h in print_target_checklist() as academic prose suitable for copy-pasting into a methods section. Item 6c now dynamically reflects per-enrollment matching ratios from the spec.

Breaking changes

  • enrollment_counts structure changed: Each element of TTEPlan$enrollment_counts is now a list with $attrition and $matching sub-elements (was a single data.table). Code accessing plan$enrollment_counts[["01"]] directly as a data.table must update to plan$enrollment_counts[["01"]]$matching.

  • person_trial_id renamed to enrollment_person_trial_id: The composite key column now has a 3-part name matching its 3-part format (enrollment_id.person_id.trial_id). All code referencing person_trial_id must be updated.

  • process_fn parameter removed from $s1_generate_enrollments_and_ipw(): The two-pass spec-driven pipeline is now the only code path. self$spec is required (create plans with tteplan_from_spec_and_registrystudy()). The legacy single-pass .s1_worker() has been deleted.

  • .s2_worker() renamed to .s3_worker(): Internal Loop 2 IPCW-PP worker renamed to avoid confusion with the two-pass Loop 1 pipeline.

New features

  • Two-pass enrollment pipeline: $s1_generate_enrollments_and_ipw() now uses a two-pass pipeline that fixes cross-batch matching ratio imbalance:

    1. Pass 1a (scout): Lightweight parallel pass collecting eligible (person_id, trial_id, exposed) tuples from all batches.
    2. Centralized matching: Combines all tuples and performs per-trial_id matching globally, ensuring the correct ratio across all batches.
    3. Pass 1b (full enrollment): Parallel pass using pre-matched IDs to enroll without per-batch matching.
  • enrollment_counts on TTEPlan: New field storing per-trial matching counts (total vs enrolled, exposed vs unexposed) for TARGET Item 8 reporting.

  • .assign_trial_ids(): New shared helper function that is the single source of truth for isoyearweek -> trial_id mapping. Used consistently by both scout (s1a) and enrollment (s1b/enroll) phases.

  • enrolled_ids parameter on TTEEnrollment$new(): When provided, enrollment skips the matching phase and uses pre-decided IDs directly, enabling the two-pass pipeline.

  • Per-criterion attrition counts for TARGET Item 8: The scout pass (s1a) now computes cumulative person and person-trial counts at each eligibility step. Stored in plan$enrollment_counts[["01"]]$attrition as a long-format data.table with columns trial_id, criterion, n_persons, n_person_trials. $print_target_checklist() Item 8 auto-populates with these counts when available.
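
The centralized matching step of the two-pass pipeline above can be sketched as follows. The helper match_globally(), the 2:1 ratio, and the simple random sampling are assumptions made for illustration; the actual matching in swereg may differ in its details.

```r
# Eligible tuples as they might come back from two scout batches.
batch_a <- data.frame(person_id = 1:6, trial_id = 1L,
                      exposed = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE))
batch_b <- data.frame(person_id = 7:12, trial_id = 1L,
                      exposed = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
tuples <- rbind(batch_a, batch_b)

# Hypothetical helper: keep every exposed tuple and sample up to `ratio`
# unexposed tuples per exposed tuple within each trial_id.
match_globally <- function(tuples, ratio = 2L, seed = 4L) {
  set.seed(seed)
  per_trial <- lapply(split(tuples, tuples$trial_id), function(d) {
    exposed   <- d[d$exposed, , drop = FALSE]
    unexposed <- d[!d$exposed, , drop = FALSE]
    n_keep <- min(nrow(unexposed), ratio * nrow(exposed))
    keep <- sample.int(nrow(unexposed), n_keep)
    rbind(exposed, unexposed[keep, , drop = FALSE])
  })
  do.call(rbind, per_trial)
}

enrolled <- match_globally(tuples, ratio = 2L)
```

In this toy data, matching inside each batch would give batch_b only three unexposed for three exposed, while matching on the combined tuples reaches the intended 2:1 ratio (eight unexposed for four exposed).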

swereg 26.3.21

New features

  • $heterogeneity_test(): New method on TTEEnrollment that tests for heterogeneity of treatment effects across trials via a Wald test on the trial_id × exposure interaction (Hernán 2008, Danaei 2013).

  • $print_target_checklist(): New method on TTEPlan that generates a self-contained TARGET Statement (Cashin et al., JAMA 2025) 21-item reporting checklist. Auto-populates items from the study spec and provides [FILL IN] placeholders for PI completion.

Improvements

  • $irr() calendar-time adjustment: Outcome model now includes trial_id as a covariate to adjust for calendar-time variation in outcome rates across enrollment bands (Caniglia 2023, Danaei 2013). Uses ns(trial_id, df=3) for ≥5 unique trial IDs, linear term for 2-4, omitted for 1.

  • $irr() IPW-only guard: $irr() now rejects IPW-only weight columns (ipw, ipw_trunc) after per-protocol censoring has been applied. The swereg pipeline applies per-protocol censoring in $s4_prepare_for_analysis(), so only per-protocol weights (analysis_weight_pp_trunc) are valid for the censored dataset.
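
The rule for how trial_id enters the outcome model can be written as a small helper. This is a sketch only: trial_id_term() and build_rhs() are hypothetical names, and "exposed" stands in for whatever terms the real model contains.

```r
# Hypothetical helper: choose how trial_id enters the outcome model,
# following the thresholds described above.
trial_id_term <- function(trial_id) {
  n_unique <- length(unique(trial_id))
  if (n_unique >= 5L) {
    "splines::ns(trial_id, df = 3)"
  } else if (n_unique >= 2L) {
    "trial_id"
  } else {
    NULL  # a single trial: no calendar-time adjustment term
  }
}

# reformulate() builds a one-sided formula from character term labels.
build_rhs <- function(trial_id, base_terms = "exposed") {
  reformulate(c(base_terms, trial_id_term(trial_id)))
}

rhs_many <- build_rhs(1:6)          # spline term
rhs_few  <- build_rhs(1:3)          # linear term
rhs_one  <- build_rhs(rep(1L, 10))  # no trial_id term
```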

Documentation

  • Methodology vignette: New vignette("tte-methodology") maps the swereg TTE implementation to five reference papers (Hernán 2008/2016, Danaei 2013, Caniglia 2023, Cashin 2025). Documents which methods are implemented, which are not, and design rationale.

  • Analysis types: vignette("tte-nomenclature") now documents that swereg supports per-protocol analysis only. ITT analysis is not supported because the pipeline censors at protocol deviation. As-treated analysis requires time-varying IPW (not implemented).

  • period_width documentation: vignette("tte-nomenclature") now explains the enrollment band width / residual immortal time bias trade-off, citing Caniglia (2023) and Hernán (2016).

  • Matching approach: vignette("tte-nomenclature") now documents the per-band stratified matching design choice and alternatives from the literature.

  • $s2_ipw() documentation: Clarified that IPW estimates the propensity score for baseline treatment assignment only, not time-varying treatment weights.

  • $irr() documentation: Documented IRR ≈ HR for rare events, ns(tstop) for flexible baseline hazard, quasipoisson for overdispersion, and computational equivalence to pooled logistic regression.

  • IPCW stabilization: Documented the simplified marginal stabilization approach and its relationship to Danaei (2013).

Tests

  • Added tests for $rates(), $irr(), $km(), $irr() with trial_id, IPW-only guard, and IPCW formula with trial_id.

swereg 26.3.20

Improvements

  • Band-based enrollment: Added explicit isoyearweek ordering before band-level collapse to prevent silent misclassification when input data is not pre-sorted by time.
  • IPCW-PP: Censoring model now includes trial_id to account for calendar-time variation in censoring patterns across enrollment bands.
  • person_weeks: Now computed from actual source row counts during band collapse instead of hardcoded period_width. Partial-coverage bands (e.g., at data boundaries) now contribute accurate person-time.

Breaking changes

  • $irr(): Removed the constant (no time adjustment) Poisson model. Only the flexible model with natural splines (splines::ns(tstop, df=3)) is retained. Output columns renamed: IRR_flex → IRR, IRR_flex_lower → IRR_lower, IRR_flex_upper → IRR_upper, IRR_flex_pvalue → IRR_pvalue, warn_flex → warn. All IRR_const* and warn_const columns removed.
  • tteenrollment_irr_combine(): Updated to match new $irr() output. Columns renamed: IRR (flexible) → IRR, 95% CI (flexible) → 95% CI, p (flexible) → p. Constant-model columns removed.
  • TTE ID semantics: The composite person-per-trial identifier column is now called person_trial_id (was trial_id). The actual trial identifier (the enrollment band) is now exposed as trial_id in enrollment output. This fixes the semantics so trial_id means the trial and person_trial_id identifies a person’s participation in a trial.
  • TTEDesign default: id_var default changed from "trial_id" to "person_trial_id".
  • s1_impute_confounders(): No longer hardcodes trial_id; uses design$id_var throughout.
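
For results saved before the $irr() column renames in this section, a small migration helper may be enough. The sketch below assumes the old output is a plain data.frame; migrate_irr_columns() is a hypothetical name and not part of swereg.

```r
# Old-to-new column names for the $irr() output.
irr_renames <- c(
  IRR_flex        = "IRR",
  IRR_flex_lower  = "IRR_lower",
  IRR_flex_upper  = "IRR_upper",
  IRR_flex_pvalue = "IRR_pvalue",
  warn_flex       = "warn"
)

# Hypothetical helper: drop constant-model columns, then rename the
# flexible-model columns to their new names.
migrate_irr_columns <- function(x) {
  x <- x[, !grepl("^(IRR_const|warn_const)", names(x)), drop = FALSE]
  hit <- names(x) %in% names(irr_renames)
  names(x)[hit] <- unname(irr_renames[names(x)[hit]])
  x
}

old <- data.frame(IRR_const = 1.1, IRR_flex = 1.2, IRR_flex_lower = 0.9,
                  IRR_flex_upper = 1.6, IRR_flex_pvalue = 0.2,
                  warn_const = FALSE, warn_flex = FALSE)
new <- migrate_irr_columns(old)
```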

Code quality

  • Rename private methods prepare_outcome and ipcw_pp to s5_prepare_outcome and s6_ipcw_pp to signal their execution order within s4_prepare_for_analysis().
  • Reorder TTEEnrollment public step methods to match their numeric sequence (s1 before s2).

Breaking changes

  • Band-based enrollment: TTEEnrollment enrollment now uses N-week bands (controlled by period_width in TTEDesign, default 4). Calendar time is grouped into bands based on isoyearweek, matching is done per-band (stratified), and data is collapsed to band level during enrollment. This eliminates the separate $s1_collapse() step entirely.

  • Step renumbering: Public workflow methods on TTEEnrollment have been renumbered after removing $s1_collapse():

    • $s2_impute_confounders() -> $s1_impute_confounders()
    • $s3_ipw() -> $s2_ipw()
    • $s4_truncate_weights() -> $s3_truncate_weights()
    • $s5_prepare_for_analysis() -> $s4_prepare_for_analysis()
  • period_width parameter: Moved from TTEPlan$s1_generate_enrollments_and_ipw() to TTEDesign$new(period_width = 4L). Now part of the design contract.

  • isoyearweek column required: Band-based enrollment requires an isoyearweek column in person-week data.

  • Schema version bump: TTEDesign and TTEEnrollment schema versions bumped to 2. Objects saved with version 1 will warn on load.

New features

  • TTEPlan provenance timestamps: TTEPlan now tracks created_at (stamped at construction), registry_study_created_at (from the source RegistryStudy), and skeleton_created_at (from the first skeleton file’s attribute). All three timestamps are shown in print() and print_spec_summary() when available, making it easy to detect stale plans.

  • R6 schema versioning: All R6 classes (RegistryStudy, TTEPlan, TTEDesign, TTEEnrollment) now carry a .schema_version private field, stamped at construction time. A new $check_version() public method compares the stored version against the current class definition and warns when stale. qs2_read() automatically calls $check_version() on R6 objects after loading, so outdated serialized objects produce a clear warning instead of silently breaking.

  • Deprecation warnings for old add_* parameter names: add_diagnoses(diags=), add_operations(ops=), add_rx(rxs=), add_icdo3s(icdo3s=), add_snomed3s(snomed3s=), and add_snomedo10s(snomedo10s=) now emit a deprecation warning when the old parameter name is used. Use codes= instead.
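
The deprecated-alias pattern described above can be sketched like this. add_diagnoses_sketch() is a hypothetical stand-in, not the real add_diagnoses(), and the warning text is an assumption.

```r
# Hypothetical stand-in for an add_*() function: `codes` is the current
# argument, `diags` the deprecated alias.
add_diagnoses_sketch <- function(data, codes = NULL, diags = NULL) {
  if (!is.null(diags)) {
    warning("`diags` is deprecated; use `codes` instead.", call. = FALSE)
    if (is.null(codes)) codes <- diags
  }
  if (is.null(codes)) stop("`codes` must be supplied.", call. = FALSE)
  # Placeholder for the real work: report which code groups were requested.
  names(codes)
}

# New-style call: no warning.
res_new <- add_diagnoses_sketch(NULL, codes = list(depression = "F32"))

# Old-style call: same result, plus a deprecation warning.
res_old <- suppressWarnings(
  add_diagnoses_sketch(NULL, diags = list(depression = "F32"))
)
```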

Breaking changes

  • RegistryStudy: register_codes() now takes a declarative signature: register_codes(codes, fn, groups, fn_args, combine_as). Each call declares codes, the function to apply them, which data groups to use, and optional prefix/combine behavior. The old per-type fields (icd10_codes, rx_atc_codes, rx_produkt_codes, operation_codes, icdo3_codes) and the old register_codes(icd10_codes = ...) signature are removed. The single code_registry list field replaces them.

  • summary_table(): The type parameter is removed. The type column is replaced by label. Use label to filter.

  • add_diagnoses(), add_operations(), add_rx(), add_icdo3s(), add_snomed3s(), add_snomedo10s(): The code-list parameter is now named codes in all six functions (was diags, ops, rxs, icdo3s, snomed3s, snomedo10s respectively). Old parameter names still work for backwards compatibility.

Refactoring

  • Moved qs2_read() to its own file (R/qs2.R) and inlined the fallback logic directly. Removed pointless .qs_save wrapper (replaced with direct qs2::qs_save calls) and .qs_read internal helper.

Breaking changes

  • skeleton_save() no longer splits batches into sub-files. It saves one file per batch as skeleton_NNN.qs2 (was skeleton_NNN_SS.qs2). The ids_per_file and id_col parameters have been removed.

  • RegistryStudy: batch_sizes parameter (integer vector) replaced with batch_size (single integer, default 1000). The ids_per_skeleton_file parameter has been removed. All batches are now uniform size.

swereg 26.3.21

Breaking changes

File reorganization

  • RENAMED: R/tte_enrollment_r6.R → R/r6_tteenrollment.R
  • RENAMED: R/tte_plan_r6.R → R/r6_tteplan.R
  • RENAMED: R/registry_study_r6.R → R/r6_registry_study.R
  • EXTRACTED: callr_pool() to its own file R/callr_pool.R
  • MOVED: Eligibility helpers to R/skeleton_utils.R
  • MOVED: tteenrollment_impute_confounders() to R/r6_tteenrollment.R

swereg 26.3.20

Breaking changes

  • RENAMED: TTEEnrollment public workflow methods now have step-number prefixes to signal execution order:

    • $collapse() → $s1_collapse()
    • $impute_confounders() → $s2_impute_confounders()
    • $ipw() → $s3_ipw()
    • $truncate() → $s4_truncate_weights()
    • $prepare_for_analysis() → $s5_prepare_for_analysis()
  • RENAMED: $s4_truncate() → $s4_truncate_weights() for clarity.

  • RENAMED: TTEPlan orchestration methods now have step-number prefixes:

    • $generate_enrollments_and_ipw() → $s1_generate_enrollments_and_ipw()
    • $generate_analysis_files_and_ipcw_pp() → $s2_generate_analysis_files_and_ipcw_pp()
  • RENAMED: Internal worker functions for consistent naming:

    • .tte_process_skeleton() → .s1_worker()
    • .loop2_worker() → .s2_worker()
  • REMOVED: Constructor wrapper functions tte_design(), tte_enrollment(), and tte_plan(). Use TTEDesign$new(), TTEEnrollment$new(), and TTEPlan$new() directly. The auto-detection and data-copy logic from tte_enrollment() has been moved into TTEEnrollment$new().

Improvements

  • REFACTOR: Inlined 5 of 6 private helper methods into their single callers on TTEEnrollment (.calculate_ipw, .calculate_ipcw, .combine_weights_fn, .match_ratio, .collapse_periods). Kept .truncate_weights as private (used in 2 places). Reduces indirection for stateless methods that don’t use self.

  • TESTS: Rewrote test-tte_weights.R to test through public API ($s1_collapse(), $s3_ipw(), $s4_truncate(), tte_enrollment(ratio=)) instead of accessing inlined private methods.

swereg 26.3.20

Improvements

  • REFACTOR: Inlined 6 weight/matching functions as private methods on TTEEnrollment (tte_truncate_weights, tte_calculate_ipw, tte_calculate_ipcw, tte_combine_weights, tte_match_ratio, tte_collapse_periods). Removed 2 orphaned functions (tte_identify_censoring, tte_time_to_event). Users access this functionality through R6 methods ($collapse, $ipw, $truncate, etc.).

  • REFACTOR: Consolidated TTE source files from 7 to 2 (+1 rename):

    • tte_design.R + tte_enrollment.R + tte_weights.R merged into tte_enrollment_r6.R (TTEDesign + TTEEnrollment + all weight/matching functions called by their methods)
    • tte_plan.R + tte_spec.R + tte_eligibility.R merged into tte_plan_r6.R (TTEPlan + spec functions + eligibility helpers)
    • registry_study.R renamed to registry_study_r6.R
    • Files containing R6 classes now have _r6 suffix for discoverability
  • REORDER: TTEEnrollment public methods now follow workflow execution order: collapse -> ipw -> impute_confounders -> truncate -> prepare_for_analysis -> extract/summary/diagnostics -> analysis output.

  • DOCS: Added inline comments documenting data flow in generate_enrollments_and_ipw() (Loop 1), .tte_process_skeleton(), private$enroll(), enrollment_spec(), and add_one_ett().

swereg 26.3.18

Improvements

  • MHT: Added rd_approach3b_{single,multiple} exposure variables that collapse estrogen_progesterone_bioidentical and estrogen_progesterone_synthetic into a single estrogen_progesterone level. Derived by relabeling the finished approach3 columns, which is valid because switching between active MHT types never triggers “previous”.

  • MHT: x2026_mht_add_lmed() now creates exposure variables (rd_approach{1,2,3}_{single,multiple}) internally via the new internal helper x2026_mht_create_exposure_variables(). This consolidates all MHT LMED logic in the package, eliminating the need for a separate step 14 in external workflow scripts.

  • MHT: Removed 18 sensitivity columns (*_sensitivity_60p, *_sensitivity_under60censorallat60, *_sensitivity_under60censorrefat65) from x2026_mht_create_exposure_variables(). These had a logic issue where local_or_none_mht rows at age >= 65 produced NA instead of FALSE. The rd_age_continuous column is no longer required as input.

swereg 26.2.27

Improvements

  • VALIDATION: tte_validate_spec() now emits a warning() instead of stop() when spec variables or values are missing from the skeleton. This makes validation informational rather than blocking, useful when working with small data subsets where rare categories may be absent.

swereg 26.2.22

New features

  • EXPORTED: tte_callr_pool() — generic callr::r_bg() worker pool, generalized from the internal .tte_callr_pool(). New API accepts items (list of arg-lists), worker_fn, item_labels, and collect (FALSE to discard results when workers save directly). Eliminates boilerplate when scripts need their own parallel loops (e.g., Loop 2 IPCW-PP).

  • NEW: TTEPlan$generate_analysis_files_and_ipcw_pp() — Loop 2 method that runs per-ETT IPCW-PP calculation and saves analysis-ready files. Mirrors $generate_enrollments_and_ipw() (Loop 1). Parameters: output_dir, estimate_ipcw_pp_separately_by_exposure, estimate_ipcw_pp_with_gam, n_workers, swereg_dev_path.

Improvements

  • MEMORY: tte_calculate_ipcw() now uses mgcv::bam(discrete = TRUE) instead of mgcv::gam() when use_gam = TRUE. bam() discretizes covariates to avoid forming the full model matrix, dramatically reducing peak memory for large datasets. Model objects are also explicitly freed (rm() + gc()) between exposed/unexposed fits.

  • MEMORY: $irr() and $km() now subset to only the columns needed before creating survey::svydesign(). Previously the full data.table (all columns) was copied into the survey object. Model objects and intermediate data are freed between fits.

swereg 26.2.21

Breaking changes

  • RENAMED: $prepare_for_analysis() parameters estimate_ipcw_separately_by_exposure → estimate_ipcw_pp_separately_by_exposure and estimate_ipcw_with_gam → estimate_ipcw_pp_with_gam for consistency with the IPCW-PP method they control.

  • PRIVATE: $enroll(), $prepare_outcome(), $ipcw_pp(), and $combine_weights() are now private methods on TTEEnrollment.

    • Enrollment: use tte_enrollment(data, design, ratio = 2, seed = 4) instead of tte_enrollment(data, design)$enroll(ratio = 2, seed = 4).
    • Outcome prep + IPCW: use $prepare_for_analysis() (unchanged).
    • Weight combination: handled automatically by $ipcw_pp() (unchanged).
    • Tests can access private methods via enrollment$.__enclos_env__$private$method_name().

swereg 26.2.20

Breaking changes

  • RENAMED: $prepare_analysis() → $prepare_for_analysis() on TTEEnrollment. The new name better communicates that this method prepares the enrollment for analysis (it is not the analysis itself).

Bug fixes

  • FIXED: 3 remaining broken test calls (tte_extract(), tte_summary(), tte_weights()) migrated to R6 method syntax ($extract(), print(), $combine_weights()). Column assertion updated: "weight_pp" → "analysis_weight_pp".

  • FIXED: $impute_confounders() now appends "impute" to steps_completed, consistent with all other mutating methods.

  • FIXED: $ipcw_pp() IPW column guard moved from after IPCW computation to before it (fail-fast).

Documentation

  • FIXED: Vignette truncation bounds corrected from “0.5th and 99.5th percentiles” to “1st and 99th percentiles” (matching code defaults lower = 0.01, upper = 0.99).

  • FIXED: TTEDesign roxygen references to removed tte_match() / tte_expand() replaced with $enroll().

  • FIXED: $weight_summary() moved from “Mutating” to “Non-mutating” section in TTEEnrollment roxygen (it only prints, never modifies data).
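
Truncating weights at the 1st and 99th percentiles, as in the corrected vignette text above, can be sketched as follows. The helper name is hypothetical, and clamping values to the quantile bounds is an assumption about what truncation means here; only the defaults lower = 0.01 and upper = 0.99 come from this changelog.

```r
# Hypothetical helper: clamp weights to the [lower, upper] quantiles.
truncate_weights_sketch <- function(w, lower = 0.01, upper = 0.99) {
  bounds <- stats::quantile(w, probs = c(lower, upper), na.rm = TRUE)
  pmin(pmax(w, bounds[[1]]), bounds[[2]])
}

set.seed(1)
w <- c(rexp(998), 50, 0.0001)  # two deliberately extreme weights
w_trunc <- truncate_weights_sketch(w)
```

The two extreme weights are pulled in to the percentile bounds while the bulk of the distribution is left unchanged.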

swereg 26.2.13

New features

  • NEW: $prepare_for_analysis() method on TTEEnrollment merges $prepare_outcome() + $ipcw_pp() into one step. Parameters: outcome, follow_up, separate_by_exposure, use_gam, censoring_var.

  • NEW: $enrollment_stage active binding on TTEEnrollment. Derives lifecycle stage from existing state: "pre_enrollment" → "enrolled" → "analysis_ready". Zero maintenance: it reads data_level and steps_completed.

Bug fixes

  • FIXED: 24 broken test cases calling removed standalone functions (tte_enroll(), tte_collapse(), tte_ipw(), tte_truncate(), tte_prepare_outcome()) migrated to R6 method syntax. Error message patterns updated to match method names (e.g., enroll() not tte_enroll()).

swereg 26.2.12

Breaking changes

  • RENAMED: TTETrial class → TTEEnrollment, tte_trial() → tte_enrollment(), summary.TTETrial → summary.TTEEnrollment. The class represents an enrollment (matching + panel expansion), not an individual emulated target trial (ETT). Aligns naming with the ETT grid concept in TTEPlan.

swereg 26.2.11

Breaking changes

  • REMOVED: 19 standalone TTE functions moved to R6 methods on TTETrial (15 methods) and TTEPlan (4 methods). Pipe chaining (trial |> tte_ipw()) replaced with $-chaining (trial$ipw()).

    TTETrial methods: $enroll(), $collapse(), $ipw(), $ipcw_pp(), $combine_weights(), $truncate(), $prepare_outcome(), $impute_confounders(), $weight_summary(), $extract(), $summary(), $table1(), $rates(), $irr(), $km().

    TTEPlan methods: $add_one_ett(), $save(), $enrollment_spec(), $generate_enrollments_and_ipw().

  • RENAMED: TTEPlan$task() → TTEPlan$enrollment_spec(). The method returns enrollment metadata (design, enrollment_id, age_range, n_threads), not a generic task. The process_fn callback parameter convention changes from function(task, file_path) to function(enrollment_spec, file_path).

    Removed exports: tte_enroll, tte_collapse, tte_ipw, tte_ipcw_pp, tte_weights, tte_truncate, tte_prepare_outcome, tte_extract, tte_summary, tte_weight_summary, tte_table1, tte_rates, tte_irr, tte_km, tte_plan_add_one_ett, tte_plan_save, tte_plan_task, tte_generate_enrollments_and_ipw.

    Kept standalone: tte_rbind(), tte_rates_combine(), tte_irr_combine(), tte_impute_confounders() (thin wrapper for callback default).

  • CHANGED: TTE classes (TTEDesign, TTETrial, TTEPlan) migrated from S7 to R6. Property access changes from @ to $ (e.g., trial@data → trial$data, design@id_var → design$id_var). R6 reference semantics eliminate copy-on-write overhead from trial$data[, := ...], reducing peak RAM from ~3X to ~2X during the weight-calculation chain (Loop 2).

  • FIXED: Three S7 @ accessor bugs that silently produced no-ops:

    • $ipcw_pp(): dropping intermediate IPCW columns (p_censor, etc.)
    • $collapse(): creating person_weeks column
    • $impute_confounders(): deleting old confounder columns before merge

    All fixed automatically by R6 (in-place modification works).
  • CHANGED: $ipcw_pp() now inlines weight combination and truncation (was calling tte_combine_weights() and tte_truncate_weights() via function parameters that created extra refcount). Keeps data.table refcount=1 throughout.

File reorganization

  • Split tte_classes.R and tte_methods.R into per-class files with methods inline: tte_design.R, tte_trial.R, tte_plan.R. tte_generate.R reduced to thin tte_impute_confounders() wrapper + .tte_callr_pool() helper.

  • Added S3method(summary, TTETrial) → delegates to $summary().

Dependencies

  • ADDED: R6 package to Imports (S7 retained for skeleton classes).

swereg 26.2.10

Bug fixes

  • FIXED: tte_ipw(), tte_ipcw_pp(): in-place joins via S7 @ accessor now use extract/modify/reassign pattern (dt <- trial@data; dt[...]; trial@data <- dt). The previous trial@data[i, := ...] silently modified a copy, leaving the S7 object’s data unchanged.

Performance

  • IMPROVED: tte_ipw(), tte_ipcw_pp(), tte_calculate_ipcw(): replace merge() with in-place keyed joins (data[i, := ...]), reducing peak RAM from ~3x to ~2x panel size during the weight-calculation chain.

Breaking changes

  • CHANGED: tte_ipcw_pp() now also combines weights (ipw * ipcw_pp → analysis_weight_pp), truncates analysis_weight_pp, and drops intermediate IPCW columns (p_censor, p_uncensored, cum_p_uncensored, marginal_p, cum_marginal). Callers no longer need tte_weights() + tte_truncate() after tte_ipcw_pp().

  • RENAMED: tte_generate_enrollments() → tte_generate_enrollments_and_ipw(). Now computes IPW + truncation once on the full combined enrollment (after imputation), so the per-ETT Loop 2 no longer needs to call tte_ipw(). New stabilize parameter (default TRUE) controls IPW stabilization.

New features

  • NEW: tte_plan_load() reads a .qs2 plan file and reconstructs the TTEPlan S7 object. Companion to tte_plan_save().

  • CHANGED: tte_plan_save() now persists project_prefix and skeleton_files alongside ett and global_max_isoyearweek, so tte_plan_load() can fully reconstruct the object.

  • NEW: skeleton_process() gains n_workers parameter for parallel batch processing. When > 1, uses callr::r() + parallel::mclapply() to process batches concurrently while avoiding fork() + data.table OpenMP segfaults.

swereg 26.2.9

Improvements

  • CHANGED: Migrate serialization from qs (archived) to qs2. .qs_save/.qs_read wrappers now call qs2::qs_save/qs2::qs_read (standard format, preserves S7 objects). All file extensions changed from .qs to .qs2. The preset parameter is no longer used.

  • IMPROVED: tte_rates() now sets swereg_type and exposure_var attributes on its output; tte_irr() sets swereg_type.

  • RENAMED: tte_rates_table() → tte_rates_combine(), tte_irr_table() → tte_irr_combine(). New API accepts (results, slot, descriptions) and extracts the rates/irr slot internally, removing the need for lapply(results, `[[`, "table2") at call sites. Exposure column is now read from the exposure_var attribute instead of guessing via setdiff().

Breaking changes

  • CHANGED: tte_plan_add_one_ett() now requires explicit enrollment_id parameter. Auto-assignment based on follow_up + age_group removed. Validation that design params match within an enrollment_id is preserved.

  • IMPROVED: print(plan) now shows both enrollment grid and full ETT grid.

  • CHANGED: tte_plan_add_one_ett() bundles age_group, age_min, age_max, person_id_var into an argset named list parameter. time_exposure_var and eligible_var no longer have defaults (must be explicit). exposure_var removed from interface (hardcoded to "baseline_exposed").

  • RENAMED: file_id column in the ett data.table → enrollment_id. This makes explicit that ETTs sharing the same follow_up + age_group are processed together as one “enrollment” (shared eligibility, matching, collapse, imputation).

  • RENAMED: tte_generate_trials() → tte_generate_enrollments(). The function generates enrollments (one per follow_up × age_group), not individual trials.

  • RENAMED: tte_plan_task() return list key file_id → enrollment_id.

  • UPDATED: print(plan) now shows “Enrollments: N x M skeleton files” instead of “Tasks: N file_id(s) x M skeleton files”.

swereg 26.2.8

Breaking changes

  • CHANGED: tte_plan() is now infrastructure-only — takes only project_prefix, skeleton_files, global_max_isoyearweek. Use tte_plan_add_one_ett() to add ETTs with per-ETT design parameters.

  • REMOVED: TTEPlan plan-level properties confounder_vars, person_id_var, exposure_var, time_exposure_var, eligible_var. These are now per-ETT columns in the ett data.table.

  • REMOVED: Internal .tte_grid() function. The ETT grid is now built incrementally via tte_plan_add_one_ett().

  • ADDED: TTEPlan@project_prefix property (needed for file naming in tte_plan_add_one_ett()).

New features

  • NEW: tte_plan_add_one_ett() — builder function that adds one ETT row to a plan. Stores design params (confounder_vars, person_id_var, exposure_var, time_exposure_var, eligible_var) per-ETT, allowing different ETTs to use different confounders. Validates that design params match within an enrollment_id (same follow_up + age_group).

  • RENAMED: TTEPlan@files property → TTEPlan@skeleton_files for clarity.

swereg 26.2.7

Breaking changes

  • REFACTORED: tte_generate_enrollments() (formerly tte_generate_trials()) now takes a TTEPlan object instead of separate parameters (ett, files, confounder_vars, global_max_isoyearweek). The process_fn callback signature changes from function(file_path, design, file_id, age_range, n_threads) to function(task, file_path) where task is a list with design, enrollment_id, age_range, and n_threads.
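For callers migrating an existing callback, a thin adapter is one option. The sketch below is illustrative only: old_process_fn and its body are hypothetical stand-ins, the file name is made up, and the task list is mocked by hand with the four fields named above rather than produced by the package.

```r
# Hypothetical old-style callback (pre-26.2.7 signature); the body is a stand-in.
old_process_fn <- function(file_path, design, file_id, age_range, n_threads) {
  list(file = file_path, id = file_id, ages = age_range, threads = n_threads)
}

# New-style callback: unpack the task list and delegate to the old function.
process_fn <- function(task, file_path) {
  old_process_fn(
    file_path = file_path,
    design    = task$design,
    file_id   = task$enrollment_id,
    age_range = task$age_range,
    n_threads = task$n_threads
  )
}

# Hand-built mock of a task (assumption: real tasks come from the plan object).
mock_task <- list(
  design        = NULL,
  enrollment_id = "01",
  age_range     = c(50, 60),
  n_threads     = 2L
)
res <- process_fn(mock_task, "skeleton_001.qs2")
```

Keeping the old function body untouched and adapting only the signature limits the migration to one small wrapper per callback.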

New features

  • NEW: TTEPlan S7 class bundles ETT grid, skeleton file paths, confounder definitions, and design column names into a single object for trial generation.
    • tte_plan(): Constructor function
    • tte_plan_task(plan, i): Extract the i-th enrollment task as a list with design, enrollment_id, age_range, n_threads
    • plan[[i]]: Shorthand for tte_plan_task(plan, i)
    • length(plan): Number of unique enrollment_id groups
    • Supports interactive testing: task <- plan[[1]]; process_fn(task, plan@skeleton_files[1])

swereg 26.2.6

Documentation

  • FIXED: Add missing topics to pkgdown reference index (TTEDesign, TTETrial, x2026_mht_add_lmed)

swereg 26.2.5

Bug fixes

  • FIXED: Set eval = FALSE in skeleton3-analyze vignette to prevent build errors from optional qs package dependency

swereg 26.2.4

Bug fixes

  • FIXED: Remove qs from Suggests to fix GitHub Actions CI (package not available on CRAN)

swereg 26.2.3

Breaking changes

  • REPLACED: tte_match() and tte_expand() merged into single tte_enroll() function:
    • Old workflow: tte_trial(data, design) |> tte_match(ratio = 2, seed = 4) |> tte_expand(extra_cols = "isoyearweek")
    • New workflow: tte_trial(data, design) |> tte_enroll(ratio = 2, seed = 4, extra_cols = "isoyearweek")
    • The two operations were tightly coupled and always used together
    • tte_enroll() combines sampling (matching) and panel expansion in one step
    • Records “enroll” in steps_completed (previously recorded “match” then “expand”)

New features

  • NEW: Trial eligibility helper functions for composable eligibility criteria:
    • tte_eligible_isoyears(): Check eligibility based on calendar years
    • tte_eligible_age_range(): Check eligibility based on age range
    • tte_eligible_no_events_in_window_excluding_wk0(): Check for no events in prior window (correctly excludes baseline week)
    • tte_eligible_no_observation_in_window_excluding_wk0(): Check for no specific value in prior window (for categorical variables)
    • tte_eligible_combine(): Combine multiple eligibility columns using AND logic
    • All functions modify data.tables by reference and return invisibly for method chaining

Documentation

  • IMPROVED: Clarified that eligibility checks should EXCLUDE the baseline week. Using cumsum(x) == 0 is incorrect because it includes the current week. The new eligibility functions use any_events_prior_to() which correctly excludes the current row.
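The difference between the two approaches can be seen on a four-week toy series. This is a base R sketch, not the package's implementation: it uses an unbounded look-back, whereas any_events_prior_to() works on a fixed-size window.

```r
# One person, four weeks; the event happens in week 3.
events <- c(FALSE, FALSE, TRUE, FALSE)

# Naive check: the cumulative sum includes the current week,
# so week 3 itself is already flagged as "has an event".
naive <- cumsum(events) == 0

# Excluding the current week: shift the cumulative count down by one row,
# so each week only sees events from strictly earlier weeks.
prior_count <- c(0, head(cumsum(events), -1))
excluding_current <- prior_count == 0

naive              # TRUE TRUE FALSE FALSE
excluding_current  # TRUE TRUE TRUE FALSE
```

In week 3 the naive check marks the person as ineligible even though no event occurred before that week, which is the error described above.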

swereg 26.1.31

New features

  • NEW: S7 object-oriented API for target trial emulation workflows:
    • TTEDesign class: Define column name mappings once and reuse across all TTE functions
    • TTETrial class: Fluent method chaining with workflow state tracking
    • tte_design() / tte_trial(): Constructor functions for the S7 classes
    • tte_match(), tte_expand(), tte_collapse(), tte_ipw(): S7 methods for data preparation
    • tte_prepare_outcome(), tte_ipcw(): Outcome-specific per-protocol analysis
    • tte_weights(), tte_truncate(): Weight combination and truncation
    • tte_rbind(): Combine batched trial objects
    • tte_extract(), tte_summary(): Access data and diagnostics
    • tte_table1(), tte_rates(), tte_irr(), tte_km(): Analysis and visualization

Breaking changes

  • REMOVED: Deprecated S7 methods replaced by tte_prepare_outcome():
    • tte_tte(): Use tte_prepare_outcome() which computes weeks_to_event internally
    • tte_set_outcome(): Use tte_prepare_outcome(outcome = "...") instead
    • tte_censoring(): Use tte_prepare_outcome() which handles censoring internally

Dependencies

  • ADDED: S7 package to Imports for object-oriented class system

swereg 26.1.30

New features

  • NEW: Target trial emulation weight functions for causal inference in observational studies:
    • tte_calculate_ipw(): Calculate stabilized inverse probability of treatment weights (IPW) for baseline confounding adjustment using propensity scores
    • tte_calculate_ipcw(): Calculate time-varying inverse probability of censoring weights (IPCW) for per-protocol analysis using GAM or GLM
    • tte_identify_censoring(): Identify protocol deviation and loss to follow-up for per-protocol analysis
    • tte_combine_weights(): Combine IPW and IPCW weights for per-protocol effect estimation
    • tte_truncate_weights(): Truncate extreme weights at specified quantiles to reduce variance
  • NEW: Target trial emulation data preparation functions:
    • tte_match_ratio(): Sample comparison group at specified ratio (e.g., 2:1 unexposed to exposed)
    • tte_collapse_periods(): Collapse fine-grained time intervals (e.g., weekly) to coarser periods (e.g., 4-week)
    • tte_time_to_event(): Calculate time to first event for each trial/person
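As background for the weight functions above, stabilized IPW and quantile truncation can be sketched in base R as follows. This is the generic textbook construction on simulated data, not the code inside tte_calculate_ipw() or tte_truncate_weights(); the variable names, the single confounder, and the 1%/99% cut-offs are all assumptions made for the example.

```r
set.seed(1)
n <- 500
age <- rnorm(n, mean = 55, sd = 5)
exposed <- rbinom(n, size = 1, prob = plogis(-0.1 * (age - 55)))

# Propensity score: probability of exposure given the baseline confounder.
ps_model <- glm(exposed ~ age, family = binomial())
ps <- fitted(ps_model)

# Stabilized weights: marginal exposure probability in the numerator.
p_marginal <- mean(exposed)
ipw <- ifelse(exposed == 1, p_marginal / ps, (1 - p_marginal) / (1 - ps))

# Truncate extreme weights at the chosen quantiles.
bounds <- quantile(ipw, probs = c(0.01, 0.99))
ipw_trunc <- pmin(pmax(ipw, bounds[[1]]), bounds[[2]])
```

Truncation trades a small amount of bias for lower variance, which is why the cut-offs are usually reported alongside the results.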

Dependencies

  • ADDED: mgcv package to Imports for flexible GAM-based censoring models in tte_calculate_ipcw()

swereg 25.12.24

API changes

New features

  • NEW: any_events_prior_to() function for survival analysis:
    • Checks if any TRUE values exist in a preceding time window (excludes current row)
    • Useful for determining if an event occurred in a prior time period
    • Default window of 104 weeks (~2 years) with customizable size
    • Complements steps_to_first() for comprehensive time-to-event analysis
  • ENHANCED: steps_to_first() function improvements:
    • Renamed parameter from window to window_including_wk0 for clarity
    • Default window is now 104 (inclusive of current week)
    • Added @family survival_analysis tag and cross-reference to any_events_prior_to()
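The look-back behaviour described for any_events_prior_to() can be illustrated with a small base R stand-in. The function name any_prior_sketch and its argument names are hypothetical, and the real function may differ in edge cases such as NA handling.

```r
# For each row, is there any TRUE in the `window` rows strictly before it?
any_prior_sketch <- function(x, window = 104L) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    if (i == 1L) return(FALSE)        # nothing precedes the first row
    lo <- max(1L, i - window)         # start of the look-back window
    any(x[lo:(i - 1L)], na.rm = TRUE) # current row i is excluded
  }, logical(1))
}

x <- c(FALSE, TRUE, FALSE, FALSE, FALSE)
prior <- any_prior_sketch(x, window = 2L)
prior  # FALSE FALSE TRUE TRUE FALSE
```

Row 2 holds the event but is not flagged itself, and row 5 is no longer flagged because the event has left the two-row window.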

Bug fixes

  • FIXED: Added slider package to Imports in DESCRIPTION to fix R CMD check warning about undeclared import

Data

  • BREAKING: Replaced separate fake_inpatient_diagnoses and fake_outpatient_diagnoses with unified fake_diagnoses dataset:
    • New SOURCE column identifies data origin: “inpatient”, “outpatient”, or “cancer”
    • ~2000 inpatient records, ~2000 outpatient records, ~1000 cancer records
    • Cancer records always have populated ICDO3 codes
    • Enables testing of source-based filtering and validation
  • ENHANCED: Added ICD-O-3 and SNOMED-CT columns to fake diagnosis data:
    • ICDO3: ICD-O-3 morphology codes (always populated for cancer source)
    • SNOMED3: SNOMED-CT version 3 codes
    • SNOMEDO10: SNOMED-CT version 10 codes

Validation

  • ENHANCED: SOURCE column validation is now optional - filter externally if needed (see API changes above)

Documentation

swereg 25.12.6

New features

  • NEW: steps_to_first() function for survival analysis:
    • Calculates the number of steps (e.g., weeks) until the first TRUE value in a forward-looking window
    • Useful for time-to-event calculations in longitudinal registry data
    • Default window of 103 weeks (~2 years) with customizable size
    • Returns NA if no event occurs within the window
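A rough base R illustration of the forward-looking calculation follows. The name steps_to_first_sketch is hypothetical, and two details are assumptions rather than documented behaviour: that the window counts the current row, and that an event in the current row yields 0.

```r
# For each row, count steps until the first TRUE within the next `window` rows.
steps_to_first_sketch <- function(x, window = 103L) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    hi <- min(n, i + window - 1L)  # last row inside the window
    hit <- which(x[i:hi])          # positions of TRUE values, NA ignored
    if (length(hit) == 0L) NA_integer_ else hit[1L] - 1L
  }, integer(1))
}

x <- c(FALSE, TRUE, FALSE, FALSE, FALSE)
steps <- steps_to_first_sketch(x, window = 2L)
steps  # 1 0 NA NA NA
```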

Bug fixes

  • CRITICAL: Fixed add_snomed3s() and add_snomedo10s() calling non-existent internal functions
    • Both functions now correctly call add_diagnoses_or_operations_or_cods_or_icdo3_or_snomed()
    • These functions would have caused runtime errors before this fix
  • FIXED: Removed erroneous icdo10 column references from add_diagnoses():
    • ICD-O only has editions 1, 2, and 3 (not 10)
    • ICD-O-3 codes should be handled via the dedicated add_icdo3s() function
  • FIXED: Added icd7* and icd9* columns to diagnosis search in add_diagnoses():
    • Historical ICD-7 and ICD-9 columns are now properly searched when diag_type = "both"
    • Validation and helper function now consistent
  • FIXED: Corrected error messages in add_icdo3s(), add_snomed3s(), and add_snomedo10s():
    • Messages now correctly reference the appropriate data types instead of “operation data”

Documentation

  • ENHANCED: add_diagnoses() documentation now clearly lists which diagnosis columns are searched:
    • When diag_type = "both": hdia, dia*, ekod*, icd7*, icd9*
    • When diag_type = "main": hdia only

swereg 25.8.19

CRAN Submission Preparation

  • CRAN READY: Package prepared for CRAN submission with comprehensive compliance improvements:
    • Fixed DESCRIPTION file author field duplication issue
    • Updated .Rbuildignore to exclude all development files (docs/, .git/, .Rhistory, etc.)
    • Removed non-portable files (@eaDir directories, .DS_Store files)
    • Added missing global variable declarations to prevent R CMD check warnings
    • Verified URL consistency between DESCRIPTION and package startup messages
  • OPTIMIZED: Vignette structure significantly improved for CRAN submission:
    • Reduced total vignette content by 31% (626 lines removed)
    • Condensed cookbook-survival-analysis.Rmd (removed verbose descriptive statistics and redundant sections)
    • Simplified skeleton2-clean.Rmd (removed duplicated skeleton1_create workflow)
    • Streamlined skeleton3-analyze.Rmd (removed redundant data loading and best practices sections)
    • Fixed all vignette build errors by ensuring consistent data variable availability
    • All vignettes now compile successfully and use package synthetic data consistently
  • VALIDATED: All examples are runnable using package fake data - no \dontrun sections without justification

Code Quality Improvements

  • CONSISTENCY: Fixed date_columns parameter usage throughout package:
    • Updated all vignettes to use lowercase date_columns parameters (e.g., “indatum” instead of “INDATUM”)
    • Added warning to make_lowercase_names() function when uppercase date_columns are provided
    • Enhanced documentation to clarify that date_columns should use lowercase names
    • Improved user experience with clear guidance and automatic handling of uppercase inputs
  • ELEGANCE: Enhanced vignette code patterns for better readability:
    • Replaced verbose data() loading patterns with elegant pipe syntax
    • Updated all data loading to use swereg::fake_* |> copy() |> make_lowercase_names() pattern
    • Eliminated clumsy multi-step data preparation code throughout vignettes
    • Improved code flow and professional appearance of package examples
  • VERIFIED: Package builds successfully with R CMD build and passes CRAN compliance checks
  • CONFIRMED: inst/ directory contains only files referenced by package functions

swereg 25.7.30

New Features

  • NEW: make_rowind_first_occurrence() helper function for rowdep → rowind transformations:
    • Simplifies the common pattern of creating row-independent variables from first occurrence of conditions
    • Automatically handles temp variable creation and cleanup
    • Uses first_non_na() for robust aggregation across all variable types
    • Includes comprehensive input validation and clear error messages
  • NEW: “Understanding rowdep and rowind Variables” vignette:
    • Explains the fundamental distinction between row-dependent and row-independent variables
    • Demonstrates common transformation patterns with practical examples
    • Shows integration with the swereg workflow (skeleton1_create → skeleton2_clean → skeleton3_analyze)
    • Includes best practices for longitudinal registry data analysis

Documentation

  • ENHANCED: Helper functions now include @family data_integration tags for better organization
  • IMPROVED: Function examples use existing fake datasets for consistency

swereg 25.7.16

New Swedish Date Parsing and Enhanced Data Cleaning

  • NEW: parse_swedish_date() function for handling Swedish registry dates with varying precision:
    • Handles 4-character (YYYY), 6-character (YYYYMM), and 8-character (YYYYMMDD) formats
    • Automatically replaces “0000” with “0701” and “00” with “15” for missing date components
    • Supports custom defaults for missing date parts
    • Includes comprehensive error handling and vectorized processing
  • ENHANCED: make_lowercase_names() now supports automatic date cleaning:
    • New date_column parameter to specify which column contains dates
    • Automatically creates cleaned ‘date’ column using parse_swedish_date()
    • Works with both default and data.table methods
    • Maintains backward compatibility with existing code
  • ENHANCED: All add_* functions now require cleaned date columns.
  • ENHANCED: create_skeleton() now includes personyears column:
    • Annual rows (is_isoyear==TRUE) have personyears = 1
    • Weekly rows (is_isoyear==FALSE) have personyears = 1/52.25
    • Facilitates person-time calculations for survival analysis
  • IMPROVED: Survival analysis cookbook vignette updated:
    • Uses weekly data instead of yearly data for more precise analyses
    • Age calculation based on isoyearweeksun instead of isoyear
    • Includes person-time in descriptive statistics
    • Demonstrates proper use of new date cleaning workflow
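The date handling described for parse_swedish_date() above can be approximated in a few lines of base R. This is a sketch under assumptions, not the package function: parse_date_sketch is a hypothetical name, custom defaults are not supported, and inputs such as a 6-character value with month "00" are not handled.

```r
parse_date_sketch <- function(x) {
  x <- as.character(x)
  n <- nchar(x)

  # YYYY: no month or day, so use 1 July.
  i4 <- which(n == 4)
  x[i4] <- paste0(x[i4], "0701")

  # YYYYMM: no day, so use the 15th.
  i6 <- which(n == 6)
  x[i6] <- paste0(x[i6], "15")

  # YYYYMMDD with zeroed components.
  x <- sub("0000$", "0701", x)  # month and day missing
  x <- sub("00$", "15", x)      # day missing

  as.Date(x, format = "%Y%m%d")
}

dates <- parse_date_sketch(c("2015", "201503", "20150312", "20150000", "20150300"))
dates  # "2015-07-01" "2015-03-15" "2015-03-12" "2015-07-01" "2015-03-15"
```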

Enhanced error handling and validation

  • ENHANCED: Comprehensive input validation for all add_* functions:
    • add_onetime(): Validates skeleton structure, ID column exists, checks for ID matches
    • add_annual(): Validates isoyear parameter, checks skeleton year coverage
    • add_diagnoses(): Validates diagnosis patterns, checks for diagnosis code columns
    • add_operations(): Validates operation patterns, checks for operation code columns
    • add_rx(): Validates prescription data structure, checks source columns
    • add_cods(): Validates death data structure, checks cause of death columns
  • IMPROVED: User-friendly error messages with specific guidance:
    • Clear indication when make_lowercase_names() is forgotten
    • Helpful suggestions for column naming issues
    • Informative ID mismatch diagnostics with sample values
  • NEW: Internal validation helper functions for consistent error handling
  • ADDED: Input validation for pattern lists, data structures, and parameter ranges

New cookbook documentation

  • NEW: Comprehensive survival analysis cookbook (cookbook-survival-analysis.Rmd):
    • Complete workflow from raw data to Cox proportional hazards model
    • Time-varying covariates (annual income) with heart attack outcome
    • Handles common challenges: missing data, multiple events, competing risks
    • Performance tips for large datasets
    • Practical solutions for real-world registry analysis
  • ENHANCED: Updated _pkgdown.yml with new “Cookbooks” section
  • ADDED: survival package to Suggests dependencies

Bug fixes

  • FIXED: Improved ID matching warnings and error messages across all functions
  • CORRECTED: Better handling of missing data in time-varying covariate analysis
  • ENHANCED: More robust parameter validation prevents common user errors

swereg 25.7.16

Major documentation restructuring

  • RESTRUCTURED: Complete vignette reorganization for clear learning progression:
    • NEW “Skeleton concept” vignette: Conceptual foundation explaining the skeleton approach without technical implementation
    • “Building the data skeleton (skeleton1_create)”: Pure data integration focus - raw data to time-structured skeleton
    • “Cleaning and deriving variables (skeleton2_clean)”: Pure data cleaning and variable derivation focus
    • “Production analysis workflows (skeleton3_analyze)”: Memory-efficient processing and final analysis datasets
  • IMPROVED: Clear separation of concerns with focused, single-purpose tutorials
  • ENHANCED: Systematic learning progression from concept to implementation to production
  • UPDATED: _pkgdown.yml structure with logical vignette grouping
  • PRESERVED: All existing technical content while improving organization

Content improvements

  • NEW: Comprehensive conceptual introduction based on presentation content
  • IMPROVED: Each vignette builds systematically on the previous one
  • ENHANCED: Better explanation of three types of data integration (one-time, annual, event-based)
  • CLARIFIED: Production workflow patterns with memory-efficient batching strategies
  • STANDARDIZED: Consistent academic tone and sentence case throughout

swereg 25.7.15

Documentation and presentation improvements

  • STANDARDIZED: Changed all titles and headings to normal sentence case throughout:
    • Vignette titles: “Basic Workflow” → “Basic workflow”, “Complete Workflow” → “Complete workflow”, etc.
    • README.md section headings: “Core Functions” → “Core functions”, “Data Integration” → “Data integration”, etc.
    • NEWS.md section headings: “Vignette Restructuring” → “Vignette restructuring”, etc.
    • CLAUDE.md section headings: “Project Overview” → “Project overview”, “Development Commands” → “Development commands”, etc.
  • IMPROVED: Consistent normal sentence case for better readability and less formal appearance
  • SIMPLIFIED: Removed subtitle text after colons in vignette titles for cleaner presentation
  • ENHANCED: Improved Core Concept section in basic workflow vignette with clear explanation of three data types:
    • One-time data (demographics): Added to all rows for each person
    • Annual data (income, family status): Added to all rows for specific year
    • Event-based data (diagnoses, prescriptions, deaths): Added to rows where events occurred
  • CLARIFIED: Step 1 documentation now properly explains all skeleton columns including isoyearweeksun
  • VERIFIED: All vignettes compile successfully with improved content

Major documentation and vignette reorganization

  • RESTRUCTURED: Complete vignette reorganization with improved naming and content flow:
    • swereg.Rmd → basic-workflow.Rmd: Focused introduction to skeleton1_create
    • advanced-workflow.Rmd → complete-workflow.Rmd: Two-stage workflow (skeleton1_create + skeleton2_clean)
    • memory-efficient-batching.Rmd: Maintained as comprehensive three-stage workflow guide
  • IMPROVED: Eliminated content redundancy between vignettes for clearer learning progression
  • ENHANCED: Updated _pkgdown.yml configuration to reflect new vignette structure

Function documentation improvements

  • ENHANCED: Comprehensive documentation improvements for all exported functions:
    • Added @family tags for logical grouping (data_integration, skeleton_creation, data_preprocessing)
    • Added @seealso sections with cross-references to related functions and vignettes
    • Replaced placeholder examples with runnable code using synthetic data
    • Improved parameter documentation with detailed descriptions and expected formats
    • Enhanced return value documentation with explicit side effects description
  • STANDARDIZED: Consistent academic tone throughout all documentation

Professional presentation updates

  • IMPROVED: Removed informal elements and adopted academic tone across all documentation
  • UPDATED: Changed terminology from “fake data” to “synthetic data” throughout
  • ENHANCED: More professional language in README.md and vignettes
  • STANDARDIZED: Consistent formal tone appropriate for scientific software

Technical improvements

  • VERIFIED: All vignettes compile successfully with updated content
  • TESTED: Package passes R CMD check with all documentation improvements
  • UPDATED: CLAUDE.md reflects new vignette structure and documentation standards

swereg 25.7.1

Vignette restructuring

  • RESTRUCTURED: Reorganized vignettes for clearer learning progression:
    • swereg.Rmd: Clean skeleton1_create tutorial using full datasets (removed subset filtering)
    • advanced-workflow.Rmd: Focused skeleton1→skeleton2 workflow (removed batching and skeleton3 content)
    • memory-efficient-batching.Rmd: NEW comprehensive batching vignette with complete skeleton1→skeleton2→skeleton3 workflow for large-scale studies
  • IMPROVED: GitHub Actions workflow optimization with dependency caching and binary packages for faster CI/CD

Batching vignette fixes

  • FIXED: Updated memory-efficient-batching vignette with production-ready improvements:
    • Replace split() with csutil::easy_split for better batch handling
    • Replace saveRDS/readRDS with qs::qsave/qread for 2-10x faster file I/O
    • Fix skeleton3_analyze to properly aggregate weekly→yearly data using swereg::max_with_infinite_as_na
    • Remove incorrect is_isoyear == TRUE filter in skeleton3_analyze
    • Fix analysis results to avoid NaN outputs in treatment rate calculations
    • Add explanations for weekly→yearly data aggregation and qs package performance benefits
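The weekly→yearly aggregation issue can be shown with a base R stand-in. The helper max_inf_as_na below is hypothetical and only guesses at what swereg::max_with_infinite_as_na does from its name: take the maximum, and return NA instead of -Inf when a group has no non-missing values.

```r
# Maximum that yields NA (not -Inf) when every value in the group is missing.
max_inf_as_na <- function(x) {
  m <- suppressWarnings(max(x, na.rm = TRUE))
  if (is.infinite(m)) NA_real_ else m
}

# Toy weekly rows: a 0/1 treatment indicator with one fully missing person-year.
weekly <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2),
  isoyear = c(2020, 2020, 2021, 2021, 2020, 2020),
  treated = c(0, 1, NA, NA, 0, 0)
)

# Collapse to one value per person-year.
key <- interaction(weekly$id, weekly$isoyear, drop = TRUE)
yearly <- vapply(split(weekly$treated, key), max_inf_as_na, numeric(1))
yearly
```

With plain max(..., na.rm = TRUE) the all-missing person-year would become -Inf, which then distorts any downstream rate calculation.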

New features

  • NEW: Added isoyearweeksun variable to create_skeleton() function - provides Date representing the Sunday (last day) of each ISO week/year for easier date calculations
  • NEW: Updated package logo
  • IMPROVED: Updated all vignettes to not assume swereg is loaded - all functions use swereg:: prefix and data() calls use package="swereg" argument
  • IMPROVED: Updated function documentation to clarify that pattern matching functions (add_diagnoses, add_cods, add_rx) automatically add “^” prefix - users should NOT include “^” in their patterns
  • NEW: Added comprehensive fake Swedish registry datasets for development and vignettes:
    • fake_person_ids: 1000 synthetic personal identifiers
    • fake_demographics: Demographics data matching SCB format
    • fake_annual_family: Annual family status data
    • fake_inpatient_diagnoses and fake_outpatient_diagnoses: NPR diagnosis data with ICD-10 codes
    • fake_prescriptions: LMED prescription data with ATC codes and hormone therapy focus
    • fake_cod: Cause of death data
  • NEW: Added two comprehensive vignettes:
    • swereg.Rmd: Basic skeleton1_create workflow tutorial
    • advanced-workflow.Rmd: Complete 3-phase workflow (skeleton1 → skeleton2 → skeleton3)
  • NEW: Replaced magrittr pipe (%>%) with base pipe (|>) throughout codebase
  • NEW: Added memory-efficient batched processing examples for large registry studies
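The automatic "^" prefix noted above for the pattern matching functions amounts to prefix matching. The base R sketch below shows the effect on a few made-up codes; it does not call the package, and the variable names are illustrative.

```r
codes <- c("F640", "F649", "XF64", "N951")

# The user supplies the pattern without "^"; the anchor is added for them.
user_pattern <- "F64"
anchored <- paste0("^", user_pattern)

matches <- grepl(anchored, codes)
matches  # TRUE TRUE FALSE FALSE
```

Only codes that start with the pattern match, so "XF64" is excluded even though it contains "F64".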

Bug fixes

  • CRITICAL: Fixed incorrect variable names in fake_cod dataset - changed from non-Swedish underlying_cod/contributory_cod1/contributory_cod2 to correct Swedish registry names ulorsak/morsak1/morsak2
  • VERIFIED: Confirmed all fake datasets use correct Swedish registry variable name conventions
  • VERIFIED: All ICD-10 and ATC codes in fake datasets are properly formatted and realistic

Documentation improvements

Package structure

  • All exported functions now have complete, accurate documentation suitable for CRAN submission
  • Documentation focuses on Swedish registry data analysis workflows
  • Examples use \dontrun{} appropriately for functions requiring external data