swereg 26.4.2
Bug Fixes
- Fix deadlock in callr_pool() when worker results exceed the Unix socket buffer (208KB default). Workers in .s1a_worker and .s1b_worker now write results to tempfiles instead of returning them through the socket. The main process reads and cleans up the tempfiles. This prevents the worker from blocking on send() while the main process waits on the poll connection.
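The tempfile hand-off described above can be sketched in base R. This is only an illustration under stated assumptions, not swereg's implementation: worker_fn and collect_result are hypothetical names, and in a real pool the worker body would run inside the callr session and write to a path that both processes can read.

```r
# Hypothetical worker: writes its (potentially large) result to a tempfile
# and returns only the file path, which is a short string.
worker_fn <- function(n) {
  result <- data.frame(id = seq_len(n), value = sqrt(seq_len(n)))
  path <- tempfile(fileext = ".rds")
  saveRDS(result, path)
  path
}

# Hypothetical collector on the main-process side: reads the result and
# removes the tempfile, even if reading fails.
collect_result <- function(path) {
  on.exit(unlink(path), add = TRUE)
  readRDS(path)
}

path <- worker_fn(1000L)
res <- collect_result(path)
```

Because only the path crosses the connection, the size of the payload no longer depends on the socket buffer.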
swereg 26.3.30
Improvements
- callr_pool() gains a timeout_minutes parameter (default: 30). If a work item runs longer than the timeout, its worker is killed and the item is retried once. If the retry also times out, callr_pool() calls stop(). Disable with timeout_minutes = NULL.
CRAN compliance
- Move mgcv from Imports to Suggests (only used conditionally via requireNamespace()).
- Add @importFrom for progressr and utils::getFromNamespace to satisfy NAMESPACE checks.
- Replace swereg::: calls with getFromNamespace() in callr worker sessions.
- Replace assign(..., globalenv()) with a package-level environment (.swereg_env).
- Add var <- NULL declarations for all data.table NSE variables.
- Add .vscode to .Rbuildignore.
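Replacing writes to globalenv() with a package-level environment usually looks like the sketch below. The names .pkg_env, set_state and get_state are hypothetical stand-ins; the changelog only states that swereg uses an environment called .swereg_env.

```r
# A private environment that holds package state instead of globalenv().
# In a package this would be created at the top level of an R/ file.
.pkg_env <- new.env(parent = emptyenv())

# Hypothetical setter: stores a value under `name` in the private environment.
set_state <- function(name, value) {
  assign(name, value, envir = .pkg_env)
  invisible(value)
}

# Hypothetical getter: returns `default` when the name has not been set.
get_state <- function(name, default = NULL) {
  if (exists(name, envir = .pkg_env, inherits = FALSE)) {
    get(name, envir = .pkg_env, inherits = FALSE)
  } else {
    default
  }
}

set_state("n_workers", 4L)
n <- get_state("n_workers")
```

State kept this way does not leak into the user's workspace, which is what the CRAN policy on modifying the global environment asks for.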
swereg 26.3.23
Improvements
- callr_pool() workers now self-terminate if the parent R session dies (e.g. OOM kill). Each worker spawns a lightweight shell watchdog that polls the parent PID every 5 seconds. Previously, orphaned workers ran indefinitely until manually cleaned up via callr_kill_workers().
Bug Fixes
- Critical: .s1_eligible_tuples() used first(rd_exposed) to classify exposure at each trial period, which only detected MHT initiation if it happened in the first week of a 4-week trial period. With period_width = 4, ~75% of exposed people start MHT mid-period and were silently dropped: their first trial period showed them as unexposed (week 1 was pre-initiation), and the next period excluded them for prior MHT. Fixed by using any(rd_exposed, na.rm = TRUE) instead. The existing no_prior_exposure exclusion correctly handles the new-user restriction. Verified: eligible exposed count on skeleton_001 went from 19 → 84, matching the old per-week pipeline.
- .s1_compute_attrition(): exposure classification now uses any() per person-trial instead of checking the first eligible row. This aligns attrition reporting with the any() fix in .s1_eligible_tuples(); previously the attrition flow underreported exposed counts by ~4x.
- tteplan_validate_spec(): missing variables (confounders, outcomes, exclusion criteria, exposure) now trigger stop() instead of warning(). Previously, a misspelled or renamed variable would silently pass validation and break downstream (e.g. IPW model missing a confounder). Category mismatches (values in spec but not data) remain as warnings since they can occur in small batches.
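The difference between the two classification rules can be seen on a toy person-week table. This is a base R sketch with made-up data and hypothetical object names; the package itself works on data.tables.

```r
# Two people in the same 4-week trial period (trial_id 10).
weeks <- data.frame(
  person_id  = c(1, 1, 1, 1, 2, 2, 2, 2),
  trial_id   = c(10, 10, 10, 10, 10, 10, 10, 10),
  rd_exposed = c(FALSE, FALSE, TRUE, TRUE,  # person 1 initiates in week 3
                 NA, FALSE, FALSE, FALSE)   # person 2 never initiates
)

# One group per person-trial.
key <- interaction(weeks$person_id, weeks$trial_id, drop = TRUE)

# Old rule: look only at the first week of the period.
old_rule <- tapply(weeks$rd_exposed, key, function(x) x[1])

# New rule: exposed if exposure starts in any week of the period.
new_rule <- tapply(weeks$rd_exposed, key, function(x) any(x, na.rm = TRUE))
```

Under the old rule person 1 is classified as unexposed in this period, while the new rule classifies them as exposed; person 2 is unexposed under the new rule despite the leading NA.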
swereg 26.3.20
Bug Fixes
- .s1_compute_attrition(): fix undercounting of person-trials for row-level eligibility criteria (e.g. eligible_valid_exposure). The old code checked only the first row per person-trial, missing cases where exposure onset occurred after the first week. The new approach filters to eligible rows first, then counts, matching the logic used by .s1_eligible_tuples().
- .s1_compute_attrition(): fix negative exposed/comparator deltas in participant flow. The before_exclusions baseline now classifies exposure from the first row with non-NA exposure per person-trial, rather than the first overall row (which often has rd_exposed = NA). Total person-trial counts remain unfiltered.
Performance
- TTE s1 pipeline: add data.table::setkey() calls to eliminate redundant hash-based grouping. Skeleton reads in .s1_prepare_skeleton() and .s1b_worker() now set key on (id, isoyearweek) (metadata-only, no re-sort). enroll() Phase B collapse uses keyed grouping on (pid, trial_id), and Phase D panel expansion uses keyed binary join instead of merge().
Bug Fixes
- callr_pool() PID files are now written to /tmp instead of tempdir() so that orphaned workers from crashed R sessions can be discovered and cleaned up by new sessions.
- callr_kill_workers() simplified to orphan-only cleanup: kills workers whose parent R process is dead and removes stale PID files. Own-session cleanup is already handled by callr_pool()’s on.exit() handler; this function is only needed after hard crashes (SIGKILL, OOM).
Performance
- callr_pool() now uses persistent callr::r_session workers instead of spawning a fresh callr::r_bg() process per work item. The swereg namespace is loaded once per worker slot rather than once per item, eliminating redundant startup overhead when scaling to large numbers of items.
- Orphan protection: callr_pool() writes a PID file per invocation and cleans up orphaned worker sessions from previous crashed runs (e.g. OOM kills) on the next invocation.
Bug Fixes
- Fixed 3 test failures in test-tte_spec.R caused by s1 pipeline changes: added missing rd_exposed column to .s1_compute_attrition test fixtures, added n_exposed/n_unexposed to mock attrition data, and updated matching output expectations.
Performance
- s1_generate_enrollments_and_ipw() now caches prepared skeletons between s1a (scout) and s1b (enrollment) passes, eliminating redundant file reads and exclusion processing. Expected ~30-40% reduction in per-enrollment wall-clock time.
- .s1b_worker() now subsets the skeleton to enrolled persons before computing derived confounders, avoiding expensive rolling-window operations on non-enrolled persons.
- TTEEnrollment$new() accepts own_data = TRUE to skip the defensive data.table::copy() when the caller will not reuse the data. Used in .s1b_worker() where the skeleton is discarded immediately after.
- enroll() Phase B now aggregates confounders, time-exposure, and outcome columns in a single groupby pass instead of four separate passes with merges.
Improvements
- “Valid exposure” (eligible_valid_exposure) is now the first exclusion criterion in the TTE attrition flow. Rows where rd_exposed is NA are explicitly accounted for rather than silently disappearing between the before-exclusions total and the first real criterion.
- TARGET Item 8 (participant flow) now shows a richer flow diagram with before-exclusion counts, per-step exposed/unexposed breakdown, delta (excluded) and remaining counts at each criterion, right-justified aligned columns, and color-coded output (red for exclusions, cyan for remaining). Post-matching line also reformatted with arrow indicator. “Before exclusions” line no longer shows a meaningless exposed/comparator breakdown.
- enrollment_counts$attrition now includes n_exposed and n_unexposed columns and a "before_exclusions" row.
Bug Fixes
- Fixed trial_id missing error caused by attr<- breaking data.table’s internal self-reference. Replaced with data.table::setattr() in .s1_prepare_skeleton() and tteplan_apply_exclusions() to preserve in-place modification semantics.
- Fixed callr worker stale-namespace bug: after devtools::load_all() in a subprocess, worker functions still referenced the old (installed) swereg namespace. Now rebinds the worker function’s environment to the freshly-loaded namespace.
Improvements
- Reorganized print_spec_summary() header layout: renamed “Study created” → “RegistryStudy”, merged “Skeletons created” + “Skeleton files” into a single nested line with tree connector, renamed “Plan created” → “TTEPlan”, and reordered to follow data pipeline order.
- Rewrote TARGET checklist items 6c, 6h, and 7a-h in print_target_checklist() as academic prose suitable for copy-pasting into a methods section. Item 6c now dynamically reflects per-enrollment matching ratios from the spec.
Breaking changes
- enrollment_counts structure changed: Each element of TTEPlan$enrollment_counts is now a list with $attrition and $matching sub-elements (was a single data.table). Code accessing plan$enrollment_counts[["01"]] directly as a data.table must update to plan$enrollment_counts[["01"]]$matching.
- person_trial_id renamed to enrollment_person_trial_id: The composite key column now has a 3-part name matching its 3-part format (enrollment_id.person_id.trial_id). All code referencing person_trial_id must be updated.
- process_fn parameter removed from $s1_generate_enrollments_and_ipw(): The two-pass spec-driven pipeline is now the only code path. self$spec is required (create plans with tteplan_from_spec_and_registrystudy()). The legacy single-pass .s1_worker() has been deleted.
- .s2_worker() renamed to .s3_worker(): Internal Loop 2 IPCW-PP worker renamed to avoid confusion with the two-pass Loop 1 pipeline.
New features
- Two-pass enrollment pipeline: $s1_generate_enrollments_and_ipw() now uses a two-pass pipeline that fixes cross-batch matching ratio imbalance:
  - Pass 1a (scout): Lightweight parallel pass collecting eligible (person_id, trial_id, exposed) tuples from all batches.
  - Centralized matching: Combines all tuples and performs per-trial_id matching globally, ensuring the correct ratio across all batches.
  - Pass 1b (full enrollment): Parallel pass using pre-matched IDs to enroll without per-batch matching.
- enrollment_counts on TTEPlan: New field storing per-trial matching counts (total vs enrolled, exposed vs unexposed) for TARGET Item 8 reporting.
- .assign_trial_ids(): New shared helper function that is the single source of truth for isoyearweek -> trial_id mapping. Used consistently by both scout (s1a) and enrollment (s1b/enroll) phases.
- enrolled_ids parameter on TTEEnrollment$new(): When provided, enrollment skips the matching phase and uses pre-decided IDs directly, enabling the two-pass pipeline.
- Per-criterion attrition counts for TARGET Item 8: The scout pass (s1a) now computes cumulative person and person-trial counts at each eligibility step. Stored in plan$enrollment_counts[["01"]]$attrition as a long-format data.table with columns trial_id, criterion, n_persons, n_person_trials. $print_target_checklist() Item 8 auto-populates with these counts when available.
swereg 26.3.21
New features
- $heterogeneity_test(): New method on TTEEnrollment that tests for heterogeneity of treatment effects across trials via a Wald test on the trial_id × exposure interaction (Hernán 2008, Danaei 2013).
- $print_target_checklist(): New method on TTEPlan that generates a self-contained TARGET Statement (Cashin et al., JAMA 2025) 21-item reporting checklist. Auto-populates items from the study spec and provides [FILL IN] placeholders for PI completion.
Improvements
- $irr() calendar-time adjustment: Outcome model now includes trial_id as a covariate to adjust for calendar-time variation in outcome rates across enrollment bands (Caniglia 2023, Danaei 2013). Uses ns(trial_id, df=3) for ≥5 unique trial IDs, linear term for 2-4, omitted for 1.
- $irr() IPW-only guard: $irr() now rejects IPW-only weight columns (ipw, ipw_trunc) after per-protocol censoring has been applied. The swereg pipeline applies per-protocol censoring in $s4_prepare_for_analysis(), so only per-protocol weights (analysis_weight_pp_trunc) are valid for the censored dataset.
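The rule for how trial_id enters the outcome model can be written as a small helper. This is a sketch of the stated thresholds only: trial_term and the variable names event and exposed are hypothetical, and the real $irr() formula contains further terms not shown here.

```r
# Choose the calendar-time term from the number of distinct trial IDs:
#   >= 5 unique values -> natural spline with 3 df
#   2 to 4             -> linear term
#   1                  -> no term (NULL)
trial_term <- function(trial_id) {
  n_unique <- length(unique(trial_id))
  if (n_unique >= 5) {
    "splines::ns(trial_id, df = 3)"
  } else if (n_unique >= 2) {
    "trial_id"
  } else {
    NULL
  }
}

# Build an illustrative model formula; a NULL term simply drops out of c().
rhs <- c("exposed", trial_term(1:6))
f <- stats::reformulate(rhs, response = "event")
```

Returning NULL for the single-trial case keeps the calling code free of special cases, because c() discards NULL elements.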
Documentation
- Methodology vignette: New vignette("tte-methodology") maps the swereg TTE implementation to five reference papers (Hernán 2008/2016, Danaei 2013, Caniglia 2023, Cashin 2025). Documents which methods are implemented, which are not, and design rationale.
- Analysis types: vignette("tte-nomenclature") now documents that swereg supports per-protocol analysis only. ITT analysis is not supported because the pipeline censors at protocol deviation. As-treated analysis requires time-varying IPW (not implemented).
- period_width documentation: vignette("tte-nomenclature") now explains the enrollment band width / residual immortal time bias trade-off, citing Caniglia (2023) and Hernán (2016).
- Matching approach: vignette("tte-nomenclature") now documents the per-band stratified matching design choice and alternatives from the literature.
- $s2_ipw() documentation: Clarified that IPW estimates the propensity score for baseline treatment assignment only, not time-varying treatment weights.
- $irr() documentation: Documented IRR ≈ HR for rare events, ns(tstop) for flexible baseline hazard, quasipoisson for overdispersion, and computational equivalence to pooled logistic regression.
- IPCW stabilization: Documented the simplified marginal stabilization approach and its relationship to Danaei (2013).
swereg 26.3.20
Improvements
- Band-based enrollment: Added explicit isoyearweek ordering before band-level collapse to prevent silent misclassification when input data is not pre-sorted by time.
- IPCW-PP: Censoring model now includes trial_id to account for calendar-time variation in censoring patterns across enrollment bands.
- person_weeks: Now computed from actual source row counts during band collapse instead of hardcoded period_width. Partial-coverage bands (e.g., at data boundaries) now contribute accurate person-time.
Breaking changes
- $irr(): Removed the constant (no time adjustment) Poisson model. Only the flexible model with natural splines (splines::ns(tstop, df=3)) is retained. Output columns renamed: IRR_flex → IRR, IRR_flex_lower → IRR_lower, IRR_flex_upper → IRR_upper, IRR_flex_pvalue → IRR_pvalue, warn_flex → warn. All IRR_const* and warn_const columns removed.
- tteenrollment_irr_combine(): Updated to match new $irr() output. Columns renamed: IRR (flexible) → IRR, 95% CI (flexible) → 95% CI, p (flexible) → p. Constant-model columns removed.
- TTE ID semantics: The composite person-per-trial identifier column is now called person_trial_id (was trial_id). The actual trial identifier (the enrollment band) is now exposed as trial_id in enrollment output. This fixes the semantics so trial_id means the trial and person_trial_id identifies a person’s participation in a trial.
- TTEDesign default: id_var default changed from "trial_id" to "person_trial_id".
- s1_impute_confounders(): No longer hardcodes trial_id; uses design$id_var throughout.
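Results saved before the $irr() column rename can be brought in line with the new names using a small compatibility helper. The sketch below assumes the old output is a plain data.frame containing the columns listed above; rename_irr_columns is a hypothetical helper, not part of swereg.

```r
# Map old $irr() column names to the new ones and drop constant-model columns.
rename_irr_columns <- function(x) {
  map <- c(
    IRR_flex        = "IRR",
    IRR_flex_lower  = "IRR_lower",
    IRR_flex_upper  = "IRR_upper",
    IRR_flex_pvalue = "IRR_pvalue",
    warn_flex       = "warn"
  )
  # Remove IRR_const* and warn_const, which no longer exist.
  x <- x[, !grepl("^(IRR_const|warn_const)", names(x)), drop = FALSE]
  # Rename only the columns that are present in the mapping.
  hit <- names(x) %in% names(map)
  names(x)[hit] <- unname(map[names(x)[hit]])
  x
}

# Made-up values, for illustration only.
old <- data.frame(
  IRR_flex = 1.2, IRR_flex_lower = 0.9, IRR_flex_upper = 1.6,
  IRR_flex_pvalue = 0.2, warn_flex = FALSE,
  IRR_const = 1.1, warn_const = FALSE
)
new <- rename_irr_columns(old)
```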
Code quality
- Rename private methods prepare_outcome and ipcw_pp to s5_prepare_outcome and s6_ipcw_pp to signal their execution order within s4_prepare_for_analysis().
- Reorder TTEEnrollment public step methods to match their numeric sequence (s1 before s2).
Breaking changes
- Band-based enrollment: TTEEnrollment enrollment now uses N-week bands (controlled by period_width in TTEDesign, default 4). Calendar time is grouped into bands based on isoyearweek, matching is done per-band (stratified), and data is collapsed to band level during enrollment. This eliminates the separate $s1_collapse() step entirely.
- Step renumbering: Public workflow methods on TTEEnrollment have been renumbered after removing $s1_collapse():
  - $s2_impute_confounders() -> $s1_impute_confounders()
  - $s3_ipw() -> $s2_ipw()
  - $s4_truncate_weights() -> $s3_truncate_weights()
  - $s5_prepare_for_analysis() -> $s4_prepare_for_analysis()
- period_width parameter: Moved from TTEPlan$s1_generate_enrollments_and_ipw() to TTEDesign$new(period_width = 4L). Now part of the design contract.
- isoyearweek column required: Band-based enrollment requires an isoyearweek column in person-week data.
- Schema version bump: TTEDesign and TTEEnrollment schema versions bumped to 2. Objects saved with version 1 will warn on load.
New features
- TTEPlan provenance timestamps: TTEPlan now tracks created_at (stamped at construction), registry_study_created_at (from the source RegistryStudy), and skeleton_created_at (from the first skeleton file’s attribute). All three timestamps are shown in print() and print_spec_summary() when available, making it easy to detect stale plans.
- R6 schema versioning: All R6 classes (RegistryStudy, TTEPlan, TTEDesign, TTEEnrollment) now carry a .schema_version private field, stamped at construction time. A new $check_version() public method compares the stored version against the current class definition and warns when stale. qs2_read() automatically calls $check_version() on R6 objects after loading, so outdated serialized objects produce a clear warning instead of silently breaking.
- Deprecation warnings for old add_* parameter names: add_diagnoses(diags=), add_operations(ops=), add_rx(rxs=), add_icdo3s(icdo3s=), add_snomed3s(snomed3s=), and add_snomedo10s(snomedo10s=) now emit a deprecation warning when the old parameter name is used. Use codes= instead.
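A deprecation shim of this kind typically accepts both argument names and warns when the old one is used. The sketch below shows the general pattern in base R; add_codes_demo is a hypothetical stand-in and does not reproduce the signature or warning text of the real add_* functions.

```r
# Hypothetical function accepting the new `codes` argument and the
# deprecated `diags` argument.
add_codes_demo <- function(data, codes = NULL, diags = NULL) {
  if (!is.null(diags)) {
    warning("`diags` is deprecated; use `codes` instead.", call. = FALSE)
    # Fall back to the old argument only when the new one is not given.
    if (is.null(codes)) codes <- diags
  }
  if (is.null(codes)) {
    stop("`codes` must be supplied.", call. = FALSE)
  }
  # Stand-in for the real work: return the names of the code groups.
  names(codes)
}

new_style <- add_codes_demo(NULL, codes = list(depression = "F32"))
old_style <- suppressWarnings(add_codes_demo(NULL, diags = list(anxiety = "F41")))
```

Existing scripts keep working with the old name, and the warning gives users time to migrate before the old parameter is removed.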
Breaking changes
- RegistryStudy: register_codes() now takes a declarative signature: register_codes(codes, fn, groups, fn_args, combine_as). Each call declares codes, the function to apply them, which data groups to use, and optional prefix/combine behavior. The old per-type fields (icd10_codes, rx_atc_codes, rx_produkt_codes, operation_codes, icdo3_codes) and the old register_codes(icd10_codes = ...) signature are removed. The single code_registry list field replaces them.
- summary_table(): The type parameter is removed. The type column is replaced by label. Use label to filter.
- add_diagnoses(), add_operations(), add_rx(), add_icdo3s(), add_snomed3s(), add_snomedo10s(): The codes parameter is renamed to codes (was diags, ops, rxs, icdo3s, snomed3s, snomedo10s). Old parameter names still work for backwards compatibility.
Refactoring
- Moved qs2_read() to its own file (R/qs2.R) and inlined the fallback logic directly. Removed pointless .qs_save wrapper (replaced with direct qs2::qs_save calls) and .qs_read internal helper.
Breaking changes
- skeleton_save() no longer splits batches into sub-files. It saves one file per batch as skeleton_NNN.qs2 (was skeleton_NNN_SS.qs2). The ids_per_file and id_col parameters have been removed.
- RegistryStudy: batch_sizes parameter (integer vector) replaced with batch_size (single integer, default 1000). The ids_per_skeleton_file parameter has been removed. All batches are now uniform size.
swereg 26.3.21
Breaking changes
- RENAMED: Standalone TTE functions renamed to signal which class they operate on:
  - tte_rbind() → tteenrollment_rbind()
  - tte_rates_combine() → tteenrollment_rates_combine()
  - tte_irr_combine() → tteenrollment_irr_combine()
  - tte_impute_confounders() → tteenrollment_impute_confounders()
  - tte_read_spec() → tteplan_read_spec()
  - tte_apply_exclusions() → tteplan_apply_exclusions()
  - tte_apply_derived_confounders() → tteplan_apply_derived_confounders()
  - tte_validate_spec() → tteplan_validate_spec()
  - tte_plan_from_spec_and_registrystudy() → tteplan_from_spec_and_registrystudy()
  - tte_callr_pool() → callr_pool()
- RENAMED: Eligibility helpers renamed from tte_eligible_* to skeleton_eligible_* to reflect that they operate on skeleton data.tables, not TTE classes:
  - tte_eligible_isoyears() → skeleton_eligible_isoyears()
  - tte_eligible_age_range() → skeleton_eligible_age_range()
  - tte_eligible_no_events_in_window_excluding_wk0() → skeleton_eligible_no_events_in_window_excluding_wk0()
  - tte_eligible_no_observation_in_window_excluding_wk0() → skeleton_eligible_no_observation_in_window_excluding_wk0()
  - tte_eligible_no_events_lifetime_before_and_after_baseline() → skeleton_eligible_no_events_lifetime_before_and_after_baseline()
  - tte_eligible_combine() → skeleton_eligible_combine()
File reorganization
- RENAMED: R/tte_enrollment_r6.R → R/r6_tteenrollment.R
- RENAMED: R/tte_plan_r6.R → R/r6_tteplan.R
- RENAMED: R/registry_study_r6.R → R/r6_registry_study.R
- EXTRACTED: callr_pool() to its own file R/callr_pool.R
- MOVED: Eligibility helpers to R/skeleton_utils.R
- MOVED: tteenrollment_impute_confounders() to R/r6_tteenrollment.R
swereg 26.3.20
Breaking changes
- RENAMED: TTEEnrollment public workflow methods now have step-number prefixes to signal execution order:
  - $collapse() → $s1_collapse()
  - $impute_confounders() → $s2_impute_confounders()
  - $ipw() → $s3_ipw()
  - $truncate() → $s4_truncate_weights()
  - $prepare_for_analysis() → $s5_prepare_for_analysis()
- RENAMED: $s4_truncate() → $s4_truncate_weights() for clarity.
- RENAMED: TTEPlan orchestration methods now have step-number prefixes:
  - $generate_enrollments_and_ipw() → $s1_generate_enrollments_and_ipw()
  - $generate_analysis_files_and_ipcw_pp() → $s2_generate_analysis_files_and_ipcw_pp()
- RENAMED: Internal worker functions for consistent naming:
  - .tte_process_skeleton() → .s1_worker()
  - .loop2_worker() → .s2_worker()
- REMOVED: Constructor wrapper functions tte_design(), tte_enrollment(), and tte_plan(). Use TTEDesign$new(), TTEEnrollment$new(), and TTEPlan$new() directly. The auto-detection and data-copy logic from tte_enrollment() has been moved into TTEEnrollment$new().
Improvements
- REFACTOR: Inlined 5 of 6 private helper methods into their single callers on TTEEnrollment (.calculate_ipw, .calculate_ipcw, .combine_weights_fn, .match_ratio, .collapse_periods). Kept .truncate_weights as private (used in 2 places). Reduces indirection for stateless methods that don’t use self.
- TESTS: Rewrote test-tte_weights.R to test through public API ($s1_collapse(), $s3_ipw(), $s4_truncate(), tte_enrollment(ratio=)) instead of accessing inlined private methods.
swereg 26.3.20
Improvements
- REFACTOR: Inlined 6 weight/matching functions as private methods on TTEEnrollment (tte_truncate_weights, tte_calculate_ipw, tte_calculate_ipcw, tte_combine_weights, tte_match_ratio, tte_collapse_periods). Removed 2 orphaned functions (tte_identify_censoring, tte_time_to_event). Users access this functionality through R6 methods ($collapse, $ipw, $truncate, etc.).
- REFACTOR: Consolidated TTE source files from 7 to 2 (+1 rename):
  - tte_design.R + tte_enrollment.R + tte_weights.R merged into tte_enrollment_r6.R (TTEDesign + TTEEnrollment + all weight/matching functions called by their methods)
  - tte_plan.R + tte_spec.R + tte_eligibility.R merged into tte_plan_r6.R (TTEPlan + spec functions + eligibility helpers)
  - registry_study.R renamed to registry_study_r6.R
  - Files containing R6 classes now have _r6 suffix for discoverability
- REORDER: TTEEnrollment public methods now follow workflow execution order: collapse -> ipw -> impute_confounders -> truncate -> prepare_for_analysis -> extract/summary/diagnostics -> analysis output.
- DOCS: Added inline comments documenting data flow in generate_enrollments_and_ipw() (Loop 1), .tte_process_skeleton(), private$enroll(), enrollment_spec(), and add_one_ett().
swereg 26.3.18
Improvements
- MHT: Added rd_approach3b_{single,multiple} exposure variables that collapse estrogen_progesterone_bioidentical and estrogen_progesterone_synthetic into a single estrogen_progesterone level. Derived by relabeling the finished approach3 columns, which is valid because switching between active MHT types never triggers “previous”.
- MHT: x2026_mht_add_lmed() now creates exposure variables (rd_approach{1,2,3}_{single,multiple}) internally via the new internal helper x2026_mht_create_exposure_variables(). This consolidates all MHT LMED logic in the package, eliminating the need for a separate step 14 in external workflow scripts.
- MHT: Removed 18 sensitivity columns (*_sensitivity_60p, *_sensitivity_under60censorallat60, *_sensitivity_under60censorrefat65) from x2026_mht_create_exposure_variables(). These had a logic issue where local_or_none_mht rows at age >= 65 produced NA instead of FALSE. The rd_age_continuous column is no longer required as input.
swereg 26.2.22
New features
- EXPORTED: tte_callr_pool(), a generic callr::r_bg() worker pool generalized from the internal .tte_callr_pool(). New API accepts items (list of arg-lists), worker_fn, item_labels, and collect (FALSE to discard results when workers save directly). Eliminates boilerplate when scripts need their own parallel loops (e.g., Loop 2 IPCW-PP).
- NEW: TTEPlan$generate_analysis_files_and_ipcw_pp(), the Loop 2 method that runs per-ETT IPCW-PP calculation and saves analysis-ready files. Mirrors $generate_enrollments_and_ipw() (Loop 1). Parameters: output_dir, estimate_ipcw_pp_separately_by_exposure, estimate_ipcw_pp_with_gam, n_workers, swereg_dev_path.
Improvements
- MEMORY: tte_calculate_ipcw() now uses mgcv::bam(discrete = TRUE) instead of mgcv::gam() when use_gam = TRUE. bam() discretizes covariates to avoid forming the full model matrix, dramatically reducing peak memory for large datasets. Model objects are also explicitly freed (rm() + gc()) between exposed/unexposed fits.
- MEMORY: $irr() and $km() now subset to only the columns needed before creating survey::svydesign(). Previously the full data.table (all columns) was copied into the survey object. Model objects and intermediate data are freed between fits.
swereg 26.2.21
Breaking changes
- RENAMED: $prepare_for_analysis() parameters estimate_ipcw_separately_by_exposure → estimate_ipcw_pp_separately_by_exposure and estimate_ipcw_with_gam → estimate_ipcw_pp_with_gam for consistency with the IPCW-PP method they control.
- PRIVATE: $enroll(), $prepare_outcome(), $ipcw_pp(), and $combine_weights() are now private methods on TTEEnrollment.
  - Enrollment: use tte_enrollment(data, design, ratio = 2, seed = 4) instead of tte_enrollment(data, design)$enroll(ratio = 2, seed = 4).
  - Outcome prep + IPCW: use $prepare_for_analysis() (unchanged).
  - Weight combination: handled automatically by $ipcw_pp() (unchanged).
  - Tests can access private methods via enrollment$.__enclos_env__$private$method_name().
swereg 26.2.20
Breaking changes
- RENAMED: $prepare_analysis() → $prepare_for_analysis() on TTEEnrollment. The new name better communicates that this method prepares the enrollment for analysis (it is not the analysis itself).
Bug fixes
- FIXED: 3 remaining broken test calls (tte_extract(), tte_summary(), tte_weights()) migrated to R6 method syntax ($extract(), print(), $combine_weights()). Column assertion updated: "weight_pp" → "analysis_weight_pp".
- FIXED: $impute_confounders() now appends "impute" to steps_completed, consistent with all other mutating methods.
- FIXED: $ipcw_pp() IPW column guard moved from after IPCW computation to before it (fail-fast).
Documentation
- FIXED: Vignette truncation bounds corrected from “0.5th and 99.5th percentiles” to “1st and 99th percentiles” (matching code defaults lower = 0.01, upper = 0.99).
- FIXED: TTEDesign roxygen references to removed tte_match()/tte_expand() replaced with $enroll().
- FIXED: $weight_summary() moved from “Mutating” to “Non-mutating” section in TTEEnrollment roxygen (it only prints, never modifies data).
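Percentile-based weight truncation with the defaults lower = 0.01 and upper = 0.99 can be sketched as follows. This assumes truncation means clamping weights to the two quantiles (winsorizing); truncate_weights_demo is a hypothetical helper and the package's own method may differ in detail.

```r
# Clamp weights to the [lower, upper] quantiles of their own distribution.
truncate_weights_demo <- function(w, lower = 0.01, upper = 0.99) {
  bounds <- stats::quantile(w, probs = c(lower, upper),
                            na.rm = TRUE, names = FALSE)
  pmin(pmax(w, bounds[1]), bounds[2])
}

set.seed(1)
w <- rexp(1000)                       # made-up, right-skewed weights
w_trunc <- truncate_weights_demo(w)   # extreme weights pulled to the bounds
```

With these defaults the lowest and highest 1% of weights are replaced by the 1st and 99th percentile values, which is what the corrected vignette text describes.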
swereg 26.2.13
New features
- NEW: $prepare_for_analysis() method on TTEEnrollment merges $prepare_outcome() + $ipcw_pp() into one step. Parameters: outcome, follow_up, separate_by_exposure, use_gam, censoring_var.
- NEW: $enrollment_stage active binding on TTEEnrollment. Derives lifecycle stage from existing state: "pre_enrollment" → "enrolled" → "analysis_ready". Zero maintenance: it reads data_level and steps_completed.
swereg 26.2.11
Breaking changes
- REMOVED: 19 standalone TTE functions moved to R6 methods on TTETrial (15 methods) and TTEPlan (4 methods). Pipe chaining (trial |> tte_ipw()) replaced with $-chaining (trial$ipw()).
  - TTETrial methods: $enroll(), $collapse(), $ipw(), $ipcw_pp(), $combine_weights(), $truncate(), $prepare_outcome(), $impute_confounders(), $weight_summary(), $extract(), $summary(), $table1(), $rates(), $irr(), $km().
  - TTEPlan methods: $add_one_ett(), $save(), $enrollment_spec(), $generate_enrollments_and_ipw().
- RENAMED: TTEPlan$task() → TTEPlan$enrollment_spec(). The method returns enrollment metadata (design, enrollment_id, age_range, n_threads), not a generic task. The process_fn callback parameter convention changes from function(task, file_path) to function(enrollment_spec, file_path).
- Removed exports: tte_enroll, tte_collapse, tte_ipw, tte_ipcw_pp, tte_weights, tte_truncate, tte_prepare_outcome, tte_extract, tte_summary, tte_weight_summary, tte_table1, tte_rates, tte_irr, tte_km, tte_plan_add_one_ett, tte_plan_save, tte_plan_task, tte_generate_enrollments_and_ipw.
- Kept standalone: tte_rbind(), tte_rates_combine(), tte_irr_combine(), tte_impute_confounders() (thin wrapper for callback default).
- CHANGED: TTE classes (TTEDesign, TTETrial, TTEPlan) migrated from S7 to R6. Property access changes from @ to $ (e.g., trial@data → trial$data, design@id_var → design$id_var). R6 reference semantics eliminate copy-on-write overhead from trial$data[, := ...], reducing peak RAM from ~3X to ~2X during the weight-calculation chain (Loop 2).
- FIXED: Three S7 @ accessor bugs that silently produced no-ops:
  - $ipcw_pp(): dropping intermediate IPCW columns (p_censor, etc.)
  - $collapse(): creating person_weeks column
  - $impute_confounders(): deleting old confounder columns before merge
  All fixed automatically by R6 (in-place modification works).
- CHANGED: $ipcw_pp() now inlines weight combination and truncation (was calling tte_combine_weights() and tte_truncate_weights() via function parameters that created extra refcount). Keeps data.table refcount=1 throughout.
swereg 26.2.10
Bug fixes
- FIXED: tte_ipw(), tte_ipcw_pp(): in-place joins via S7 @ accessor now use extract/modify/reassign pattern (dt <- trial@data; dt[...]; trial@data <- dt). The previous trial@data[i, := ...] silently modified a copy, leaving the S7 object’s data unchanged.
Performance
- IMPROVED: tte_ipw(), tte_ipcw_pp(), tte_calculate_ipcw(): replace merge() with in-place keyed joins (data[i, := ...]), reducing peak RAM from ~3x to ~2x panel size during the weight-calculation chain.
Breaking changes
- CHANGED: tte_ipcw_pp() now also combines weights (ipw * ipcw_pp → analysis_weight_pp), truncates analysis_weight_pp, and drops intermediate IPCW columns (p_censor, p_uncensored, cum_p_uncensored, marginal_p, cum_marginal). Callers no longer need tte_weights() + tte_truncate() after tte_ipcw_pp().
- RENAMED: tte_generate_enrollments() → tte_generate_enrollments_and_ipw(). Now computes IPW + truncation once on the full combined enrollment (after imputation), so the per-ETT Loop 2 no longer needs to call tte_ipw(). New stabilize parameter (default TRUE) controls IPW stabilization.
New features
- NEW: tte_plan_load() reads a .qs2 plan file and reconstructs the TTEPlan S7 object. Companion to tte_plan_save().
- CHANGED: tte_plan_save() now persists project_prefix and skeleton_files alongside ett and global_max_isoyearweek, so tte_plan_load() can fully reconstruct the object.
- NEW: skeleton_process() gains n_workers parameter for parallel batch processing. When > 1, uses callr::r() + parallel::mclapply() to process batches concurrently while avoiding fork() + data.table OpenMP segfaults.
swereg 26.2.9
Improvements
- CHANGED: Migrate serialization from qs (archived) to qs2. .qs_save/.qs_read wrappers now call qs2::qs_save/qs2::qs_read (standard format, preserves S7 objects). All file extensions changed from .qs to .qs2. The preset parameter is no longer used.
- IMPROVED: tte_rates() now sets swereg_type and exposure_var attributes on its output; tte_irr() sets swereg_type.
- RENAMED: tte_rates_table() → tte_rates_combine(), tte_irr_table() → tte_irr_combine(). New API accepts (results, slot, descriptions) and extracts the rates/irr slot internally, removing the need for lapply(results, `[[`, "table2") at call sites. Exposure column is now read from the exposure_var attribute instead of guessing via setdiff().
Breaking changes
- CHANGED: tte_plan_add_one_ett() now requires explicit enrollment_id parameter. Auto-assignment based on follow_up + age_group removed. Validation that design params match within an enrollment_id is preserved.
- IMPROVED: print(plan) now shows both enrollment grid and full ETT grid.
- CHANGED: tte_plan_add_one_ett() bundles age_group, age_min, age_max, person_id_var into an argset named list parameter. time_exposure_var and eligible_var no longer have defaults (must be explicit). exposure_var removed from interface (hardcoded to "baseline_exposed").
- RENAMED: file_id column in the ett data.table → enrollment_id. This makes explicit that ETTs sharing the same follow_up + age_group are processed together as one “enrollment” (shared eligibility, matching, collapse, imputation).
- RENAMED: tte_generate_trials() → tte_generate_enrollments(). The function generates enrollments (one per follow_up × age_group), not individual trials.
- RENAMED: tte_plan_task() return list key file_id → enrollment_id.
- UPDATED: print(plan) now shows “Enrollments: N x M skeleton files” instead of “Tasks: N file_id(s) x M skeleton files”.
swereg 26.2.8
Breaking changes
- CHANGED: tte_plan() is now infrastructure-only: it takes only project_prefix, skeleton_files, global_max_isoyearweek. Use tte_plan_add_one_ett() to add ETTs with per-ETT design parameters.
- REMOVED: TTEPlan plan-level properties confounder_vars, person_id_var, exposure_var, time_exposure_var, eligible_var. These are now per-ETT columns in the ett data.table.
- REMOVED: Internal .tte_grid() function. The ETT grid is now built incrementally via tte_plan_add_one_ett().
- ADDED: TTEPlan@project_prefix property (needed for file naming in tte_plan_add_one_ett()).
New features
- NEW: tte_plan_add_one_ett(), a builder function that adds one ETT row to a plan. Stores design params (confounder_vars, person_id_var, exposure_var, time_exposure_var, eligible_var) per-ETT, allowing different ETTs to use different confounders. Validates that design params match within an enrollment_id (same follow_up + age_group).
- RENAMED: TTEPlan@files property → TTEPlan@skeleton_files for clarity.
swereg 26.2.7
Breaking changes
- REFACTORED: tte_generate_enrollments() (formerly tte_generate_trials()) now takes a TTEPlan object instead of separate parameters (ett, files, confounder_vars, global_max_isoyearweek). The process_fn callback signature changes from function(file_path, design, file_id, age_range, n_threads) to function(task, file_path), where task is a list with design, enrollment_id, age_range, and n_threads.
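One way to migrate an existing callback is a thin adapter that unpacks the task list. The sketch below is illustrative only: old_process_fn, its body, the file name, and the values in the mock task are hypothetical, and only the argument names come from the entry above.

```r
# Hypothetical pre-26.2.7 callback using the old five-argument signature.
old_process_fn <- function(file_path, design, file_id, age_range, n_threads) {
  sprintf(
    "%s | id=%s | ages %d-%d | threads=%d",
    basename(file_path), file_id, age_range[1], age_range[2], n_threads
  )
}

# Adapter with the new signature: unpack `task` and forward to the old callback.
process_fn <- function(task, file_path) {
  old_process_fn(
    file_path = file_path,
    design    = task$design,
    file_id   = task$enrollment_id,  # file_id was renamed to enrollment_id
    age_range = task$age_range,
    n_threads = task$n_threads
  )
}

# Mock task with made-up values, standing in for plan[[1]].
task <- list(
  design        = NULL,
  enrollment_id = "01",
  age_range     = c(50L, 60L),
  n_threads     = 2L
)
result <- process_fn(task, "skeleton_001.qs")
print(result)
```

The adapter keeps the old function body untouched, which can make the switch easier to review before the old signature is dropped entirely.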
New features
- NEW: TTEPlan S7 class bundles ETT grid, skeleton file paths, confounder definitions, and design column names into a single object for trial generation.
  - tte_plan(): Constructor function
  - tte_plan_task(plan, i): Extract the i-th enrollment task as a list with design, enrollment_id, age_range, n_threads
  - plan[[i]]: Shorthand for tte_plan_task(plan, i)
  - length(plan): Number of unique enrollment_id groups
  - Supports interactive testing: task <- plan[[1]]; process_fn(task, plan@skeleton_files[1])
swereg 26.2.3
Breaking changes
- REPLACED: tte_match() and tte_expand() merged into a single tte_enroll() function:
  - Old workflow: tte_trial(data, design) |> tte_match(ratio = 2, seed = 4) |> tte_expand(extra_cols = "isoyearweek")
  - New workflow: tte_trial(data, design) |> tte_enroll(ratio = 2, seed = 4, extra_cols = "isoyearweek")
  - The two operations were tightly coupled and always used together
  - tte_enroll() combines sampling (matching) and panel expansion in one step
  - Records “enroll” in steps_completed (previously recorded “match” then “expand”)
New features
- NEW: Trial eligibility helper functions for composable eligibility criteria:
  - tte_eligible_isoyears(): Check eligibility based on calendar years
  - tte_eligible_age_range(): Check eligibility based on age range
  - tte_eligible_no_events_in_window_excluding_wk0(): Check for no events in prior window (correctly excludes baseline week)
  - tte_eligible_no_observation_in_window_excluding_wk0(): Check for no specific value in prior window (for categorical variables)
  - tte_eligible_combine(): Combine multiple eligibility columns using AND logic
  - All functions modify data.tables by reference and return invisibly for method chaining
Documentation
- IMPROVED: Clarified that eligibility checks should EXCLUDE the baseline week. Using cumsum(x) == 0 is incorrect because it includes the current week. The new eligibility functions use any_events_prior_to(), which correctly excludes the current row.
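The difference is easy to see on a toy vector. The base R sketch below does not call any_events_prior_to(); it only contrasts the inclusive cumulative sum with a version shifted by one row, for a single person with no window limit (both simplifications are assumptions made for illustration).

```r
# Weekly event indicator for one person; the event happens in week 3.
x <- c(FALSE, FALSE, TRUE, FALSE)

# Inclusive check: the cumulative sum already counts the current week,
# so week 3 is flagged as "has events" even though nothing happened before it.
no_events_incl_current <- cumsum(x) == 0

# Exclusive check: shift the cumulative sum down one row so that each week
# only sees the weeks strictly before it.
events_before <- c(0, head(cumsum(x), -1))
no_events_prior <- events_before == 0

print(data.frame(week = seq_along(x), x, no_events_incl_current, no_events_prior))
```

Week 3 is the row where the two checks disagree: it has no prior events, yet the inclusive check marks it as ineligible.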
swereg 26.1.31
New features
- NEW: S7 object-oriented API for target trial emulation workflows:
  - TTEDesign class: Define column name mappings once and reuse across all TTE functions
  - TTETrial class: Fluent method chaining with workflow state tracking
  - tte_design() / tte_trial(): Constructor functions for the S7 classes
  - tte_match(), tte_expand(), tte_collapse(), tte_ipw(): S7 methods for data preparation
  - tte_prepare_outcome(), tte_ipcw(): Outcome-specific per-protocol analysis
  - tte_weights(), tte_truncate(): Weight combination and truncation
  - tte_rbind(): Combine batched trial objects
  - tte_extract(), tte_summary(): Access data and diagnostics
  - tte_table1(), tte_rates(), tte_irr(), tte_km(): Analysis and visualization
Breaking changes
- REMOVED: Deprecated S7 methods replaced by tte_prepare_outcome():
  - tte_tte(): Use tte_prepare_outcome(), which computes weeks_to_event internally
  - tte_set_outcome(): Use tte_prepare_outcome(outcome = "...") instead
  - tte_censoring(): Use tte_prepare_outcome(), which handles censoring internally
swereg 26.1.30
New features
- NEW: Target trial emulation weight functions for causal inference in observational studies:
  - tte_calculate_ipw(): Calculate stabilized inverse probability of treatment weights (IPW) for baseline confounding adjustment using propensity scores
  - tte_calculate_ipcw(): Calculate time-varying inverse probability of censoring weights (IPCW) for per-protocol analysis using GAM or GLM
  - tte_identify_censoring(): Identify protocol deviation and loss to follow-up for per-protocol analysis
  - tte_combine_weights(): Combine IPW and IPCW weights for per-protocol effect estimation
  - tte_truncate_weights(): Truncate extreme weights at specified quantiles to reduce variance
- NEW: Target trial emulation data preparation functions:
  - tte_match_ratio(): Sample comparison group at specified ratio (e.g., 2:1 unexposed to exposed)
  - tte_collapse_periods(): Collapse fine-grained time intervals (e.g., weekly) to coarser periods (e.g., 4-week)
  - tte_time_to_event(): Calculate time to first event for each trial/person
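Collapsing weekly rows into 4-week periods requires choosing an aggregation rule per column. The base R sketch below does not use tte_collapse_periods(), and which rule the package applies to each column is not stated here; the column names and data are made up to contrast "first week of the period" with "any week of the period".

```r
# Eight weekly rows for one person: exposure starts in week 3, event in week 7.
weekly <- data.frame(
  week    = 1:8,
  exposed = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  event   = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Map weeks 1-4 to period 1 and weeks 5-8 to period 2.
weekly$period <- (weekly$week - 1L) %/% 4L + 1L

# Collapse to one row per period under two candidate rules for exposure.
collapsed <- data.frame(
  period        = sort(unique(weekly$period)),
  exposed_first = as.vector(tapply(weekly$exposed, weekly$period, function(x) x[1L])),
  exposed_any   = as.vector(tapply(weekly$exposed, weekly$period, any)),
  event_any     = as.vector(tapply(weekly$event, weekly$period, any))
)
print(collapsed)
```

In period 1 the two exposure rules disagree, because exposure begins mid-period. This is the kind of case where the choice of rule changes who counts as exposed.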
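For the weight functions listed above, the following base R sketch shows the general technique of stabilized treatment weights followed by quantile truncation. It is a minimal illustration on simulated data, not the implementation behind tte_calculate_ipw() or tte_truncate_weights(); the helper truncate_weights(), the simulated variables, and the 1st/99th percentile cut-offs are assumptions.

```r
set.seed(1)

# Simulated cohort: one baseline confounder (age) that affects treatment.
n <- 500
age <- rnorm(n, mean = 55, sd = 5)
exposed <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.1 * (age - 55)))

# Propensity score: P(exposed = 1 | age), from a logistic regression.
ps <- fitted(glm(exposed ~ age, family = binomial()))

# Stabilized weights: marginal probability of the received treatment
# divided by its conditional probability given the confounder.
p_marginal <- mean(exposed)
sw <- ifelse(exposed == 1, p_marginal / ps, (1 - p_marginal) / (1 - ps))

# Hypothetical helper: clamp weights to the given lower/upper quantiles.
truncate_weights <- function(w, lower = 0.01, upper = 0.99) {
  bounds <- quantile(w, probs = c(lower, upper), names = FALSE)
  pmin(pmax(w, bounds[1]), bounds[2])
}
sw_trunc <- truncate_weights(sw)

print(summary(sw))
print(summary(sw_trunc))
```

Truncation trades a small amount of bias for lower variance, so the cut-offs are usually reported alongside the results.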
swereg 25.12.24
API changes
- SIMPLIFIED: Removed validate_source_column() requirement from add_diagnoses(), add_operations(), add_icdo3s(), add_snomed3s(), and add_snomedo10s():
  - The source column is no longer required in diagnosis data
  - To track diagnoses by source (inpatient/outpatient/cancer), filter the dataset externally before calling add_diagnoses()
  - See ?add_diagnoses for the recommended pattern
New features
- NEW: any_events_prior_to() function for survival analysis:
  - Checks if any TRUE values exist in a preceding time window (excludes current row)
  - Useful for determining if an event occurred in a prior time period
  - Default window of 104 weeks (~2 years) with customizable size
  - Complements steps_to_first() for comprehensive time-to-event analysis
- ENHANCED: steps_to_first() function improvements:
  - Renamed parameter from window to window_including_wk0 for clarity
  - Default window is now 104 (inclusive of current week)
  - Added @family survival_analysis tag and cross-reference to any_events_prior_to()
Bug fixes
- FIXED: Added slider package to Imports in DESCRIPTION to fix R CMD check warning about undeclared import
Data
- BREAKING: Replaced separate fake_inpatient_diagnoses and fake_outpatient_diagnoses with unified fake_diagnoses dataset:
  - New SOURCE column identifies data origin: “inpatient”, “outpatient”, or “cancer”
  - ~2000 inpatient records, ~2000 outpatient records, ~1000 cancer records
  - Cancer records always have populated ICDO3 codes
  - Enables testing of source-based filtering and validation
- ENHANCED: Added ICD-O-3 and SNOMED-CT columns to fake diagnosis data:
  - ICDO3: ICD-O-3 morphology codes (always populated for cancer source)
  - SNOMED3: SNOMED-CT version 3 codes
  - SNOMEDO10: SNOMED-CT version 10 codes
Validation
- ENHANCED: SOURCE column validation is now optional - filter externally if needed (see API changes above)
Documentation
- IMPROVED: Examples for add_icdo3s(), add_snomed3s(), and add_snomedo10s() are now runnable using package fake data (previously wrapped in \dontrun{})
swereg 25.12.6
New features
- NEW: steps_to_first() function for survival analysis:
  - Calculates the number of steps (e.g., weeks) until the first TRUE value in a forward-looking window
  - Useful for time-to-event calculations in longitudinal registry data
  - Default window of 103 weeks (~2 years) with customizable size
  - Returns NA if no event occurs within the window
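The forward-looking behaviour described above can be sketched in base R. This is not the package's implementation: sketch_steps_to_first() is a hypothetical helper, and it assumes that the window counts the current row and that an event on the current row is reported as 0 steps. Neither convention is confirmed by this entry.

```r
# Hypothetical helper: for each position, count the steps to the first TRUE
# within the next `window` rows (current row included), or NA if there is none.
sketch_steps_to_first <- function(x, window = 3L) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    end <- min(n, i + window - 1L)
    hit <- which(x[i:end] %in% TRUE)  # %in% treats NA as "no event"
    if (length(hit) == 0L) NA_integer_ else hit[1L] - 1L
  }, integer(1))
}

x <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
steps <- sketch_steps_to_first(x, window = 3L)
print(steps)
```

A small window is used here so that both outcomes (a step count and NA) appear in a six-row example.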
Bug fixes
- CRITICAL: Fixed add_snomed3s() and add_snomedo10s() calling non-existent internal functions
  - Both functions now correctly call add_diagnoses_or_operations_or_cods_or_icdo3_or_snomed()
  - These functions would have caused runtime errors before this fix
- FIXED: Removed erroneous icdo10 column references from add_diagnoses():
  - ICD-O only has editions 1, 2, and 3 (not 10)
  - ICD-O-3 codes should be handled via the dedicated add_icdo3s() function
- FIXED: Added icd7* and icd9* columns to diagnosis search in add_diagnoses():
  - Historical ICD-7 and ICD-9 columns are now properly searched when diag_type = "both"
  - Validation and helper function now consistent
- FIXED: Corrected error messages in add_icdo3s(), add_snomed3s(), and add_snomedo10s():
  - Messages now correctly reference the appropriate data types instead of “operation data”
Documentation
- ENHANCED: add_diagnoses() documentation now clearly lists which diagnosis columns are searched:
  - When diag_type = "both": hdia, dia*, ekod*, icd7*, icd9*
  - When diag_type = "main": hdia only
swereg 25.8.19
CRAN Submission Preparation
- CRAN READY: Package prepared for CRAN submission with comprehensive compliance improvements:
  - Fixed DESCRIPTION file author field duplication issue
  - Updated .Rbuildignore to exclude all development files (docs/, .git/, .Rhistory, etc.)
  - Removed non-portable files (@eaDir directories, .DS_Store files)
  - Added missing global variable declarations to prevent R CMD check warnings
  - Verified URL consistency between DESCRIPTION and package startup messages
- OPTIMIZED: Vignette structure significantly improved for CRAN submission:
  - Reduced total vignette content by 31% (626 lines removed)
  - Condensed cookbook-survival-analysis.Rmd (removed verbose descriptive statistics and redundant sections)
  - Simplified skeleton2-clean.Rmd (removed duplicated skeleton1_create workflow)
  - Streamlined skeleton3-analyze.Rmd (removed redundant data loading and best practices sections)
  - Fixed all vignette build errors by ensuring consistent data variable availability
  - All vignettes now compile successfully and use package synthetic data consistently
- VALIDATED: All examples are runnable using package fake data - no \dontrun sections without justification
Code Quality Improvements
- CONSISTENCY: Fixed date_columns parameter usage throughout package:
  - Updated all vignettes to use lowercase date_columns parameters (e.g., “indatum” instead of “INDATUM”)
  - Added warning to make_lowercase_names() function when uppercase date_columns are provided
  - Enhanced documentation to clarify that date_columns should use lowercase names
  - Improved user experience with clear guidance and automatic handling of uppercase inputs
- ELEGANCE: Enhanced vignette code patterns for better readability:
  - Replaced verbose data() loading patterns with elegant pipe syntax
  - Updated all data loading to use swereg::fake_* |> copy() |> make_lowercase_names() pattern
  - Eliminated clumsy multi-step data preparation code throughout vignettes
  - Improved code flow and professional appearance of package examples
- VERIFIED: Package builds successfully with R CMD build and passes CRAN compliance checks
- CONFIRMED: inst/ directory contains only files referenced by package functions
swereg 25.7.30
New Features
- NEW: make_rowind_first_occurrence() helper function for rowdep → rowind transformations:
  - Simplifies the common pattern of creating row-independent variables from first occurrence of conditions
  - Automatically handles temp variable creation and cleanup
  - Uses first_non_na() for robust aggregation across all variable types
  - Includes comprehensive input validation and clear error messages
- NEW: “Understanding rowdep and rowind Variables” vignette:
  - Explains the fundamental distinction between row-dependent and row-independent variables
  - Demonstrates common transformation patterns with practical examples
  - Shows integration with the swereg workflow (skeleton1_create → skeleton2_clean → skeleton3_analyze)
  - Includes best practices for longitudinal registry data analysis
swereg 25.7.16
New Swedish Date Parsing and Enhanced Data Cleaning
- NEW: parse_swedish_date() function for handling Swedish registry dates with varying precision:
  - Handles 4-character (YYYY), 6-character (YYYYMM), and 8-character (YYYYMMDD) formats
  - Automatically replaces “0000” with “0701” and “00” with “15” for missing date components
  - Supports custom defaults for missing date parts
  - Includes comprehensive error handling and vectorized processing
- ENHANCED: make_lowercase_names() now supports automatic date cleaning:
  - New date_column parameter to specify which column contains dates
  - Automatically creates cleaned ‘date’ column using parse_swedish_date()
  - Works with both default and data.table methods
  - Maintains backward compatibility with existing code
- ENHANCED: All add_* functions now require cleaned date columns:
  - add_diagnoses(), add_operations(), add_rx(), add_cods() expect ‘date’ column
  - Clear error messages guide users to use make_lowercase_names(data, date_column = "...")
  - Improved validation ensures data preprocessing consistency
- ENHANCED: create_skeleton() now includes personyears column:
  - Annual rows (is_isoyear==TRUE) have personyears = 1
  - Weekly rows (is_isoyear==FALSE) have personyears = 1/52.25
  - Facilitates person-time calculations for survival analysis
- IMPROVED: Survival analysis cookbook vignette updated:
  - Uses weekly data instead of yearly data for more precise analyses
  - Age calculation based on isoyearweeksun instead of isoyear
  - Includes person-time in descriptive statistics
  - Demonstrates proper use of new date cleaning workflow
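The default-filling rule described for parse_swedish_date() above (“0000” becomes “0701”, “00” becomes “15”) can be sketched in base R as follows. This is an illustration of the rule only, not the package's code: sketch_parse_date() and its argument names are hypothetical, and treating YYYY and YYYYMM inputs by appending the same defaults is an assumption.

```r
# Hypothetical helper illustrating the mid-year / mid-month default rule.
sketch_parse_date <- function(x, default_month_day = "0701", default_day = "15") {
  x <- as.character(x)
  n <- nchar(x)

  # YYYY: append the default month and day; YYYYMM: append the default day.
  is_year  <- which(n == 4)
  is_month <- which(n == 6)
  x[is_year]  <- paste0(x[is_year], default_month_day)
  x[is_month] <- paste0(x[is_month], default_day)

  # YYYYMMDD with missing parts coded as zeros.
  x <- sub("0000$", default_month_day, x)  # month and day both missing
  x <- sub("00$", default_day, x)          # only the day missing

  as.Date(x, format = "%Y%m%d")
}

raw <- c("1990", "199003", "19900000", "19900300", "19900315")
parsed <- sketch_parse_date(raw)
print(data.frame(raw, parsed))
```

Mid-year and mid-month defaults keep imprecise dates near the centre of the interval they could fall in, which limits the largest possible error.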
Enhanced error handling and validation
- ENHANCED: Comprehensive input validation for all add_* functions:
  - add_onetime(): Validates skeleton structure, ID column exists, checks for ID matches
  - add_annual(): Validates isoyear parameter, checks skeleton year coverage
  - add_diagnoses(): Validates diagnosis patterns, checks for diagnosis code columns
  - add_operations(): Validates operation patterns, checks for operation code columns
  - add_rx(): Validates prescription data structure, checks source columns
  - add_cods(): Validates death data structure, checks cause of death columns
- IMPROVED: User-friendly error messages with specific guidance:
  - Clear indication when make_lowercase_names() is forgotten
  - Helpful suggestions for column naming issues
  - Informative ID mismatch diagnostics with sample values
- NEW: Internal validation helper functions for consistent error handling
- ADDED: Input validation for pattern lists, data structures, and parameter ranges
New cookbook documentation
- NEW: Comprehensive survival analysis cookbook (cookbook-survival-analysis.Rmd):
  - Complete workflow from raw data to Cox proportional hazards model
  - Time-varying covariates (annual income) with heart attack outcome
  - Handles common challenges: missing data, multiple events, competing risks
  - Performance tips for large datasets
  - Practical solutions for real-world registry analysis
- ENHANCED: Updated _pkgdown.yml with new “Cookbooks” section
- ADDED: survival package to Suggests dependencies
swereg 25.7.16
Major documentation restructuring
- RESTRUCTURED: Complete vignette reorganization for clear learning progression:
  - NEW “Skeleton concept” vignette: Conceptual foundation explaining the skeleton approach without technical implementation
  - “Building the data skeleton (skeleton1_create)”: Pure data integration focus, from raw data to time-structured skeleton
  - “Cleaning and deriving variables (skeleton2_clean)”: Pure data cleaning and variable derivation focus
  - “Production analysis workflows (skeleton3_analyze)”: Memory-efficient processing and final analysis datasets
- IMPROVED: Clear separation of concerns with focused, single-purpose tutorials
- ENHANCED: Systematic learning progression from concept to implementation to production
- UPDATED: _pkgdown.yml structure with logical vignette grouping
- PRESERVED: All existing technical content while improving organization
Content improvements
- NEW: Comprehensive conceptual introduction based on presentation content
- IMPROVED: Each vignette builds systematically on the previous one
- ENHANCED: Better explanation of three types of data integration (one-time, annual, event-based)
- CLARIFIED: Production workflow patterns with memory-efficient batching strategies
- STANDARDIZED: Consistent academic tone and sentence case throughout
swereg 25.7.15
Documentation and presentation improvements
- STANDARDIZED: Changed all titles and headings to normal sentence case throughout:
  - Vignette titles: “Basic Workflow” → “Basic workflow”, “Complete Workflow” → “Complete workflow”, etc.
  - README.md section headings: “Core Functions” → “Core functions”, “Data Integration” → “Data integration”, etc.
  - NEWS.md section headings: “Vignette Restructuring” → “Vignette restructuring”, etc.
  - CLAUDE.md section headings: “Project Overview” → “Project overview”, “Development Commands” → “Development commands”, etc.
- IMPROVED: Consistent normal sentence case for better readability and less formal appearance
- SIMPLIFIED: Removed subtitle text after colons in vignette titles for cleaner presentation
- ENHANCED: Improved Core Concept section in basic workflow vignette with clear explanation of three data types:
  - One-time data (demographics): Added to all rows for each person
  - Annual data (income, family status): Added to all rows for specific year
  - Event-based data (diagnoses, prescriptions, deaths): Added to rows where events occurred
- CLARIFIED: Step 1 documentation now properly explains all skeleton columns including isoyearweeksun
- VERIFIED: All vignettes compile successfully with improved content
Major documentation and vignette reorganization
- RESTRUCTURED: Complete vignette reorganization with improved naming and content flow:
  - swereg.Rmd → basic-workflow.Rmd: Focused introduction to skeleton1_create
  - advanced-workflow.Rmd → complete-workflow.Rmd: Two-stage workflow (skeleton1_create + skeleton2_clean)
  - memory-efficient-batching.Rmd: Maintained as comprehensive three-stage workflow guide
- IMPROVED: Eliminated content redundancy between vignettes for clearer learning progression
- ENHANCED: Updated _pkgdown.yml configuration to reflect new vignette structure
Function documentation improvements
- ENHANCED: Comprehensive documentation improvements for all exported functions:
  - Added @family tags for logical grouping (data_integration, skeleton_creation, data_preprocessing)
  - Added @seealso sections with cross-references to related functions and vignettes
  - Replaced placeholder examples with runnable code using synthetic data
  - Improved parameter documentation with detailed descriptions and expected formats
  - Enhanced return value documentation with explicit side effects description
- STANDARDIZED: Consistent academic tone throughout all documentation
Professional presentation updates
- IMPROVED: Removed informal elements and adopted academic tone across all documentation
- UPDATED: Changed terminology from “fake data” to “synthetic data” throughout
- ENHANCED: More professional language in README.md and vignettes
- STANDARDIZED: Consistent formal tone appropriate for scientific software
swereg 25.7.1
Vignette restructuring
- RESTRUCTURED: Reorganized vignettes for clearer learning progression:
  - swereg.Rmd: Clean skeleton1_create tutorial using full datasets (removed subset filtering)
  - advanced-workflow.Rmd: Focused skeleton1→skeleton2 workflow (removed batching and skeleton3 content)
  - memory-efficient-batching.Rmd: NEW comprehensive batching vignette with complete skeleton1→skeleton2→skeleton3 workflow for large-scale studies
- IMPROVED: GitHub Actions workflow optimization with dependency caching and binary packages for faster CI/CD
Batching vignette fixes
- FIXED: Updated memory-efficient-batching vignette with production-ready improvements:
  - Replace split() with csutil::easy_split for better batch handling
  - Replace saveRDS/readRDS with qs::qsave/qread for 2-10x faster file I/O
  - Fix skeleton3_analyze to properly aggregate weekly→yearly data using swereg::max_with_infinite_as_na
  - Remove incorrect is_isoyear == TRUE filter in skeleton3_analyze
  - Fix analysis results to avoid NaN outputs in treatment rate calculations
  - Add explanations for weekly→yearly data aggregation and qs package performance benefits
New features
- NEW: Added isoyearweeksun variable to create_skeleton() function. It provides a Date representing the Sunday (last day) of each ISO week/year for easier date calculations
- NEW: Updated package logo
- IMPROVED: Updated all vignettes to not assume swereg is loaded. All functions use the swereg:: prefix and data() calls use the package="swereg" argument
- IMPROVED: Updated function documentation to clarify that pattern matching functions (add_diagnoses, add_cods, add_rx) automatically add a “^” prefix. Users should NOT include “^” in their patterns
- NEW: Added comprehensive fake Swedish registry datasets for development and vignettes:
  - fake_person_ids: 1000 synthetic personal identifiers
  - fake_demographics: Demographics data matching SCB format
  - fake_annual_family: Annual family status data
  - fake_inpatient_diagnoses and fake_outpatient_diagnoses: NPR diagnosis data with ICD-10 codes
  - fake_prescriptions: LMED prescription data with ATC codes and hormone therapy focus
  - fake_cod: Cause of death data
- NEW: Added two comprehensive vignettes:
  - swereg.Rmd: Basic skeleton1_create workflow tutorial
  - advanced-workflow.Rmd: Complete 3-phase workflow (skeleton1 → skeleton2 → skeleton3)
- NEW: Replaced magrittr pipe (%>%) with base pipe (|>) throughout codebase
- NEW: Added memory-efficient batched processing examples for large registry studies
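The note above about the automatic “^” prefix means that user patterns are matched from the start of each code. The base R sketch below shows the effect of such anchoring on made-up codes. How the package builds its regular expressions internally is not shown in this changelog, so the paste0() step is an assumption.

```r
# Made-up codes; only the first two start with "F64".
codes <- c("F640", "F648", "XF64", "N951")

# The user supplies the bare prefix, without "^".
user_pattern <- "F64"

# Assumed internal step: anchor the pattern at the start of the string.
anchored <- paste0("^", user_pattern)

matches_anchored   <- grepl(anchored, codes)
matches_unanchored <- grepl(user_pattern, codes)

print(data.frame(codes, matches_anchored, matches_unanchored))
```

Without the anchor, "XF64" would match as well, which is usually not what is wanted when codes are hierarchical prefixes.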
Bug fixes
- CRITICAL: Fixed incorrect variable names in fake_cod dataset: changed from non-Swedish underlying_cod/contributory_cod1/contributory_cod2 to correct Swedish registry names ulorsak/morsak1/morsak2
- VERIFIED: Confirmed all fake datasets use correct Swedish registry variable name conventions
- VERIFIED: All ICD-10 and ATC codes in fake datasets are properly formatted and realistic
Documentation improvements
- BREAKING: Fixed incorrect function descriptions that were copied from another package
- NEW: Added comprehensive roxygen2 documentation for all exported functions:
  - add_onetime(): Documents merging one-time/baseline data to skeleton
  - add_annual(): Documents merging annual data for specific ISO years
  - add_cods(): Documents cause of death analysis with ICD-10 codes
  - add_diagnoses(): Documents diagnosis analysis with main/secondary diagnoses
  - add_operations(): Documents surgical operation analysis including gender-affirming procedures
  - add_rx(): Documents prescription drug analysis with ATC/product codes
  - create_skeleton(): Documents longitudinal skeleton creation with detailed return structure
  - make_lowercase_names(): Documents generic function with S3 methods
  - x2023_mht_add_lmed(): Documents specialized MHT study function
- NEW: Added documentation for all helper functions:
- NEW: Added @param descriptions for all function parameters
- NEW: Added @return descriptions explaining function outputs
- NEW: Added @examples with practical usage demonstrations
- NEW: Added @details and @note sections for complex functions
- IMPROVED: Used proper roxygen2 practices including @rdname for S3 methods and @seealso cross-references
