
RegistryStudy: Unified R6 class for skeleton pipeline
Source: R/r6_registry_study.R
Portable Directory Resolution
Directories are stored as candidate path vectors and resolved lazily via active bindings. The first existing directory wins and is cached. If the cached path becomes invalid (e.g. after moving to a different machine), the binding automatically re-resolves from the candidate list.
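The resolution behaviour described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `make_dir_resolver()` and `study_env` are hypothetical names, and `makeActiveBinding()` stands in for the R6 active binding.

```r
# Hypothetical helper (not part of the package): returns a zero-argument
# function that resolves the first existing directory among `candidates`.
make_dir_resolver <- function(candidates) {
  cached <- NULL
  function() {
    # Reuse the cached path only while it still exists on this machine.
    if (!is.null(cached) && dir.exists(cached)) {
      return(cached)
    }
    # Otherwise re-resolve from the candidate list; the first hit wins.
    hits <- candidates[dir.exists(candidates)]
    if (length(hits) == 0L) {
      stop("None of the candidate directories exist: ",
           paste(candidates, collapse = ", "))
    }
    cached <<- hits[[1L]]
    cached
  }
}

# A plain environment stands in for the study object in this sketch.
study_env <- new.env()
makeActiveBinding(
  "data_generic_dir",
  make_dir_resolver(c("/no/such/path/generic", tempdir())),
  study_env
)

# Reading the binding triggers resolution; the missing first candidate is skipped.
resolved_dir <- study_env$data_generic_dir
```

Because the resolver takes no arguments, assigning to the binding fails, which mirrors the read-only behaviour documented for the directory bindings.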
Code Registry
Code registrations are declarative. Each `register_codes()` call specifies codes, the function to apply them (e.g. `add_diagnoses`), which data groups to use, and optional prefixing/combining. This replaces the old system of separate fields per code type.
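As a rough illustration of what "declarative" means here, a registration can be pictured as a plain list appended to a registry. The sketch below is an assumption about the shape only: `register_entry()` and `add_diagnoses_stub()` are hypothetical stand-ins, and the entry fields follow the documented `code_registry` layout (codes, fn, fn_args, groups, combine_as, label).

```r
# Hypothetical stand-in for a code-applying function such as add_diagnoses.
add_diagnoses_stub <- function(data, codes, ...) data

# Append one declarative entry to a registry list and return the new registry.
register_entry <- function(registry, codes, fn, groups, fn_args = list(),
                           combine_as = NULL, label = NULL) {
  # Default the label to the expression passed as `fn`.
  if (is.null(label)) label <- deparse(substitute(fn))
  entry <- list(
    codes = codes,
    fn = fn,
    fn_args = fn_args,
    groups = groups,
    combine_as = combine_as,
    label = label
  )
  c(registry, list(entry))
}

registry <- list()
registry <- register_entry(
  registry,
  codes = list(stroke_any = c("I60", "I61", "I63")),
  fn = add_diagnoses_stub,
  groups = list(ov = "outpatient", sv = "inpatient"),
  combine_as = "os"
)
```

Nothing is computed at registration time in this sketch; the entry only records what should be run later, which is what allows one list to replace separate fields per code type.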
Public fields
`group_names`: Character vector. Names of rawbatch groups.
`batch_size`: Integer. Number of IDs per batch.
`seed`: Integer. Shuffle seed for reproducibility.
`id_col`: Character. Person ID column name.
`n_ids`: Integer. Total number of IDs across all batches.
`n_batches`: Integer. Number of batches.
`batch_id_list`: List of ID vectors, one per batch.
`groups_saved`: Character vector of rawbatch groups saved to disk.
`code_registry`: List of code registration entries. Each entry is a list with: codes, fn, fn_args, groups, combine_as, label.
`created_at`: POSIXct. Timestamp when this study was created.
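The batching fields above relate to each other roughly as follows. This is a sketch of one plausible scheme (shuffle with the seed, then cut into chunks of `batch_size`); the exact shuffling and splitting used by the class is an assumption here.

```r
# Illustrative inputs; 2500 IDs with the default batch size of 1000.
ids <- 1:2500
batch_size <- 1000L
seed <- 4L

# Shuffle reproducibly, then assign consecutive positions to batches.
set.seed(seed)
shuffled <- sample(ids)
batch_index <- ceiling(seq_along(shuffled) / batch_size)
batch_id_list <- split(shuffled, batch_index)

# Derived counts corresponding to the n_ids and n_batches fields.
n_ids <- length(ids)
n_batches <- length(batch_id_list)
```

With these inputs the last batch is smaller than the others (500 IDs), since 2500 is not a multiple of the batch size.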
Active bindings
`data_generic_dir`: Character (read-only). Resolved path for rawbatch and (by default) skeleton files. Lazily resolved from candidates.
`skeleton_dir`: Character (read-only). Resolved path for skeleton output.
`data_raw_dir`: Character or NULL (read-only). Resolved path for raw registry files. NULL if not configured.
`skeleton_files`: Character vector (read-only). Skeleton output file paths detected on disk. Scans `skeleton_dir` on each access.
`expected_skeleton_file_count`: Integer (read-only). Expected number of skeleton files (one per batch).
`meta_file`: Character. Path to the metadata file.
Methods
Method new()
Create a new RegistryStudy object.
Usage
RegistryStudy$new(
  data_generic_dir,
  group_names = c("lmed", "inpatient", "outpatient", "cancer", "dors", "other"),
  skeleton_dir = data_generic_dir,
  data_raw_dir = NULL,
  batch_size = 1000L,
  seed = 4L,
  id_col = "lopnr"
)
Arguments
`data_generic_dir`: Character vector of candidate paths for rawbatch and (by default) skeleton files. Resolved lazily.
`group_names`: Character vector of rawbatch group names.
`skeleton_dir`: Character vector of candidate paths for skeleton output. Defaults to the same candidates as `data_generic_dir`.
`data_raw_dir`: Character vector of candidate paths for raw registry files (optional). NULL if raw data paths are managed externally.
`batch_size`: Integer. Number of IDs per batch. Default: 1000L.
`seed`: Integer. Shuffle seed. Default: 4L.
`id_col`: Character. Person ID column name. Default: "lopnr".
Method check_version()
Check if this object's schema version matches the current class version. Warns if the object was saved with an older schema version.
Method register_codes()
Register code definitions for the code registry.
Each call declares codes, the function to apply them, which batch data groups to use, and optional prefixing/combining. Appends to `self$code_registry`.
Usage
RegistryStudy$register_codes(
  codes,
  fn,
  groups,
  fn_args = list(),
  combine_as = NULL,
  label = NULL
)
Arguments
`codes`: Named list of code vectors (e.g. ICD-10, ATC, operation codes).
`fn`: Function to call (e.g. `add_diagnoses`, `add_rx`).
`groups`: Named list mapping prefixes to group names. Unnamed elements get no prefix. Each element is a character vector of group names whose data are row-bound (via `rbindlist`) before calling `fn`.
`fn_args`: Named list of extra arguments to pass to `fn` (e.g. `list(source = "atc")`).
`combine_as`: Character or NULL. If non-NULL, also run `fn` on all groups combined, using this as the prefix.
`label`: Character. Human-readable label for `describe_codes()` output. Defaults to `deparse(substitute(fn))`.
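To make the interaction between `groups` and `combine_as` concrete, the sketch below expands a `groups` list into the individual runs it implies. `expand_groups()` is a hypothetical helper written for this illustration; how the class actually iterates over groups, and how a prefix is attached to output names, is not specified here.

```r
# Hypothetical helper: list the (prefix, groups) pairs implied by a
# `groups` argument, plus one combined run when `combine_as` is given.
expand_groups <- function(groups, combine_as = NULL) {
  prefixes <- names(groups)
  # A fully unnamed list has NULL names; treat every element as unprefixed.
  if (is.null(prefixes)) prefixes <- rep("", length(groups))
  runs <- unname(Map(
    function(prefix, group_names) list(prefix = prefix, groups = group_names),
    prefixes, groups
  ))
  if (!is.null(combine_as)) {
    # The combined run covers every group mentioned in any element.
    all_groups <- unique(unlist(groups, use.names = FALSE))
    runs <- c(runs, list(list(prefix = combine_as, groups = all_groups)))
  }
  runs
}

diagnosis_runs <- expand_groups(
  list(ov = "outpatient", sv = "inpatient", dors = "dors", can = "cancer"),
  combine_as = "osdc"
)
rx_runs <- expand_groups(list("lmed"))
```

Here `diagnosis_runs` holds five runs (four prefixed groups and one combined run prefixed "osdc"), while `rx_runs` holds a single run with an empty prefix.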
Method process_skeletons()
Process batches through a user-defined function.
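The general pattern of running a user-defined function once per batch can be sketched as below. Everything here is illustrative: `run_batches()` and `count_ids()` are hypothetical, the callback signature `(batch_ids, batch_number)` is an assumption rather than the method's documented contract, and the sketch runs sequentially instead of using workers.

```r
# Hypothetical batch list, shaped like the documented batch_id_list field.
batch_id_list <- list(c(3L, 1L), c(2L, 5L), 4L)

# Hypothetical user function; the signature expected by the real method
# is not documented in this section.
count_ids <- function(batch_ids, batch_number) {
  data.frame(batch = batch_number, n_ids = length(batch_ids))
}

# Apply the user function to each batch in turn and collect the results.
run_batches <- function(batch_id_list, fn) {
  lapply(seq_along(batch_id_list), function(i) fn(batch_id_list[[i]], i))
}

results <- run_batches(batch_id_list, count_ids)
summary_table <- do.call(rbind, results)
```

Each element of `results` corresponds to one batch, so per-batch outputs can be inspected individually or bound together as in `summary_table`.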
Examples
if (FALSE) { # \dontrun{
# Candidate paths per platform; the first existing directory is used.
study <- RegistryStudy$new(
  data_generic_dir = c("/linux/path/generic/", "C:/windows/path/generic/"),
  data_raw_dir = c("/linux/path/raw/", "C:/windows/path/raw/"),
  group_names = c("lmed", "inpatient", "outpatient", "cancer", "dors", "other")
)

# Diagnoses: one prefixed run per group, plus a combined run prefixed "osdc".
study$register_codes(
  codes = list("stroke_any" = c("I60", "I61", "I63")),
  fn = add_diagnoses,
  groups = list(ov = "outpatient", sv = "inpatient", dors = "dors", can = "cancer"),
  combine_as = "osdc"
)

# Prescriptions: a single unnamed group, so no prefix is applied.
study$register_codes(
  codes = list("rx_n05a" = c("N05A")),
  fn = add_rx,
  fn_args = list(source = "atc"),
  groups = list("lmed")
)

# `ids`, `lmed_data`, and `my_fn` are placeholders supplied by the user.
study$set_ids(ids)
study$save_rawbatch("lmed", lmed_data)
study$describe_codes()
result <- study$process_skeletons(my_fn, n_workers = 4L)
} # }