
Transform a row-dependent variable to a row-independent variable using first occurrence
Source: R/data_transformations.R
make_rowind_first_occurrence.Rd
Creates a row-independent (`ri_`) variable by finding the first occurrence where a condition is TRUE and extracting the corresponding value. This is a common pattern in longitudinal registry data analysis for creating stable person-level characteristics from time-varying skeleton columns.
Details
swereg distinguishes two variable shapes in longitudinal skeleton data:
- **row-dependent** (prefix `rd_`)
Values that can change over time for a person. Examples: `rd_age_continuous`, `rd_education`, `rd_income_inflation_adjusted`.
- **row-independent** (prefix `ri_`)
Values that are fixed at the person level and identical on every row for that person. Examples: `ri_birthcountry`, `ri_age_first_diagnosis`, `ri_isoyear_first_diagnosis`, `ri_register_tag`.
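To make the two shapes concrete, here is a minimal sketch on toy data. The column values and the helper `is_row_independent()` are illustrative assumptions and are not part of swereg; the helper only checks whether a column takes a single value within each person.

```r
library(data.table)

# Toy skeleton: two people, three time points each (all values are made up)
dt <- data.table(
  id = c(1, 1, 1, 2, 2, 2),
  isoyear = c(2019, 2020, 2021, 2019, 2020, 2021),
  rd_education = c("primary", "secondary", "secondary",
                   "tertiary", "tertiary", "tertiary"),
  ri_birthcountry = c("SE", "SE", "SE", "NO", "NO", "NO")
)

# Hypothetical helper (not a swereg function): TRUE if `col` has at most
# one distinct value within each person identified by `id_col`
is_row_independent <- function(dt, col, id_col = "id") {
  n_distinct <- tapply(dt[[col]], dt[[id_col]], function(x) length(unique(x)))
  all(n_distinct <= 1)
}

is_row_independent(dt, "rd_education")    # person 1 changes education level
is_row_independent(dt, "ri_birthcountry") # constant within each person
```

A check like this can be useful after building an `ri_` column, since a row-independent variable that varies within a person usually points to a bug in the transformation.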
This function automates the common `rd_` -> `ri_` transformation of capturing "the value at the first time something became true". The transformation follows these steps:
1. Create a temporary column that holds `value_var` on the rows where `condition` is TRUE (and NA elsewhere)
2. Use `first_non_na()` to find the first occurrence for each person
3. Clean up the temporary column automatically
Equivalent to the manual pattern:
dt[condition, temp := value_var]
dt[, new_var := first_non_na(temp), by = .(id)]
dt[, temp := NULL]
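The manual pattern above can be tried on toy data as follows. This is a sketch under stated assumptions: `first_non_na_sketch()` is a stand-in written for this example (assumed to behave like swereg's `first_non_na()`, that is, return the first non-missing element), the column names are illustrative, and rows are assumed to be sorted chronologically within each person so that "first" means "earliest".

```r
library(data.table)

# Stand-in for swereg's first_non_na(); the real implementation may differ
first_non_na_sketch <- function(x) {
  non_missing <- x[!is.na(x)]
  if (length(non_missing) == 0) {
    return(x[NA_integer_])  # NA of the same type as x, keeps groups consistent
  }
  non_missing[1]
}

# Toy skeleton: three people, rows already ordered by time within person
dt <- data.table(
  id = c(1, 1, 1, 2, 2, 2, 3, 3),
  rd_age = c(30, 31, 32, 40, 41, 42, 50, 51),
  example_diag = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Step 1: copy the value only on rows where the condition holds
dt[example_diag == TRUE, temp := rd_age]
# Step 2: broadcast the first non-missing value to every row of the person
dt[, ri_age_first_example_diag := first_non_na_sketch(temp), by = .(id)]
# Step 3: drop the temporary column
dt[, temp := NULL]

print(dt)
```

In this toy data, person 1 gets 31 on every row, person 2 gets 42, and person 3, who never meets the condition, gets NA.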
See also
first_non_na for the aggregation function used internally
Examples
if (FALSE) { # \dontrun{
# Create example skeleton with diagnosis data
skeleton <- create_skeleton(c(1, 2, 3), "2020-01-01", "2020-12-31")
# Add some example diagnosis data
# (diagnosis_data is assumed to be loaded already; it is not defined here)
add_diagnoses(skeleton, diagnosis_data, "lopnr",
              codes = list("example_diag" = "^F64"))
# Transform: age at first example diagnosis
make_rowind_first_occurrence(skeleton,
                             condition = "example_diag == TRUE",
                             value_var = "age",
                             new_var = "ri_age_first_example_diag")
} # }