
Variable types: rowdep vs rowind
Source:vignettes/rowdep-rowind-concept.Rmd
library(swereg)
#> swereg 26.4.2
#> https://papadopoulos-lab.github.io/swereg/
library(data.table)
Introduction
In longitudinal registry data analysis with swereg, variables can be classified into two fundamental types:
- rowdep (row-dependent): Variables that can change over time for a person
- rowind (row-independent): Variables that cannot change over time for a person
Understanding this distinction is crucial for effective analysis of
Swedish registry data, particularly during the
skeleton2_clean phase where many transformations convert
rowdep variables into rowind variables.
The concept
Row-dependent (rowdep) variables
These variables can have different values across time periods for the same person:
- Education level (rowdep_edu_cat): A person’s education can improve over time
- Income (rowdep_income_inflation_adjusted): Income changes annually
- Had diagnosis this week (diag_gd_icd10_F64_089): TRUE/FALSE depending on the specific week
- Current age (age): Increases continuously over time
Row-independent (rowind) variables
These variables have the same value across all time periods for a person:
- Age at first diagnosis (rowind_age_first_gd): Fixed once the diagnosis occurs
- Year of first diagnosis (rowind_isoyear_first_gd): Fixed historical fact
- Birth country (rowind_birthcountry): Never changes
- Register classification (rowind_register_tag): Person’s role in the study (case, control, etc.)
- Age at death (rowind_age_death): Fixed once death occurs
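The distinction above can be checked mechanically. The following is a minimal sketch on made-up toy data (the table `dt` and its values are assumptions for illustration, not swereg output): for each column it reports the largest number of distinct values observed within a single person.

```r
library(data.table)

# Toy long-format data: 2 persons x 3 weeks (all values are invented).
dt <- data.table(
  id                  = rep(1:2, each = 3),
  isoyearweek         = rep(c("2020-01", "2020-02", "2020-03"), times = 2),
  rowdep_edu_cat      = c("secondary", "secondary", "tertiary",
                          "primary", "primary", "primary"),
  rowind_birthcountry = rep(c("SE", "NO"), each = 3)
)

# Count distinct values per person for every non-key column.
value_cols <- setdiff(names(dt), c("id", "isoyearweek"))
per_person <- dt[, lapply(.SD, uniqueN), by = id, .SDcols = value_cols]

# Largest within-person count per column; 1 means constant within every person.
max_distinct <- vapply(per_person[, value_cols, with = FALSE], max, integer(1))
print(max_distinct)
```

Note that person 2 happens to have a constant education level in this window. A rowdep variable is one that can change over time, not one that must change for every person.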
Why this matters
In longitudinal data, you often need to:
- Identify when something first happened (e.g., first diagnosis, first prescription)
- Capture characteristics at specific time points (e.g., education level when first diagnosed)
- Create person-level summaries that don’t vary by time period
Converting rowdep → rowind allows you to
create stable, person-level characteristics for analysis.
Common transformation patterns
Pattern 1: First occurrence transformations
The most common transformation finds the first time a condition is TRUE and extracts a value from that time point. This works for any condition - time-based, value-based, or complex combinations.
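As a rough illustration of the idea, the sketch below uses plain data.table indexing on invented toy data. It is an alternative formulation for understanding only, not how swereg implements its helper; the table `dt` and the columns `edu` and `diag` are assumptions made for this example.

```r
library(data.table)

# Toy data: 2 persons x 4 weeks; person 2 never satisfies the condition.
dt <- data.table(
  id          = rep(1:2, each = 4),
  isoyearweek = rep(c("2020-01", "2020-02", "2020-03", "2020-04"), times = 2),
  edu         = c("secondary", "secondary", "tertiary", "tertiary",
                  "primary", "primary", "primary", "primary"),
  diag        = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Rows must be in time order within each person for "first" to mean "earliest".
setorder(dt, id, isoyearweek)

# Week of the first TRUE within each person; NA when the condition never holds.
dt[, rowind_isoyearweek_first_dx := isoyearweek[which(diag)[1]], by = id]

# Value of another column at that same week (a condition built on a rowind variable).
dt[, rowind_edu_at_first_dx :=
     edu[match(rowind_isoyearweek_first_dx[1], isoyearweek)], by = id]
```

Both new columns are constant within each person: person 1 gets the week and education level of the first diagnosis row, and person 2 gets NA on every row.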
Time-based conditions
# Age at first GD diagnosis for AFAB individuals
make_rowind_first_occurrence(skeleton_gd,
  condition = "diag_gd_icd10_F64_089 == TRUE & is_amab == FALSE",
  value_var = "age",
  new_var = "rowind_age_first_gd_afab")
# Education level at time of first diagnosis
make_rowind_first_occurrence(skeleton_gd,
  condition = "isoyearweek == rowind_isoyearweek_first_gd",
  value_var = "rowdep_edu_cat",
  new_var = "rowind_edu_at_first_dx")
Manual approach (for understanding)
# The helper function does this automatically:
skeleton_gd[diag_gd_icd10_F64_089 == TRUE & is_amab == FALSE, temp := age]
skeleton_gd[, rowind_age_first_gd_afab := swereg::first_non_na(temp), by = .(id)]
skeleton_gd[, temp := NULL] # Always clean up the temp variable
Pattern 2: Simple renaming (already rowind)
When variables are already row-independent but need consistent naming:
# If date of birth is truly the same for all rows (as it should be)
setnames(skeleton_gd, "dob", "rowind_dob")
# If birth country was added from demographics (already person-level)
setnames(skeleton_gd, "birth_country", "rowind_birth_country")
Important: Only use this when the variable is genuinely the same across all rows for each person. If values differ, you have a data integration problem to fix first.
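One way to act on this warning is to list the persons with conflicting values before renaming anything. The sketch below uses invented toy data in which person 2 has two different dates of birth; the table `dt` and the guard logic are illustrative assumptions, not part of swereg.

```r
library(data.table)

# Toy data: person 2 has an inconsistent dob on one row.
dt <- data.table(
  id  = c(1, 1, 1, 2, 2, 2),
  dob = as.Date(c("1980-05-01", "1980-05-01", "1980-05-01",
                  "1975-02-10", "1975-02-10", "1975-03-10"))
)

# Persons whose dob takes more than one distinct value across their rows.
conflicts <- dt[, .(n_values = uniqueN(dob)), by = id][n_values > 1]
print(conflicts)

# Rename only when no person has conflicting values.
if (nrow(conflicts) == 0) {
  setnames(dt, "dob", "rowind_dob")
}
```

Here the rename is skipped because person 2 appears in `conflicts`, which points at the rows to investigate upstream.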
Practical example with fake data
Let’s demonstrate these concepts using the fake datasets included in swereg:
# Create a small skeleton for demonstration
ids <- swereg::fake_demographics$lopnr[1:5]
skeleton <- create_skeleton(ids, "2020-01-01", "2020-03-31")
# Add demographic data (creates rowind variables)
fake_demographics <- swereg::fake_demographics |>
  data.table::copy() |>
  swereg::make_lowercase_names(date_columns = "doddatum")
#> Found additional date columns not in date_columns: fodelseman. Consider adding them for automatic date parsing.
add_onetime(skeleton, fake_demographics, "lopnr")
# Add diagnosis data (creates rowdep variables)
fake_diagnoses <- swereg::fake_diagnoses |>
  data.table::copy() |>
  swereg::make_lowercase_names(date_columns = "indatum")
#> Found additional date columns not in date_columns: utdatum. Consider adding them for automatic date parsing.
add_diagnoses(skeleton, fake_diagnoses, "lopnr",
  diags = list("f64_diag" = "^F64"))
#> Warning: 'diags' is deprecated, use 'codes' instead.
# Examine the structure
print("Skeleton structure:")
#> [1] "Skeleton structure:"
head(skeleton[id == ids[1]], 8)
#> id isoyear isoyearweek is_isoyear isoyearweeksun personyears fodelseman
#> <int> <int> <char> <lgcl> <Date> <num> <char>
#> 1: 1 1900 1900-** TRUE 1900-07-01 1 1959
#> 2: 1 1901 1901-** TRUE 1901-06-30 1 1959
#> 3: 1 1902 1902-** TRUE 1902-06-29 1 1959
#> 4: 1 1903 1903-** TRUE 1903-06-28 1 1959
#> 5: 1 1904 1904-** TRUE 1904-07-03 1 1959
#> 6: 1 1905 1905-** TRUE 1905-07-02 1 1959
#> 7: 1 1906 1906-** TRUE 1906-07-01 1 1959
#> 8: 1 1907 1907-** TRUE 1907-06-30 1 1959
#> doddatum f64_diag
#> <Date> <lgcl>
#> 1: <NA> FALSE
#> 2: <NA> FALSE
#> 3: <NA> FALSE
#> 4: <NA> FALSE
#> 5: <NA> FALSE
#> 6: <NA> FALSE
#> 7: <NA> FALSE
#> 8: <NA> FALSE
Now let’s create rowind variables:
# Example 1: ISO year of first F64 diagnosis
make_rowind_first_occurrence(skeleton,
  condition = "f64_diag == TRUE",
  value_var = "isoyear",
  new_var = "rowind_isoyear_first_f64")
# Example 2: ISO year-week of first F64 diagnosis
make_rowind_first_occurrence(skeleton,
  condition = "f64_diag == TRUE",
  value_var = "isoyearweek",
  new_var = "rowind_isoyearweek_first_f64")
# View results
print("Results with rowind variables:")
#> [1] "Results with rowind variables:"
head(skeleton[id == ids[1], .(id, isoyear, isoyearweek, f64_diag,
  rowind_isoyear_first_f64, rowind_isoyearweek_first_f64)], 8)
#> id isoyear isoyearweek f64_diag rowind_isoyear_first_f64
#> <int> <int> <char> <lgcl> <int>
#> 1: 1 1900 1900-** FALSE NA
#> 2: 1 1901 1901-** FALSE NA
#> 3: 1 1902 1902-** FALSE NA
#> 4: 1 1903 1903-** FALSE NA
#> 5: 1 1904 1904-** FALSE NA
#> 6: 1 1905 1905-** FALSE NA
#> 7: 1 1906 1906-** FALSE NA
#> 8: 1 1907 1907-** FALSE NA
#> rowind_isoyearweek_first_f64
#> <char>
#> 1: <NA>
#> 2: <NA>
#> 3: <NA>
#> 4: <NA>
#> 5: <NA>
#> 6: <NA>
#> 7: <NA>
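#> 8: <NA>
Notice how:
- f64_diag can vary from one time period to the next (rowdep)
- rowind_isoyear_first_f64 and rowind_isoyearweek_first_f64 are constant for each person (rowind)
In the rows shown, both rowind variables are NA, which indicates that no row for this person satisfied f64_diag == TRUE.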
Best practices
1. Naming conventions
Use clear prefixes to distinguish variable types:
- rowdep_* for time-varying variables
- rowind_* for time-invariant variables
2. Always clean up temporary variables
When using manual transformations, always remove temp variables:
skeleton[condition == TRUE, temp := value]
skeleton[, new_rowind_var := first_non_na(temp), by = .(id)]
skeleton[, temp := NULL] # Critical: clean up
3. Use helper functions when possible
The make_rowind_first_occurrence() function handles temp
variable management automatically and reduces errors.
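To make the benefit concrete, here is a hypothetical wrapper written with data.table only. It is a sketch of the general idea and not swereg's implementation of make_rowind_first_occurrence(); the function name `first_occurrence_sketch`, its arguments, and the temporary column name are all assumptions for illustration.

```r
library(data.table)

# Hypothetical helper (NOT swereg's code): adds `new_var` by reference and
# guarantees the temporary column is removed, even if a step fails.
first_occurrence_sketch <- function(dt, condition, value_var, new_var, id_var = "id") {
  tmp <- "tmp__first_occurrence"
  stopifnot(is.data.table(dt), !tmp %in% names(dt))
  on.exit(if (tmp %in% names(dt)) dt[, (tmp) := NULL], add = TRUE)

  # Evaluate the condition string with the table's columns in scope.
  hit <- eval(parse(text = condition)[[1]], envir = dt)

  # Keep the value only on rows where the condition is TRUE.
  vals <- dt[[value_var]]
  vals[is.na(hit) | !hit] <- NA
  dt[, (tmp) := vals]

  # Broadcast the first non-missing value to every row of each person.
  dt[, (new_var) := lapply(.SD, function(x) x[!is.na(x)][1]),
     by = id_var, .SDcols = tmp]
  invisible(dt)
}

# Toy usage: rows are assumed to be in time order within each person.
dt <- data.table(
  id   = rep(1:2, each = 3),
  age  = c(30, 31, 32, 40, 41, 42),
  diag = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
)
first_occurrence_sketch(dt, "diag == TRUE", "age", "rowind_age_first_diag")
```

Registering the cleanup with on.exit() is the design choice worth noting: the caller never sees the temporary column, whether the function returns normally or stops with an error.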
4. Validate your transformations
Always check that your rowind variables are actually row-independent:
# This should return 1 for all persons (meaning all rows have same value).
# uniqueN() counts NA as a value by default, so a person with NA on every row also returns 1.
skeleton[, .(unique_values = uniqueN(rowind_isoyear_first_f64)), by = .(id)]
Integration with swereg workflow
The rowdep/rowind concept fits into the standard swereg workflow:
- skeleton1_create: Focus on data integration, creates mostly rowdep variables
- skeleton2_clean: Heavy focus on rowdep → rowind transformations
- skeleton3_analyze: Work with clean rowind variables for analysis
The skeleton2_clean phase is where most rowdep → rowind
transformations occur, as you prepare stable person-level
characteristics for downstream analysis.
Conclusion
Understanding the distinction between rowdep and rowind variables is
fundamental to effective longitudinal registry data analysis. The
transformation patterns shown here, especially when combined with helper
functions like make_rowind_first_occurrence(), make it
easier to create robust, analysis-ready datasets from complex Swedish
registry data.
For more advanced workflows, see the other swereg vignettes on skeleton creation, data integration, and memory-efficient processing.