📚 Codebooks • rcahelpr

Codebooks are often used to provide a comprehensive guide to the variables and coding schemes in a data set, ensuring consistent and accurate interpretation of the data. They serve as a reference that helps researchers and analysts understand and analyze complex data structures.

To support this, the package includes a codebook function, make_codebook():

# Load the library
library(rcahelpr)

## {rcahelpr} loaded! Happy nerding.

# Create a test data set
test <- data.frame(
  "person" = c("chris", "maeve", "joseph", "brooks"),
  "org" = c("csg", "wdoc", "ccjbh", "asu"),
  "years_in_org" = c(1, 0.3, 0.2, NA),
  "role" = as.factor(c("mentor", "rca", "rca", "mentor")),
  "date" = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)

# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE)

Variable Name	Data Class	Valid Values	Statistics	Unique Values	Missing Values
person	Character	Unique strings (n=4): chris, maeve, joseph, and more.	4 unique strings, top three: brooks (n=1) chris (n=1) joseph (n=1)	4	0 (0%)
org	Character	Unique strings (n=4): csg, wdoc, ccjbh, and more.	4 unique strings, top three: asu (n=1) ccjbh (n=1) csg (n=1)	4	0 (0%)
years_in_org	Numeric	Numeric range from 0.2 to 1.	Min: 0.2 Avg: 0.5 Median: 0.3 Max: 1 SD: 0.44	4	1 (25%)
role	Factor	Categorical variable with 2 levels: mentor, rca	2 Unique factors: mentor, rca	2	0 (0%)
date	Date	Date rage from 2020-01-01 to 2020-01-02.	Min: 2020-01-01 Mode: 2020-01-01 Max: 2020-01-02 Time difference: 1 days	3	1 (25%)
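
The figures in the years_in_org row can be reproduced by hand in base R, which is handy when checking a codebook entry. This is a minimal sketch of the arithmetic only, not the internals of make_codebook(); the object names are made up for illustration, and counting NA as its own unique value is an assumption inferred from the table above.

```r
# Same values as the years_in_org column in the test data set
years_in_org <- c(1, 0.3, 0.2, NA)

# Summary statistics, ignoring the missing value
stats <- c(
  min    = min(years_in_org, na.rm = TRUE),
  avg    = mean(years_in_org, na.rm = TRUE),
  median = median(years_in_org, na.rm = TRUE),
  max    = max(years_in_org, na.rm = TRUE),
  sd     = sd(years_in_org, na.rm = TRUE)
)
round(stats, 2)  # lines up with the Statistics column: 0.2, 0.5, 0.3, 1, 0.44

# Unique values (assumption: NA is counted as its own value)
n_unique <- length(unique(years_in_org))

# Missing values as a count and as a share of all rows
n_missing <- sum(is.na(years_in_org))
pct_missing <- 100 * n_missing / length(years_in_org)
```

Here n_unique is 4 and the missing share is 1 of 4 rows (25%), which agrees with the Unique Values and Missing Values columns.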

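To add information describing the variables in the codebook, left join an additional data set through the extra_vars and extra_key arguments. Variables with no matching row in that data set (date, in the example below) get NA in the added columns: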

# Set a secondary data.frame describing the variables in your original data set
more <- data.frame(
  "vars" = c("person", "org", "years_in_org", "role"),
  "description" = rep("Interesting details about my variable.", 4),
  "origin" = rep("Detailed notes on where the data came from.", 4),
  "notes" = rep("Yet more useful information", 4)
)

# Create codebook with the additional columns joined on "vars"
make_codebook(input_df = test, return_df = FALSE, escape = FALSE,
              extra_vars = more, extra_key = "vars")

Variable Name	Data Class	Valid Values	Statistics	Unique Values	Missing Values	Description	Origin	Notes
date	Date	Date rage from 2020-01-01 to 2020-01-02.	Min: 2020-01-01 Mode: 2020-01-01 Max: 2020-01-02 Time difference: 1 days	3	1 (25%)	NA	NA	NA
org	Character	Unique strings (n=4): csg, wdoc, ccjbh, and more.	4 unique strings, top three: asu (n=1) ccjbh (n=1) csg (n=1)	4	0 (0%)	Interesting details about my variable.	Detailed notes on where the data came from.	Yet more useful information
person	Character	Unique strings (n=4): chris, maeve, joseph, and more.	4 unique strings, top three: brooks (n=1) chris (n=1) joseph (n=1)	4	0 (0%)	Interesting details about my variable.	Detailed notes on where the data came from.	Yet more useful information
role	Factor	Categorical variable with 2 levels: mentor, rca	2 Unique factors: mentor, rca	2	0 (0%)	Interesting details about my variable.	Detailed notes on where the data came from.	Yet more useful information
years_in_org	Numeric	Numeric range from 0.2 to 1.	Min: 0.2 Avg: 0.5 Median: 0.3 Max: 1 SD: 0.44	4	1 (25%)	Interesting details about my variable.	Detailed notes on where the data came from.	Yet more useful information
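
The NA entries in the date row follow from left-join semantics: every variable in the codebook is kept, and variables without a match in the additional data set get missing values. Below is a minimal base R sketch of that behavior using merge(). It is not necessarily how make_codebook() performs the join, and codebook_stub and joined are hypothetical names used only for illustration.

```r
# Stand-in for the codebook: one row per variable in the test data set
codebook_stub <- data.frame(
  variable = c("person", "org", "years_in_org", "role", "date")
)

# Extra descriptions, with no entry for "date"
more <- data.frame(
  vars = c("person", "org", "years_in_org", "role"),
  description = rep("Interesting details about my variable.", 4)
)

# all.x = TRUE keeps every row of codebook_stub (a left join);
# merge() also sorts the result by the key column by default
joined <- merge(codebook_stub, more,
                by.x = "variable", by.y = "vars", all.x = TRUE)

# Variables that have no description in `more`
missing_desc <- as.character(joined$variable[is.na(joined$description)])
missing_desc
```

Running a check like this before building the codebook shows which variables still need a row in the additional data set.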