Codebooks are often used to provide a comprehensive guide to the variables and coding schemes in a data set, so that the data are interpreted consistently and accurately. They serve as a reference that helps researchers and analysts understand and analyze complex data structures.

To support this, the package includes a codebook function, make_codebook():

# Load the library
library(rcahelpr)
## {rcahelpr} loaded! Happy nerding.
# Create a test data set
test <- data.frame(
  "person" = c("chris", "maeve", "joseph", "brooks"),
  "org" = c("csg", "wdoc", "ccjbh", "asu"),
  "years_in_org" = c(1, 0.3, 0.2, NA),
  "role" = as.factor(c("mentor", "rca", "rca", "mentor")),
  "date" = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)

# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE)
| Variable Name | Data Class | Valid Values | Statistics | Unique Values | Missing Values |
|---|---|---|---|---|---|
| person | Character | Unique strings (n=4): chris, maeve, joseph, and more. | 4 unique strings, top three: brooks (n=1); chris (n=1); joseph (n=1) | 4 | 0 (0%) |
| org | Character | Unique strings (n=4): csg, wdoc, ccjbh, and more. | 4 unique strings, top three: asu (n=1); ccjbh (n=1); csg (n=1) | 4 | 0 (0%) |
| years_in_org | Numeric | Numeric range from 0.2 to 1. | Min: 0.2; Avg: 0.5; Median: 0.3; Max: 1; SD: 0.44 | 4 | 1 (25%) |
| role | Factor | Categorical variable with 2 levels: mentor, rca | 2 Unique factors: mentor, rca | 2 | 0 (0%) |
| date | Date | Date rage from 2020-01-01 to 2020-01-02. | Min: 2020-01-01; Mode: 2020-01-01; Max: 2020-01-02; Time difference: 1 days | 3 | 1 (25%) |
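
The numeric statistics reported for years_in_org can be reproduced by hand in base R, which is a handy sanity check when reading a codebook. This is a minimal sketch, assuming missing values are dropped before summarizing and results are rounded to two decimals; how make_codebook() computes these values internally is not shown in this article and may differ.

```r
# Same values as the years_in_org column in the test data set
years_in_org <- c(1, 0.3, 0.2, NA)

# Summary statistics with missing values dropped (na.rm = TRUE)
stats <- c(
  min    = min(years_in_org, na.rm = TRUE),
  avg    = mean(years_in_org, na.rm = TRUE),
  median = median(years_in_org, na.rm = TRUE),
  max    = max(years_in_org, na.rm = TRUE),
  sd     = sd(years_in_org, na.rm = TRUE)
)
round(stats, 2)

# Count of distinct values (NA counts as one value here) and missing values
n_unique    <- length(unique(years_in_org))
n_missing   <- sum(is.na(years_in_org))
pct_missing <- 100 * n_missing / length(years_in_org)
```

Counting NA as a distinct value is what makes the unique count 4 rather than 3, which is consistent with the Unique Values column for this variable.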

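To add information describing the variables in the codebook, you can left join an additional data set. Pass it as extra_vars and name its key column in extra_key; variables with no match in the additional data set (here, date) get NA in the added columns: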

# Set a secondary data.frame describing the variables in your original data set
more <- data.frame(
  "vars" = c("person", "org", "years_in_org", "role"),
  "description" = rep("Interesting details about my variable.", 4),
  "origin" = rep("Detailed notes on where the data came from.", 4),
  "notes" = rep("Yet more useful information", 4)
)

# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE,
              extra_vars = more, extra_key = "vars")
| Variable Name | Data Class | Valid Values | Statistics | Unique Values | Missing Values | Description | Origin | Notes |
|---|---|---|---|---|---|---|---|---|
| date | Date | Date rage from 2020-01-01 to 2020-01-02. | Min: 2020-01-01; Mode: 2020-01-01; Max: 2020-01-02; Time difference: 1 days | 3 | 1 (25%) | NA | NA | NA |
| org | Character | Unique strings (n=4): csg, wdoc, ccjbh, and more. | 4 unique strings, top three: asu (n=1); ccjbh (n=1); csg (n=1) | 4 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| person | Character | Unique strings (n=4): chris, maeve, joseph, and more. | 4 unique strings, top three: brooks (n=1); chris (n=1); joseph (n=1) | 4 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| role | Factor | Categorical variable with 2 levels: mentor, rca | 2 Unique factors: mentor, rca | 2 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| years_in_org | Numeric | Numeric range from 0.2 to 1. | Min: 0.2; Avg: 0.5; Median: 0.3; Max: 1; SD: 0.44 | 4 | 1 (25%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
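
The left-join behavior can be illustrated with base R alone. The sketch below uses merge() on a hypothetical two-column stand-in for the codebook (the codebook data frame and its column names are assumptions for illustration, not the package's internal structure), and whether make_codebook() itself calls merge() is likewise an assumption.

```r
# Hypothetical stand-in for the codebook: one row per variable
codebook <- data.frame(
  variable   = c("person", "org", "years_in_org", "role", "date"),
  data_class = c("Character", "Character", "Numeric", "Factor", "Date")
)

# Extra descriptions; "date" is deliberately absent, as in the example above
more <- data.frame(
  vars        = c("person", "org", "years_in_org", "role"),
  description = rep("Interesting details about my variable.", 4)
)

# Left join: all.x = TRUE keeps every codebook row, matched or not.
# Rows without a match in `more` get NA in the added columns.
joined <- merge(codebook, more,
                by.x = "variable", by.y = "vars", all.x = TRUE)
joined
```

By default merge() sorts its result on the key column, so the rows come back in alphabetical order by variable name, and the unmatched date row carries NA in the description column.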