Codebooks are often used to provide a comprehensive guide to the variables and coding schemes in a data set, ensuring consistent and accurate interpretation of the data. They serve as a reference that helps researchers and analysts understand and analyze complex data structures.
To support this, the package includes a codebook function:
# Load the library
library(rcahelpr)
## {rcahelpr} loaded! Happy nerding.
# Create a test data set
test <- data.frame(
  "person" = c("chris", "maeve", "joseph", "brooks"),
  "org" = c("csg", "wdoc", "ccjbh", "asu"),
  "years_in_org" = c(1, 0.3, 0.2, NA),
  "role" = as.factor(c("mentor", "rca", "rca", "mentor")),
  "date" = as.Date(c("2020-01-01", "2020-01-01", NA, "2020-01-02"))
)
# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE)

| Variable Name | Data Class | Valid Values | Statistics | Unique Values | Missing Values |
|---|---|---|---|---|---|
| person | Character | Unique strings (n=4): chris, maeve, joseph, and more. | 4 unique strings, top three: brooks (n=1) chris (n=1) joseph (n=1) | 4 | 0 (0%) |
| org | Character | Unique strings (n=4): csg, wdoc, ccjbh, and more. | 4 unique strings, top three: asu (n=1) ccjbh (n=1) csg (n=1) | 4 | 0 (0%) |
| years_in_org | Numeric | Numeric range from 0.2 to 1. | Min: 0.2 Avg: 0.5 Median: 0.3 Max: 1 SD: 0.44 | 4 | 1 (25%) |
| role | Factor | Categorical variable with 2 levels: mentor, rca | 2 Unique factors: mentor, rca | 2 | 0 (0%) |
| date | Date | Date rage from 2020-01-01 to 2020-01-02. | Min: 2020-01-01 Mode: 2020-01-01 Max: 2020-01-02 Time difference: 1 days | 3 | 1 (25%) |
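
To make the numeric summary easier to check, the figures in the `years_in_org` row can be reproduced with base R alone. This is a minimal sketch for illustration, not the internals of `make_codebook()`; the variable names `stats`, `n_unique`, `n_missing`, and `pct_missing` are assumptions introduced here.

```r
# Illustrative only: base R, not the rcahelpr implementation.
years_in_org <- c(1, 0.3, 0.2, NA)

# Summary statistics, ignoring the missing value
stats <- c(
  min    = min(years_in_org, na.rm = TRUE),
  avg    = mean(years_in_org, na.rm = TRUE),
  median = median(years_in_org, na.rm = TRUE),
  max    = max(years_in_org, na.rm = TRUE),
  sd     = round(sd(years_in_org, na.rm = TRUE), 2)
)

# unique() keeps NA as its own value, so it is included in this count
n_unique <- length(unique(years_in_org))

# Count and percentage of missing values
n_missing   <- sum(is.na(years_in_org))
pct_missing <- 100 * mean(is.na(years_in_org))

print(stats)
```

Note that the unique-value count of 4 only matches the table if `NA` is counted as a value, which is also consistent with the `date` row reporting 3 unique values.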
To add information describing the variables in the codebook, left join an additional data set by passing it as `extra_vars` and naming its key column in `extra_key`:
# Set a secondary data.frame describing the variables in your original data set
more <- data.frame(
  "vars" = c("person", "org", "years_in_org", "role"),
  "description" = rep("Interesting details about my variable.", 4),
  "origin" = rep("Detailed notes on where the data came from.", 4),
  "notes" = rep("Yet more useful information", 4)
)
# Create codebook
make_codebook(input_df = test, return_df = FALSE, escape = FALSE,
              extra_vars = more, extra_key = "vars")

| Variable Name | Data Class | Valid Values | Statistics | Unique Values | Missing Values | Description | Origin | Notes |
|---|---|---|---|---|---|---|---|---|
| date | Date | Date rage from 2020-01-01 to 2020-01-02. | Min: 2020-01-01 Mode: 2020-01-01 Max: 2020-01-02 Time difference: 1 days | 3 | 1 (25%) | NA | NA | NA |
| org | Character | Unique strings (n=4): csg, wdoc, ccjbh, and more. | 4 unique strings, top three: asu (n=1) ccjbh (n=1) csg (n=1) | 4 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| person | Character | Unique strings (n=4): chris, maeve, joseph, and more. | 4 unique strings, top three: brooks (n=1) chris (n=1) joseph (n=1) | 4 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| role | Factor | Categorical variable with 2 levels: mentor, rca | 2 Unique factors: mentor, rca | 2 | 0 (0%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
| years_in_org | Numeric | Numeric range from 0.2 to 1. | Min: 0.2 Avg: 0.5 Median: 0.3 Max: 1 SD: 0.44 | 4 | 1 (25%) | Interesting details about my variable. | Detailed notes on where the data came from. | Yet more useful information |
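
The `NA` entries in the `date` row follow from the left join: `date` has no matching row in `more`, so its extra columns are left empty. The base R sketch below shows the same behaviour with `merge()`. It is only an illustration under the assumption that the join behaves like a standard left join; the `codebook` data frame and its column names are hypothetical stand-ins, not the structure `make_codebook()` uses internally.

```r
# Illustrative only: a left join in base R, not the rcahelpr implementation.

# Hypothetical stand-in for the codebook, one row per variable
codebook <- data.frame(
  variable = c("person", "org", "years_in_org", "role", "date"),
  class    = c("Character", "Character", "Numeric", "Factor", "Date")
)

# Extra descriptions; note that "date" is deliberately absent
more <- data.frame(
  vars        = c("person", "org", "years_in_org", "role"),
  description = rep("Interesting details about my variable.", 4)
)

# all.x = TRUE keeps every codebook row, filling unmatched rows with NA
joined <- merge(codebook, more, by.x = "variable", by.y = "vars", all.x = TRUE)

print(joined)
```

By default `merge()` also sorts the result by the key column, which resembles the alphabetical row order in the table above. To avoid `NA` cells in the codebook, include one row in the extra data set for every variable in the original data.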