Great Hack for Documenting R Package Data


Summary

When creating a package, documenting your data is a crucial step. While important, it can also be time-consuming: a dataset with many columns requires a description of every variable. This post gives a quick method to document data and pass R CMD check.

Overview

This post walks through a quick way to document every variable in a package dataset and pass R CMD check. The method relies on paste0(), cat(), and an editor’s multiple cursor feature to speed up documentation. It reduces the mistakes that come from manual entry and ensures that every variable is included.

Roxygen for Package Documentation

Chapter 8 in the R-Packages book deals with data. This example pertains to the situation where an author saves the data to the data/ folder, meaning that it is “effectively exported”. Exported data must be documented.

# usethis::use_data() saves the object to data/my_pkg_data.rda
my_pkg_data <- sample(1000)
usethis::use_data(my_pkg_data)

Here’s some sample roxygen code taken from R-Packages. Two roxygen tags are important to note. First, the @format tag describes the dataset. From R-Packages, “you should include a definition list that describes each variable. It’s usually a good idea to describe the variables’ units . . . .” Second, the @source tag reminds you where the data originated.

#' World Health Organization TB data
#'
#' A subset of data from the World Health Organization Global Tuberculosis
#' Report ...
#'
#' @format ## `who`
#' A data frame with 7,240 rows and 60 columns:
#' \describe{
#'   \item{country}{Country name}
#'   \item{iso2, iso3}{2 & 3 letter ISO country codes}
#'   \item{year}{Year}
#'   ...
#' }
#' @source <https://www.who.int/teams/global-tuberculosis-programme/data>
"my_pkg_data"
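
The sentence under @format that states the dimensions can be generated from the data instead of typed by hand. Here is a minimal sketch; format_line() is a hypothetical helper name used for illustration and is not part of roxygen2 or usethis.

```r
# Hypothetical helper: build the "A data frame with N rows and M columns:" line
format_line <- function(data) {
    stopifnot(is.data.frame(data))
    sprintf(
        "#' A data frame with %s rows and %s columns:",
        format(nrow(data), big.mark = ","),
        format(ncol(data), big.mark = ",")
    )
}

# mtcars ships with R in the datasets package
writeLines(format_line(mtcars))
#' A data frame with 32 rows and 11 columns:
```

The big.mark argument adds the thousands separator, so a dataset with 7240 rows is reported as 7,240.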

Create Sample Table

To start, let’s create a sample table with 26 variables. (I chose 26 for the number of letters in the alphabet.) Each column of the data frame or tibble must be described, so each one gets its own \item{} line. For a data frame with a lot of columns, writing these by hand is time-consuming. Here’s one way to get started.

library(tibble)
# one row, 26 columns (named V1 to V26 until they are renamed below)
df <- as_tibble(as.data.frame(rbind(1:26)))
# generate a random lowercase name that is x characters long
variable_names <- function(x) {
    paste0(sample(letters, x, replace = TRUE), collapse = "")
}
# assign 26 names with lengths between 5 and 20 characters
names(df) <- sample(5:20, 26, replace = TRUE) |>
    purrr::map_chr(variable_names)
names(df)
 [1] "zmrcdtyqpupbpzexf"   "tomeqbnwwwbbxkrh"    "dxnlulqrannhklhcg"  
 [4] "mzpjnvzihayykrdeov"  "umbdfndihlrgngsxj"   "ribzrzlshevtorcchni"
 [7] "qjpha"               "efpgomezmlrdv"       "hvicguldtcegmopbsme"
[10] "ktxieovedtfbxypssqv" "lbahehftmxfdhpcb"    "potmymsftvhzxyp"    
[13] "qrwfu"               "tegbrsjbvoqiprtc"    "adqgcascafcopar"    
[16] "wbotzawrradsniozhoo" "xupzvlqtutkxfuka"    "xhangidttsgzpbdlntv"
[19] "gaqnpmlxp"           "llybzoymeidhgehg"    "knrknkbxgdl"        
[22] "llcwhgcymcveptsytu"  "wascpqsewkkrfxhczj"  "bidmngrrsjibdgfu"   
[25] "wioubalfvexykmj"     "fvbdgozzybrojqly"   

Generate Entries

Each variable requires a description. Send the code below to the console, then copy and paste the output into your documentation. The descriptions are only placeholders (each variable name is repeated as its own description), but you won’t miss any variables or spend a lot of time comparing your data to your documentation.

paste0("#'   \\item{", names(df), "}{", names(df), "}") |>
    cat(sep = "\n")
#'   \item{zmrcdtyqpupbpzexf}{zmrcdtyqpupbpzexf}
#'   \item{tomeqbnwwwbbxkrh}{tomeqbnwwwbbxkrh}
#'   \item{dxnlulqrannhklhcg}{dxnlulqrannhklhcg}
#'   \item{mzpjnvzihayykrdeov}{mzpjnvzihayykrdeov}
#'   \item{umbdfndihlrgngsxj}{umbdfndihlrgngsxj}
#'   \item{ribzrzlshevtorcchni}{ribzrzlshevtorcchni}
#'   \item{qjpha}{qjpha}
#'   \item{efpgomezmlrdv}{efpgomezmlrdv}
#'   \item{hvicguldtcegmopbsme}{hvicguldtcegmopbsme}
#'   \item{ktxieovedtfbxypssqv}{ktxieovedtfbxypssqv}
#'   \item{lbahehftmxfdhpcb}{lbahehftmxfdhpcb}
#'   \item{potmymsftvhzxyp}{potmymsftvhzxyp}
#'   \item{qrwfu}{qrwfu}
#'   \item{tegbrsjbvoqiprtc}{tegbrsjbvoqiprtc}
#'   \item{adqgcascafcopar}{adqgcascafcopar}
#'   \item{wbotzawrradsniozhoo}{wbotzawrradsniozhoo}
#'   \item{xupzvlqtutkxfuka}{xupzvlqtutkxfuka}
#'   \item{xhangidttsgzpbdlntv}{xhangidttsgzpbdlntv}
#'   \item{gaqnpmlxp}{gaqnpmlxp}
#'   \item{llybzoymeidhgehg}{llybzoymeidhgehg}
#'   \item{knrknkbxgdl}{knrknkbxgdl}
#'   \item{llcwhgcymcveptsytu}{llcwhgcymcveptsytu}
#'   \item{wascpqsewkkrfxhczj}{wascpqsewkkrfxhczj}
#'   \item{bidmngrrsjibdgfu}{bidmngrrsjibdgfu}
#'   \item{wioubalfvexykmj}{wioubalfvexykmj}
#'   \item{fvbdgozzybrojqly}{fvbdgozzybrojqly}
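
The placeholders above repeat each variable name. One variation, sketched here as an assumption rather than as part of the original method, pre-fills each description with the column’s class so there is something to edit from. item_lines() is a hypothetical helper name.

```r
# Hypothetical helper: one \item{} line per column, using the first class
# of each column as a starting-point description
item_lines <- function(data) {
    stopifnot(is.data.frame(data))
    classes <- vapply(data, function(col) class(col)[1], character(1))
    paste0("#'   \\item{", names(data), "}{", classes, "}")
}

# iris ships with R in the datasets package
item_lines(iris) |>
    cat(sep = "\n")
```

The class is only a hint. Each entry still needs a real description, ideally with units.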

Multiple Cursors

Once you’ve copied and pasted the console output into your documentation, you will probably have more editing to do, such as rewriting the descriptions. One handy feature is multiple cursors, which let you type on several lines at once. On a Mac, the shortcut is Option + mouse drag.
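
After hand editing, it can be worth confirming that no \item{} line was dropped or mistyped. The sketch below compares the documented names against the column names. check_items() is a hypothetical helper, and the sample doc lines are made up for illustration.

```r
# Hypothetical helper: list columns with no \item{} entry and
# \item{} entries that match no column
check_items <- function(doc_lines, data) {
    # keep only the lines that contain an \item{ entry
    item_rows <- grep("\\\\item\\{", doc_lines, value = TRUE)
    # pull out the text between the first pair of braces
    documented <- sub(".*\\\\item\\{([^}]*)\\}.*", "\\1", item_rows)
    # an entry such as "iso2, iso3" documents several columns at once
    documented <- trimws(unlist(strsplit(documented, ",")))
    list(
        missing = setdiff(names(data), documented),
        extra = setdiff(documented, names(data))
    )
}

doc <- c(
    "#' \\describe{",
    "#'   \\item{mpg}{Miles per gallon}",
    "#'   \\item{cyl, disp}{Cylinders and displacement}",
    "#'   \\item{weight}{Weight}",
    "#' }"
)
result <- check_items(doc, mtcars[, c("mpg", "cyl", "disp", "wt")])
result$missing  # "wt"
result$extra    # "weight"
```

In a real package, doc_lines could come from readLines() on the file that holds the data documentation, assuming the roxygen block lives in a file such as R/data.R.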

Conclusion

Documenting your data is important. Many experts recommend the package form for programming because it makes you adhere to community norms and conform to good coding practices. When you are building a package, you may want to quickly document some data. This post gives you one strategy to program, cut, and paste your way to documenting your data and passing R CMD check.

Acknowledgements

This blog post was made possible thanks to:

References

[1] R Core Team, R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2022 [Online]. Available: https://www.R-project.org/
[2] Y. Xie, C. Dervieux, and A. Presmanes Hill, blogdown: Create Blogs and Websites with R Markdown. 2022 [Online]. Available: https://CRAN.R-project.org/package=blogdown
[3] H. Wickham, J. Hester, W. Chang, and J. Bryan, devtools: Tools to Make Developing R Packages Easier. 2022 [Online]. Available: https://CRAN.R-project.org/package=devtools
[4] H. Wickham, J. Bryan, and M. Barrett, usethis: Automate Package and Project Setup. 2022 [Online]. Available: https://CRAN.R-project.org/package=usethis

Disclaimer

The views, analysis and conclusions presented within this paper are the author’s alone and not those of any other person, organization or government entity. While I have made every reasonable effort to ensure that the information in this article was correct, it will nonetheless contain errors, inaccuracies and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.3 (2022-03-10)
 os       macOS Big Sur/Monterey 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2022-10-21
 pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 blogdown    * 1.13       2022-09-24 [1] CRAN (R 4.1.2)
 bookdown      0.29       2022-09-12 [1] CRAN (R 4.1.3)
 bslib         0.4.0.9000 2022-08-26 [1] Github (rstudio/bslib@fa2e03c)
 cachem        1.0.6      2021-08-19 [1] CRAN (R 4.1.0)
 callr         3.7.2      2022-08-22 [1] CRAN (R 4.1.2)
 cli           3.4.1      2022-09-23 [1] CRAN (R 4.1.2)
 codetools     0.2-18     2020-11-04 [1] CRAN (R 4.1.3)
 colorspace    2.0-3      2022-02-21 [1] CRAN (R 4.1.2)
 crayon        1.5.2      2022-09-29 [1] CRAN (R 4.1.3)
 DBI           1.1.3      2022-06-18 [1] CRAN (R 4.1.2)
 devtools    * 2.4.4      2022-07-20 [1] CRAN (R 4.1.2)
 digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.0)
 dplyr         1.0.10     2022-09-01 [1] CRAN (R 4.1.2)
 ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 evaluate      0.16       2022-08-09 [1] CRAN (R 4.1.2)
 fansi         1.0.3      2022-03-24 [1] CRAN (R 4.1.2)
 fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 formatR       1.12       2022-03-31 [1] CRAN (R 4.1.2)
 fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.0)
 generics      0.1.3      2022-07-05 [1] CRAN (R 4.1.2)
 ggplot2     * 3.3.6      2022-05-03 [1] CRAN (R 4.1.2)
 ggthemes    * 4.2.4      2021-01-20 [1] CRAN (R 4.1.0)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
 gtable        0.3.1      2022-09-01 [1] CRAN (R 4.1.2)
 htmltools     0.5.3      2022-07-18 [1] CRAN (R 4.1.2)
 htmlwidgets   1.5.4      2021-09-08 [1] CRAN (R 4.1.0)
 httpuv        1.6.6      2022-09-08 [1] CRAN (R 4.1.2)
 jquerylib     0.1.4      2021-04-26 [1] CRAN (R 4.1.0)
 jsonlite      1.8.0      2022-02-22 [1] CRAN (R 4.1.2)
 knitr         1.40       2022-08-24 [1] CRAN (R 4.1.3)
 later         1.3.0      2021-08-18 [1] CRAN (R 4.1.0)
 lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.1.2)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.1.2)
 memoise       2.0.1      2021-11-26 [1] CRAN (R 4.1.0)
 mime          0.12       2021-09-28 [1] CRAN (R 4.1.0)
 miniUI        0.1.1.1    2018-05-18 [1] CRAN (R 4.1.0)
 munsell       0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
 pillar        1.8.1      2022-08-19 [1] CRAN (R 4.1.2)
 pkgbuild      1.3.1      2021-12-20 [1] CRAN (R 4.1.0)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload       1.3.0      2022-06-27 [1] CRAN (R 4.1.2)
 prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx      3.7.0      2022-07-07 [1] CRAN (R 4.1.2)
 profvis       0.3.7      2020-11-02 [1] CRAN (R 4.1.0)
 promises      1.2.0.1    2021-02-11 [1] CRAN (R 4.1.0)
 ps            1.7.1      2022-06-18 [1] CRAN (R 4.1.2)
 purrr         0.3.5      2022-10-06 [1] CRAN (R 4.1.2)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
 Rcpp          1.0.9      2022-07-08 [1] CRAN (R 4.1.2)
 remotes       2.4.2      2021-11-30 [1] CRAN (R 4.1.0)
 rlang         1.0.6      2022-09-24 [1] CRAN (R 4.1.2)
 rmarkdown     2.16       2022-08-24 [1] CRAN (R 4.1.2)
 rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.1.2)
 sass          0.4.2      2022-07-16 [1] CRAN (R 4.1.2)
 scales        1.2.1      2022-08-20 [1] CRAN (R 4.1.2)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.1.0)
 shiny         1.7.2      2022-07-19 [1] CRAN (R 4.1.2)
 stringi       1.7.8      2022-07-11 [1] CRAN (R 4.1.2)
 stringr       1.4.1      2022-08-20 [1] CRAN (R 4.1.2)
 tibble        3.1.8      2022-07-22 [1] CRAN (R 4.1.2)
 tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.1.2)
 urlchecker    1.0.1      2021-11-30 [1] CRAN (R 4.1.0)
 usethis     * 2.1.6      2022-05-25 [1] CRAN (R 4.1.2)
 utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
 vctrs         0.4.2      2022-09-29 [1] CRAN (R 4.1.3)
 withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.0)
 xfun          0.33       2022-09-12 [1] CRAN (R 4.1.2)
 xtable        1.8-4      2019-04-21 [1] CRAN (R 4.1.0)
 yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)

 [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────