
Convert List of Unequal Length to Dataframe


Lists are difficult to convert to a dataframe when they have unequal lengths.


Summary

Lists can be confusing as a data type, but they are also useful. They can hold data of different types and lengths, which makes them very versatile. Lists can be named or nested, and their elements can have the same or different lengths. This post deals with converting a list to a dataframe when its elements have unequal lengths.


Create a Named List

First, we’ll create a named, nested list whose elements have different lengths (that is, a list of named lists). The example comes from blogging, where authors assign a category and tags to each post. The categories and tags may each have one value or many.

# my list (my.l)
my.l <- list()
my.l[[1]] <- list(categories = "R", tags = "list")
my.l[[2]] <- list(categories = "R", tags = c("list", "dataframe"))
names(my.l) <- c("post_1", "post_2")
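Before converting, it can help to confirm where the lengths differ. The following is a minimal sketch that rebuilds the same list and checks each post; the helper names field_lengths and ragged are illustrative assumptions, not part of the original post.

```r
# Rebuild the example list from the post.
my.l <- list(
  post_1 = list(categories = "R", tags = "list"),
  post_2 = list(categories = "R", tags = c("list", "dataframe"))
)

# lengths() reports the length of every field within a post.
field_lengths <- lapply(my.l, lengths)
field_lengths

# Flag posts whose fields do not all share a single length.
ragged <- vapply(
  field_lengths,
  function(n) length(unique(n)) > 1,
  logical(1)
)
ragged
```

Any post flagged TRUE here is one that a plain row-bind is likely to reject.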

Generate error

Several methods for converting lists to dataframes can be found in this Stack Overflow question. However, when the most popular method is applied to the list above, it generates an error because the ‘tags’ variable has one value in post_1 and two values in post_2. The error reads, “invalid list argument: all variables should have the same length.” This is a common problem when scraping webpages, where the html_nodes function (from the rvest package) will sometimes capture multiple values from a page. Others have noted the problem when an API call returns a response with missing values.

# Top solution on Stackoverflow
do.call(rbind.data.frame, my.l)
Error in (function (..., deparse.level = 1, make.row.names = TRUE, stringsAsFactors = default.stringsAsFactors(), : invalid list argument: all variables should have the same length
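When this conversion runs inside a script, for example while looping over scraped pages, the error would stop execution. A small sketch of capturing it with base R's tryCatch is shown here; the variable name msg is an illustrative assumption.

```r
# Rebuild the example list from the post.
my.l <- list(
  post_1 = list(categories = "R", tags = "list"),
  post_2 = list(categories = "R", tags = c("list", "dataframe"))
)

# Attempt the row-bind; on failure, keep the error message instead of stopping.
msg <- tryCatch(
  {
    do.call(rbind.data.frame, my.l)
    NA_character_  # only reached if no error was raised
  },
  error = function(e) conditionMessage(e)
)
msg
```

If msg is not NA, the list is ragged and one of the approaches in the following sections is needed.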

Simple Solution

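Probably the fastest and most direct method is the rbindlist function from the data.table package. Note that the list names (post_1, post_2) are omitted from the result, and that the single category in post_2 is recycled to match its two tags.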

data.table::rbindlist(my.l, fill = TRUE)
   categories      tags
1:          R      list
2:          R      list
3:          R dataframe
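If the list names are needed, rbindlist has an idcol argument that stores them in a column. This is a sketch rather than the post's original code; the column name "names" and the object name dt_named are arbitrary choices.

```r
library(data.table)

# Rebuild the example list from the post.
my.l <- list(
  post_1 = list(categories = "R", tags = "list"),
  post_2 = list(categories = "R", tags = c("list", "dataframe"))
)

# idcol adds a column recording which list element each row came from;
# for a named list, the element names are used as the values.
dt_named <- rbindlist(my.l, fill = TRUE, idcol = "names")
dt_named
```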

Not-so-simple Solution

This is the not-so-simple solution. It introduced me to an apply function that was new to me, rapply, which recursively applies a function to every element of a list and therefore works on nested lists. Collapsing all of the values for a field into a single delimited string gives one row per post, which makes it easy to inspect the differences between posts. It also leaves me the flexibility to split that column later, either into rows or into columns.

# collapse each field's values into one pipe-delimited string
new.l <- rapply(my.l, function(x) paste(x, collapse = "|"), how = "replace")
# row-bind the collapsed lists (every field now has length 1)
dt <- data.table::rbindlist(new.l)
dt$names <- names(new.l)
dt
   categories           tags  names
1:          R           list post_1
2:          R list|dataframe post_2
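To see what rapply does in isolation, here is a small base R sketch on a made-up nested list (the object nested and its contents are hypothetical). It contrasts how = "replace", which keeps the list structure, with how = "unlist", which flattens the result into a named vector.

```r
# A hypothetical nested list with character vectors at different depths.
nested <- list(
  a = c("x", "y"),
  b = list(c = c("p", "q", "r"), d = "z")
)

collapse_pipe <- function(x) paste(x, collapse = "|")

# how = "replace": same nesting, each leaf replaced by the function's result.
collapsed <- rapply(nested, collapse_pipe, how = "replace")
str(collapsed)

# how = "unlist": results flattened into a single named character vector.
flat <- rapply(nested, collapse_pipe, how = "unlist")
flat
```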

Separate Rows

For the example above, separating by row makes more sense, as it allows the dataframe to be filtered by both category and tag.

library(magrittr)  # provides the %>% pipe
dt %>% tidyr::separate_rows(tags, sep = "\\|")
# A tibble: 3 x 3
  categories tags      names 
  <chr>      <chr>     <chr> 
1 R          list      post_1
2 R          list      post_2
3 R          dataframe post_2
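The same reshaping can be approximated in base R, which may clarify what the row separation does. The sketch below rebuilds the collapsed table by hand (the object names df, parts, and long are illustrative), repeats each row once per tag, and then filters by tag.

```r
# The collapsed table from the previous section, rebuilt as a plain data.frame.
df <- data.frame(
  categories = c("R", "R"),
  tags = c("list", "list|dataframe"),
  names = c("post_1", "post_2"),
  stringsAsFactors = FALSE
)

# Split each tags string on the literal pipe character.
parts <- strsplit(df$tags, "|", fixed = TRUE)

# Repeat each row once per tag, then fill in the individual tags.
long <- df[rep(seq_len(nrow(df)), lengths(parts)), ]
long$tags <- unlist(parts)
rownames(long) <- NULL
long

# With one tag per row, filtering by tag is a simple comparison.
long[long$tags == "dataframe", ]
```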

Separate Columns

You can also split the collapsed column into separate columns on the delimiter character. I’m not sure it makes a lot of sense for this example.

dt %>% tidyr::separate(tags, into = c("tag_1", "tag_2"), sep = "\\|")
   categories tag_1     tag_2  names
1:          R  list      <NA> post_1
2:          R  list dataframe post_2
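For readers who prefer to stay within data.table, its tstrsplit function can perform a similar column split. This is a sketch under the assumption that two tag columns are enough; the object name dt_wide is an arbitrary choice, and the table is rebuilt by hand so the block runs on its own.

```r
library(data.table)

# The collapsed table from the not-so-simple solution, rebuilt by hand.
dt_wide <- data.table(
  categories = c("R", "R"),
  tags = c("list", "list|dataframe"),
  names = c("post_1", "post_2")
)

# tstrsplit() splits each string and transposes the pieces into columns,
# padding shorter rows with NA.
dt_wide[, c("tag_1", "tag_2") := tstrsplit(tags, "|", fixed = TRUE)]
dt_wide[, tags := NULL]  # drop the original collapsed column
dt_wide
```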

Other Packages

Two other packages offer similar functionality. The first, purrr, has the functions map_dfr and map_dfc, which return data frames created by row-binding and column-binding, respectively. [1] The second, rlist, has the functions list.rbind and list.cbind for the same task. [2]
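As a rough illustration of the purrr route, the sketch below converts each post to a data frame and row-binds the results with map_dfr. It assumes purrr and dplyr are installed (map_dfr relies on dplyr for the binding), and recent purrr releases mark map_dfr as superseded, so treat this as one possible approach rather than the recommended one. The object name out is an arbitrary choice.

```r
# Rebuild the example list from the post.
my.l <- list(
  post_1 = list(categories = "R", tags = "list"),
  post_2 = list(categories = "R", tags = c("list", "dataframe"))
)

# as.data.frame() recycles the length-1 field within each post;
# map_dfr() then row-binds the pieces and stores the list names in .id.
out <- purrr::map_dfr(
  my.l,
  as.data.frame,
  stringsAsFactors = FALSE,
  .id = "names"
)
out
```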

References

[1] L. Henry and H. Wickham, purrr: Functional programming tools. 2020 [Online]. Available: https://CRAN.R-project.org/package=purrr
[2] K. Ren, rlist: A toolbox for non-tabular data manipulation. 2016 [Online]. Available: https://CRAN.R-project.org/package=rlist
[3] R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2020 [Online]. Available: https://www.R-project.org/
[4] Y. Xie, C. Dervieux, and A. Presmanes Hill, blogdown: Create blogs and websites with R Markdown. 2021 [Online]. Available: https://CRAN.R-project.org/package=blogdown
[5] H. Wickham, tidyverse: Easily install and load the tidyverse. 2019 [Online]. Available: https://CRAN.R-project.org/package=tidyverse

Disclaimer

The views, analysis, and conclusions presented in this paper are the author’s alone and not those of any other person, organization, or government entity. While I have made every reasonable effort to ensure that the information in this article is correct, it will nonetheless contain errors, inaccuracies, and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author received no financial support for the research, authorship, or publication of this article.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       macOS Catalina 10.15.7      
 system   x86_64, darwin15.6.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Chicago             
 date     2021-04-04                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source        
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
 blogdown    * 1.2     2021-03-04 [1] CRAN (R 3.6.3)
 bookdown      0.21    2020-10-13 [1] CRAN (R 3.6.3)
 bslib         0.2.4   2021-01-25 [1] CRAN (R 3.6.2)
 cachem        1.0.4   2021-02-13 [1] CRAN (R 3.6.2)
 callr         3.5.1   2020-10-13 [1] CRAN (R 3.6.2)
 cli           2.3.1   2021-02-23 [1] CRAN (R 3.6.3)
 codetools     0.2-18  2020-11-04 [1] CRAN (R 3.6.2)
 colorspace    2.0-0   2020-11-11 [1] CRAN (R 3.6.2)
 crayon        1.4.1   2021-02-08 [1] CRAN (R 3.6.2)
 DBI           1.1.1   2021-01-15 [1] CRAN (R 3.6.2)
 desc          1.3.0   2021-03-05 [1] CRAN (R 3.6.3)
 devtools    * 2.3.2   2020-09-18 [1] CRAN (R 3.6.2)
 digest        0.6.27  2020-10-24 [1] CRAN (R 3.6.2)
 dplyr         1.0.5   2021-03-05 [1] CRAN (R 3.6.3)
 ellipsis      0.3.1   2020-05-15 [1] CRAN (R 3.6.2)
 evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)
 fansi         0.4.2   2021-01-15 [1] CRAN (R 3.6.2)
 farver        2.1.0   2021-02-28 [1] CRAN (R 3.6.3)
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 3.6.2)
 fs            1.5.0   2020-07-31 [1] CRAN (R 3.6.2)
 generics      0.1.0   2020-10-31 [1] CRAN (R 3.6.2)
 ggplot2     * 3.3.3   2020-12-30 [1] CRAN (R 3.6.2)
 ggthemes    * 4.2.4   2021-01-20 [1] CRAN (R 3.6.2)
 glue          1.4.2   2020-08-27 [1] CRAN (R 3.6.2)
 gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.0)
 highr         0.8     2019-03-20 [1] CRAN (R 3.6.0)
 htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 3.6.2)
 jquerylib     0.1.3   2020-12-17 [1] CRAN (R 3.6.2)
 jsonlite      1.7.2   2020-12-09 [1] CRAN (R 3.6.2)
 knitr         1.31    2021-01-27 [1] CRAN (R 3.6.2)
 labeling      0.4.2   2020-10-20 [1] CRAN (R 3.6.2)
 lifecycle     1.0.0   2021-02-15 [1] CRAN (R 3.6.2)
 magrittr      2.0.1   2020-11-17 [1] CRAN (R 3.6.2)
 memoise       2.0.0   2021-01-26 [1] CRAN (R 3.6.2)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.0)
 pillar        1.5.1   2021-03-05 [1] CRAN (R 3.6.3)
 pkgbuild      1.2.0   2020-12-15 [1] CRAN (R 3.6.2)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.0)
 pkgload       1.2.0   2021-02-23 [1] CRAN (R 3.6.3)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.0)
 processx      3.4.5   2020-11-30 [1] CRAN (R 3.6.2)
 ps            1.6.0   2021-02-28 [1] CRAN (R 3.6.3)
 purrr         0.3.4   2020-04-17 [1] CRAN (R 3.6.2)
 R6            2.5.0   2020-10-28 [1] CRAN (R 3.6.2)
 remotes       2.2.0   2020-07-21 [1] CRAN (R 3.6.2)
 rlang         0.4.10  2020-12-30 [1] CRAN (R 3.6.2)
 rmarkdown     2.7     2021-02-19 [1] CRAN (R 3.6.3)
 rprojroot     2.0.2   2020-11-15 [1] CRAN (R 3.6.2)
 sass          0.3.1   2021-01-24 [1] CRAN (R 3.6.2)
 scales        1.1.1   2020-05-11 [1] CRAN (R 3.6.2)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
 stringi       1.5.3   2020-09-09 [1] CRAN (R 3.6.2)
 stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
 testthat      3.0.2   2021-02-14 [1] CRAN (R 3.6.2)
 tibble        3.1.0   2021-02-25 [1] CRAN (R 3.6.3)
 tidyselect    1.1.0   2020-05-11 [1] CRAN (R 3.6.2)
 usethis     * 2.0.1   2021-02-10 [1] CRAN (R 3.6.2)
 utf8          1.1.4   2018-05-24 [1] CRAN (R 3.6.0)
 vctrs         0.3.6   2020-12-17 [1] CRAN (R 3.6.2)
 withr         2.4.1   2021-01-26 [1] CRAN (R 3.6.2)
 xfun          0.21    2021-02-10 [1] CRAN (R 3.6.2)
 yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.0)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library