Summary
I’ve been piping code with dplyr for several years now. Because I use it so often, I thought it was time for a refresher and some renewed investigation. Sometimes revisiting a topic helps break me out of a programming funk. Sure enough, I discovered some new methods.
Introduction
The magrittr package allows for the piping of R code. According to vignette("magrittr"), the name is to be given a strong French pronunciation. The package was named tongue-in-cheek after the artist René Magritte. Just as Magritte’s painting of a pipe was not an actual pipe, neither is magrittr’s operator %>% an actual pipe. Rather, it is a convenient way to write code from left to right without nesting functions or creating temporary variables. Nested function calls are hard to read, and temporary variables clutter up the global environment. Using magrittr, f(x) is equivalent to both x %>% f() and x %>% f(.), where the dot “.” is the placeholder for “x”.
#load library
library(magrittr)
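The equivalence described above can be checked directly. The following is a minimal sketch; the vector x and the choice of sum(), sqrt(), and round() are illustrative assumptions rather than anything taken from the vignette.

```r
library(magrittr)

# Illustrative input (an assumption for this sketch)
x <- c(1, 4, 9)

# Nested calls read from the inside out
nested <- round(sqrt(sum(x)), 2)

# The piped version reads from left to right
piped <- x %>% sum() %>% sqrt() %>% round(2)

# Both placeholder forms match a direct call
same_chain    <- identical(nested, piped)
same_implicit <- identical(sqrt(x), x %>% sqrt())
same_explicit <- identical(sqrt(x), x %>% sqrt(.))
```

All three comparisons should come back TRUE, since the pipe only changes how the call is written, not what is computed.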
Assignment
Some disfavor the use of magrittr for assignment, preferring the traditional <- operator.
# More common way to assign
x <- 10 %>% divide_by(2)
print(x)
[1] 5
# Less common
env <- environment()
"x" %>% assign(5, envir = env) %>% print
[1] 5
When to use
Use pipes when you have (1) fewer than about 10 steps, (2) a single input and a single output, and (3) simple, linear dependencies between steps.
Argument Placeholder
From the magrittr tidyverse page cited below, here are two examples of using the “.” as an argument placeholder:
x %>% f(y, .) is equivalent to f(y, x)
x %>% f(y, z = .) is equivalent to f(y, z = x)
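As a small sketch of the two placeholder rules above, the magrittr aliases subtract() and base R's rep() can stand in for f. The specific numbers are assumptions chosen only for illustration.

```r
library(magrittr)

# Without a placeholder, the left-hand side becomes the first argument
first_arg <- 4 %>% subtract(10)          # subtract(4, 10)

# With a placeholder, the left-hand side goes where the dot is
second_arg <- 4 %>% subtract(10, .)      # subtract(10, 4)

# The dot can also fill a named argument
named_arg <- 3 %>% rep("a", times = .)   # rep("a", times = 3)
```

Here first_arg is -6 and second_arg is 6, which shows that the position of the dot decides which argument receives the piped value.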
Operators - 4
Of the four operators below, the only one that I’ve used consistently is the first.
%>%
# As described above
10 %>% divide_by(5)
[1] 2
%T>%
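Also referred to as the “tee pipe,” it returns the left-hand side value rather than the result of the right-hand side. This is helpful when a step in a chain, such as plot(), is called only for its side effect and would otherwise break the chain by returning NULL.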
# Without the tee pipe, plot() returns NULL, so str() reports NULL
rnorm(100) %>%
  matrix(ncol = 2) %>%
  plot() %>%
  str()
NULL
# With the tee pipe, the matrix skips past plot() and reaches str()
rnorm(100) %>%
  matrix(ncol = 2) %T>%
  plot() %>%
  str()
num [1:50, 1:2] 0.466 -0.161 1.519 1.998 -0.532 ...
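The same behavior can be seen without a plot. This is a minimal sketch in which str() plays the role of the side-effect step; the input vector is an assumption made for illustration.

```r
library(magrittr)

values <- c(3, 1, 2)

# Tee pipe: str() prints the sorted vector, then the vector itself moves on to sum()
with_tee <- values %>% sort() %T>% str() %>% sum()

# Regular pipe: str() returns NULL, so sum() receives NULL
without_tee <- values %>% sort() %>% str() %>% sum()
```

with_tee is 6, while without_tee is 0 because sum(NULL) is 0.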
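%$%
The exposition pipe exposes the names in the left-hand side object to the right-hand side expression, which is useful for functions that have no data argument. The chain below, taken from the magrittr vignette, relies only on %>% and creates the car_data object used in later sections.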
car_data <-
mtcars %>%
subset(hp > 100) %>%
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
transform(kpl = mpg %>% multiply_by(0.4251)) %>%
print
cyl mpg disp hp drat wt qsec vs am gear carb kpl
1 4 25.90 108.05 111.00 3.94 2.15 17.75 1.00 1.00 4.50 2.00 11.010090
2 6 19.74 183.31 122.29 3.59 3.12 17.98 0.57 0.43 3.86 3.43 8.391474
3 8 15.10 353.10 209.21 3.23 4.00 16.77 0.00 0.14 3.29 3.50 6.419010
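The chain above uses only %>%, so here is a minimal sketch of %$% itself. It assumes the built-in mtcars data and uses cor(), which has no data argument, as the illustrative function.

```r
library(magrittr)

# %$% exposes the columns of the left-hand side by name
r_exposed <- mtcars %>%
  subset(hp > 100) %$%
  cor(mpg, wt)

# Base R equivalent using with()
r_with <- with(subset(mtcars, hp > 100), cor(mpg, wt))

same_result <- isTRUE(all.equal(r_exposed, r_with))
```

The exposition pipe behaves much like with(), but keeps the left-to-right reading order of the rest of the chain.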
%<>%
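The assignment pipe sends the left-hand side through the chain and then assigns the result back to the same name. It is compact, though many disfavor it.
# LakeHuron: annual water levels of Lake Huron, in feet (built-in dataset)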
LakeHuron %<>% head(3) %>% print
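To make the overwrite explicit, here is a small sketch on a throwaway vector. The name scores and its values are assumptions for illustration only.

```r
library(magrittr)

scores <- c(9, 16, 25)

# Pipe scores through sqrt() and store the result back in scores
scores %<>% sqrt()

# The line above is shorthand for:
# scores <- scores %>% sqrt()
```

After the call, scores holds 3, 4, and 5, and the original values are gone, which is one reason the operator is often avoided.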
Functions
A function can be created by starting a pipeline with the dot placeholder instead of a value.
Unary Functions
# A chain that starts with the dot defines a function; here f is equivalent to function(x) head(x, 3)
f <- . %>% head(3)
chickwts %>% f(.)
weight feed
1 179 horsebean
2 160 horsebean
3 136 horsebean
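A longer chain can be saved and reused in the same way. The sketch below defines a hypothetical helper, top3_mean, which is not part of magrittr; the name and the input vector are assumptions for illustration.

```r
library(magrittr)

# Hypothetical helper: mean of the three largest values
top3_mean <- . %>%
  sort(decreasing = TRUE) %>%
  head(3) %>%
  mean()

# Equivalent ordinary function, for comparison
top3_mean_plain <- function(x) mean(head(sort(x, decreasing = TRUE), 3))

piped_result <- top3_mean(c(1, 5, 3, 9, 7))
plain_result <- top3_mean_plain(c(1, 5, 3, 9, 7))
```

Both results are 7, the mean of 9, 7, and 5.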
Lambda Functions
Functions can be defined and executed within piped code.
Long-hand Notation
car_data %>%
  (function(x) {
    if (nrow(x) > 2)
      rbind(head(x, 1), tail(x, 1))
    else x
  })
Shorthand Notation
car_data %>%
  {
    if (nrow(.) > 0)
      rbind(head(., 1), tail(., 1))
    else .
  }
Aliases
Aliases can greatly improve the readability of your code. They can be found with the help command ?magrittr::extract. For example, I’m frequently converting numbers to percentages.
.345 %>% multiply_by(100) %>% round(2) %>% paste0("%")
[1] "34.5%"
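A few other aliases from the same help page, shown as a minimal sketch. The vector v is an assumption for illustration, and only magrittr is attached so that names such as extract() and set_names() are not masked by other packages.

```r
library(magrittr)

v <- c(10, 20, 30)

second   <- v %>% extract(2)                       # same as v[2]
scaled   <- v %>% subtract(10) %>% divide_by(10)   # same as (v - 10) / 10
labeled  <- v %>% set_names(c("a", "b", "c"))      # same as `names<-`(v, c("a", "b", "c"))
above_15 <- v %>% is_greater_than(15)              # same as v > 15
```

Each alias is simply a named version of an operator, so the chain stays readable without backticks or parentheses.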
Examples
mtcars
There are at least two things in the code below that are not obvious, at least to me. First, the piping operator %>% is used inside function arguments, rather than only at the end of a line. Second, the aggregate function contains a nested unary function, FUN = . %>% mean %>% round(2).
# magrittr vignette
car_data <-
mtcars %>%
subset(hp > 100) %>%
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
transform(kpl = mpg %>% multiply_by(0.4251)) %>%
print
cyl mpg disp hp drat wt qsec vs am gear carb kpl
1 4 25.90 108.05 111.00 3.94 2.15 17.75 1.00 1.00 4.50 2.00 11.010090
2 6 19.74 183.31 122.29 3.59 3.12 17.98 0.57 0.43 3.86 3.43 8.391474
3 8 15.10 353.10 209.21 3.23 4.00 16.77 0.00 0.14 3.29 3.50 6.419010
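library(dplyr)  # tibble(), mutate(), filter(), select(), group_by(), summarize(), arrange()
mtcars %>%
  tibble(.) %>%
  tidyr::drop_na() %>%
  mutate(type = rownames(mtcars)) %>%      # car names, which the tibble conversion drops
  filter(!grepl('^A|L', type)) %>%         # drop names that start with "A" or contain an uppercase "L"
  filter(am == 0) %>%                      # automatic transmissions only
  select(cyl, mpg) %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mpg %>% mean() %>% round(0)) %>%
  arrange(-cyl) %>%
  set_colnames(c("cylinder", "avg_mpg"))   # magrittr alias for `colnames<-`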
# A tibble: 3 × 2
cylinder avg_mpg
<dbl> <dbl>
1 8 15
2 6 19
3 4 23
starwars
The starwars dataset ships with dplyr and can be loaded with data(starwars). The examples are from vignette("dplyr").
starwars %>% filter(skin_color == "light", eye_color == "brown")
# A tibble: 7 × 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Leia Org… 150 49 brown light brown 19 fema… femin…
2 Biggs Da… 183 84 black light brown 24 male mascu…
3 Cordé 157 NA brown light brown NA fema… femin…
4 Dormé 165 NA brown light brown NA fema… femin…
5 Raymus A… 188 79 brown light brown NA male mascu…
6 Poe Dame… NA NA brown light brown NA male mascu…
7 Padmé Am… 165 45 brown light brown 46 fema… femin…
# … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
library(kableExtra)
starwars %>%
select(name, height, mass, birth_year, sex, species, homeworld) %>%
filter(species == "Droid") %>%
arrange(desc(height)) %>%
slice_head(n = 4) %>%
kbl()
name | height | mass | birth_year | sex | species | homeworld |
---|---|---|---|---|---|---|
IG-88 | 200 | 140 | 15 | none | Droid | NA |
C-3PO | 167 | 75 | 112 | none | Droid | Tatooine |
R5-D4 | 97 | 32 | NA | none | Droid | Tatooine |
R2-D2 | 96 | 32 | 33 | none | Droid | Naboo |
babynames
This example was taken from an R-bloggers post written by Stefan Milton Bache, the author of the magrittr package. The post is included in the acknowledgements.
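library(babynames)
library(dplyr)    # filter(), group_by(), summarize()
library(ggplot2)  # qplot() and ggtitle(); note that qplot() is deprecated as of ggplot2 3.4.0
babynames %>%
  filter(name %>% substr(1, 3) %>% equals("Ste")) %>%
  group_by(year, sex) %>%
  summarize(total = sum(n)) %>%
  qplot(year, total, color = sex, data = ., geom = "line") %>%
  add(ggtitle('Names starting with "Ste"')) %>%
  print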
Conditional Piping
Making the flow of the pipe conditional was the subject of this Stack Overflow question.
x <- 1
y <- TRUE
x %>%
  add(1) %>%
  {if (y) add(., 1) else .}
[1] 3
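If the same pattern comes up often, the braces can be wrapped in a small helper. The sketch below is one possible approach; pipe_if() and add_one() are hypothetical names and are not part of magrittr.

```r
library(magrittr)

# Hypothetical helper (not part of magrittr): apply fn only when condition is TRUE
pipe_if <- function(value, condition, fn) {
  if (condition) fn(value) else value
}

# Illustrative step to apply conditionally
add_one <- function(v) v + 1

applied <- 1 %>% add(1) %>% pipe_if(TRUE, add_one)    # the extra step runs
skipped <- 1 %>% add(1) %>% pipe_if(FALSE, add_one)   # the extra step is skipped
```

applied is 3 and skipped is 2, matching what the brace version returns for y set to TRUE and FALSE.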
library(purrr)  # note: when() is deprecated as of purrr 1.0.0
1:10 %>%
  when(
    sum(.) <= x ~ sum(.),
    sum(.) <= 2*x ~ sum(.)/2,
    ~ 0,
    x = 60
  )
[1] 55
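Because when() is deprecated in purrr 1.0.0 and later, the same branching can be sketched with a plain if/else inside braces. The variable name limit is an assumption that stands in for the x = 60 argument above.

```r
library(magrittr)

# Stands in for the x = 60 argument passed to when()
limit <- 60

result <- 1:10 %>%
  sum() %>%
  {
    if (. <= limit) {
      .
    } else if (. <= 2 * limit) {
      . / 2
    } else {
      0
    }
  }
```

The sum of 1:10 is 55, which is below the limit, so result is 55, the same value that when() returned.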
Conclusion
Base R has included its own pipe, |>, since version 4.1.0, which may make the magrittr package unnecessary for simple chains in future R code. The native pipe was announced at the 2020 useR! Conference. Piping is used widely in many languages, and for basic chains the native |> pipe functions similarly to magrittr’s %>% pipe, aside from syntax, although base R has no counterparts to %T>%, %$%, or %<>%. To be clear, it’s not a real pipe either! Happy piping!
Acknowledgements
This blog post was made possible thanks to:
The inimitable Hadley Wickham and another of his books, “R for Data Science”. (Specifically, Chapter 18–“Pipes”.)
“magrittr: part of the tidyverse”
Simpler R coding with pipes > the present and future of the magrittr package
In September 2021, a detailed history of piping was written by Adolfo Álvarez, “Plumbers, chains, and famous painters: The (updated) history of the pipe operator in R”.
Disclaimer
The views, analysis, and conclusions presented in this article are the author’s alone and not those of any other person, organization, or government entity. While I have made every reasonable effort to ensure that the information in this article is correct, it will nonetheless contain errors, inaccuracies, and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.
Reproducibility
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.1.0 (2021-05-18)
os macOS Big Sur 10.16
system x86_64, darwin17.0
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Chicago
date 2022-03-28
pandoc 2.14.1 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
babynames * 1.0.1 2021-04-12 [1] CRAN (R 4.1.0)
blogdown * 1.8 2022-02-16 [1] CRAN (R 4.1.2)
bookdown 0.25 2022-03-16 [1] CRAN (R 4.1.2)
brio 1.1.3 2021-11-30 [1] CRAN (R 4.1.0)
bslib 0.3.1.9000 2022-03-04 [1] Github (rstudio/bslib@888fbe0)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.0)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
cli 3.2.0 2022-02-14 [1] CRAN (R 4.1.2)
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.0)
colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.1.2)
crayon 1.5.1 2022-03-26 [1] CRAN (R 4.1.0)
DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.0)
desc 1.4.1 2022-03-06 [1] CRAN (R 4.1.2)
devtools * 2.4.3 2021-11-30 [1] CRAN (R 4.1.0)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.0)
dplyr * 1.0.8 2022-02-08 [1] CRAN (R 4.1.2)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.2)
fansi 1.0.3 2022-03-24 [1] CRAN (R 4.1.2)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.0)
generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
ggthemes * 4.2.4 2021-01-20 [1] CRAN (R 4.1.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.1.2)
kableExtra * 1.3.4 2021-02-20 [1] CRAN (R 4.1.0)
knitr 1.38 2022-03-25 [1] CRAN (R 4.1.0)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.0)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
magrittr * 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.0)
munsell 0.5.0.9000 2021-10-19 [1] Github (cwickham/munsell@e539541)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
processx 3.5.3 2022-03-25 [1] CRAN (R 4.1.0)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.0)
rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.2)
rmarkdown 2.13 2022-03-10 [1] CRAN (R 4.1.2)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
rvest 1.0.2 2021-10-16 [1] CRAN (R 4.1.0)
sass 0.4.1 2022-03-23 [1] CRAN (R 4.1.2)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.0)
stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
svglite 2.1.0 2022-02-03 [1] CRAN (R 4.1.2)
systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.1.2)
testthat 3.1.2 2022-01-20 [1] CRAN (R 4.1.2)
tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.1.2)
tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2)
usethis * 2.1.5 2021-12-09 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.1.0)
webshot 0.5.2 2019-11-22 [1] CRAN (R 4.1.0)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.0)
xfun 0.30 2022-03-02 [1] CRAN (R 4.1.2)
xml2 1.3.3 2021-11-30 [1] CRAN (R 4.1.0)
yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
[1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────