Summary
This post demonstrates how to calculate the annual and cumulative percentage change in production, or another indicator, using dplyr.
Overview
This blog post demonstrates how to calculate (1) the annual percentage change in production and (2) the cumulative percentage change in production. The example dataset is the aus_production data in the tsibbledata package.
Australian Production
The data are quarterly estimates of selected indicators of manufacturing production in Australia. The data are from the first quarter of 1956 to the second quarter of 2010. Note that the data are stored in wide format.
# A tsibble: 6 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1956 Q1 284 5225 189 465 3923 5
2 1956 Q2 213 5178 204 532 4436 6
3 1956 Q3 227 5297 208 561 4806 7
4 1956 Q4 308 5681 197 570 4418 6
5 1957 Q1 262 5577 187 529 4339 5
6 1957 Q2 228 5651 214 604 4811 7
Wide to Long
The first objective is to convert the data from wide to long format using the pivot_longer() function from the tidyr package (part of the tidyverse). We’ll also convert the data to an annual frequency by averaging the quarterly volumes within each year, and narrow the window to 1990-2000. As written, the filter ends at 2000 Q1, so the year 2000 is represented by a single quarter.
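library(fpp3)  # attaches dplyr, tidyr, lubridate, ggplot2, tsibble and tsibbledata

ap_long <- aus_production %>%
  # stack the six product columns into product/volume pairs
  pivot_longer(Beer:Gas, names_to = "product", values_to = "volume") %>%
  as_tsibble(key = product, index = Quarter) %>%
  filter_index("1990 Q1" ~ "2000 Q1") %>%
  # average the quarterly volumes within each calendar year, per product
  index_by(year = year(Quarter)) %>%
  group_by(product) %>%
  summarize(avg_volume = mean(volume))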
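To see what the wide-to-long step does in isolation, here is a minimal sketch on a made-up two-product table. The `wide` data and its values are hypothetical and are not taken from aus_production; only pivot_longer() and its names_to and values_to arguments come from the code above.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per year, one column per product
wide <- tibble(
  year = c(1990, 1991),
  Beer = c(10, 12),
  Gas  = c(5, 6)
)

# Each product column becomes a (product, volume) pair, one row per year and product
long <- wide %>%
  pivot_longer(Beer:Gas, names_to = "product", values_to = "volume")

long
```

The result has one row per year and product, which is the shape group_by(product) needs in the later steps.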
Cumulative Percentage Change
I’ve spent a lot of time confused over how to transform raw data into a cumulative percentage change, which can make for a really effective plot. It turns out to be easy with two dplyr functions: first and lag. For the cumulative change, we’ll use the first function to pick the base year. Because first returns the first row of each group, the data need to be sorted by year within each product.
ap_long_cum_pct <- ap_long %>%
  group_by(product) %>%
  # base = each product's first (earliest) annual average
  mutate(base = first(avg_volume),
         diff = avg_volume - base,
         pct = diff / base)
# A tsibble: 10 x 6 [1Y]
# Key: product [1]
# Groups: product [1]
product year avg_volume base diff pct
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Beer 1990 488. 488. 0 0
2 Beer 1991 474. 488. -14 -0.0287
3 Beer 1992 451. 488. -37.2 -0.0763
4 Beer 1993 444 488. -44.5 -0.0911
5 Beer 1994 446 488. -42.5 -0.0870
6 Beer 1995 442. 488. -46 -0.0942
7 Beer 1996 428 488. -60.5 -0.124
8 Beer 1997 440. 488. -48 -0.0983
9 Beer 1998 436. 488. -52.2 -0.107
10 Beer 1999 441. 488. -47.8 -0.0977
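The same calculation can be checked by hand on a small example. The sketch below uses a hypothetical `toy` table (not the Australian data) whose rows are deliberately out of order, and adds an explicit arrange() so that first() picks the earliest year in each group.

```r
library(dplyr)

# Hypothetical toy data: two products, three years each, deliberately unsorted
toy <- tibble(
  product = c("A", "A", "A", "B", "B", "B"),
  year    = c(1992, 1990, 1991, 1990, 1991, 1992),
  volume  = c(120, 100, 110, 50, 45, 60)
)

toy_cum <- toy %>%
  arrange(product, year) %>%   # sort so first() returns the earliest year
  group_by(product) %>%
  mutate(base = first(volume),
         pct  = (volume - base) / base) %>%
  ungroup()

toy_cum
```

Product A rises 10% and then 20% relative to 1990, while product B falls 10% and then ends 20% above its 1990 level. Without the arrange() step, product A’s base would be the 1992 value, which is the ordering pitfall mentioned above.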
Next, I plotted the cumulative percentage change in production. I used the scales::percent_format() function to format the y-axis as a percentage and the scale_color_discrete_qualitative() function from the colorspace package for a qualitative color palette. Lastly, I used the theme_cowplot() function from the cowplot package to format the plot.
library(colorspace)  # scale_color_discrete_qualitative()
library(cowplot)     # theme_cowplot()

ap_long_cum_pct %>%
  ggplot() +
  aes(x = year, y = pct, group = product, color = product) +
  geom_line() +
  scale_y_continuous(name = "", labels = scales::percent_format()) +
  scale_x_continuous(name = "", breaks = seq(1990, 2000, 2)) +
  scale_color_discrete_qualitative(name = "Product", palette = "Dark2") +
  labs(title = "Cumulative Pct. Change in Australian Production",
       subtitle = "1990-2000") +
  theme_cowplot()
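For readers unfamiliar with scales::percent_format(), it returns a labelling function that turns proportions into percent strings, which is why pct can stay as a proportion in the data. A small sketch, using a few of the Beer values from the table above; the accuracy argument is my choice for illustration and is not used in the original plot.

```r
library(scales)

# Proportions taken from the Beer rows above (1990, 1991, 1996)
pct <- c(0, -0.0287, -0.124)

# percent_format() builds a function; accuracy = 0.1 rounds to one decimal place
fmt <- percent_format(accuracy = 0.1)
labels <- fmt(pct)

labels
```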
Annual Percentage Change
Next, we’ll calculate the annual percentage change in production. This is a bit more complicated than the cumulative percentage change because the base is the previous year rather than a fixed year. We’ll use the lag function to get each product’s prior-year value. The first year has no prior year, so lag returns NA; the replace_na function from tidyr replaces those NA values in pct with 0.
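ap_long_ann_pct <- ap_long %>%
  arrange(product, year) %>%
  group_by(product) %>%
  # base = the previous year's average volume for the same product
  mutate(base = lag(avg_volume),
         diff = avg_volume - base,
         pct = diff / base) %>%
  # the first year has no previous year, so treat its change as 0
  replace_na(list(pct = 0))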
# A tibble: 10 × 6
# Groups: product [1]
product year avg_volume base diff pct
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Beer 1990 488. NA NA 0
2 Beer 1991 474. 488. -14 -0.0287
3 Beer 1992 451. 474. -23.2 -0.0490
4 Beer 1993 444 451. -7.25 -0.0161
5 Beer 1994 446 444 2 0.00450
6 Beer 1995 442. 446 -3.5 -0.00785
7 Beer 1996 428 442. -14.5 -0.0328
8 Beer 1997 440. 428 12.5 0.0292
9 Beer 1998 436. 440. -4.25 -0.00965
10 Beer 1999 441. 436. 4.5 0.0103
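As a quick check of the lag-based calculation, here is a sketch on a hypothetical `toy` table (the values are made up for easy arithmetic). It shows the NA that lag() produces for each product’s first year and then the effect of replace_na().

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two products, three years each
toy <- tibble(
  product = c("A", "A", "A", "B", "B", "B"),
  year    = c(1990, 1991, 1992, 1990, 1991, 1992),
  volume  = c(100, 110, 121, 50, 45, 54)
)

toy_ann <- toy %>%
  arrange(product, year) %>%
  group_by(product) %>%
  mutate(prev = lag(volume),            # previous year's volume within the product
         pct  = (volume - prev) / prev) %>%
  ungroup()

# pct is NA in each product's first year; replace_na() turns those into 0
toy_ann_zero <- toy_ann %>%
  replace_na(list(pct = 0))

toy_ann_zero
```

Product A grows 10% in both years and product B falls 10% and then rises 20%. Whether to keep the first-year NA or replace it with 0 is a presentation choice: the 0 gives every line in the plot a starting point, while the NA makes it explicit that no change can be computed for the first year.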
Finally, we’ll plot the annual percentage change in production to see the results.
ap_long_ann_pct %>%
  ggplot() +
  aes(x = year, y = pct, group = product, color = product) +
  geom_line() +
  scale_y_continuous(name = "", labels = scales::percent_format()) +
  scale_x_continuous(name = "", breaks = seq(1990, 2000, 2)) +
  scale_color_discrete_qualitative(name = "Product", palette = "Dark2") +
  labs(title = "Annual Pct. Change in Australian Production",
       subtitle = "1990-2000") +
  theme_cowplot()
Conclusion
The dplyr package makes it easy to calculate the annual percentage change in production and the cumulative change in production. The first and lag functions are particularly useful for these calculations.
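The two measures are also linked arithmetically: compounding the annual changes reproduces the cumulative change. The sketch below checks this on a hypothetical volume series. It is offered as a sanity check, not as part of the original workflow, and it assumes the first-year annual change is set to 0, as in the post.

```r
library(dplyr)

# Hypothetical annual volumes for a single product
volume <- c(100, 110, 121, 108.9)

# Annual change relative to the previous year; the first year is set to 0
annual <- (volume - lag(volume)) / lag(volume)
annual[is.na(annual)] <- 0

# Cumulative change relative to the first year
cumulative <- (volume - first(volume)) / first(volume)

# Compounding the annual changes should reproduce the cumulative change
rebuilt <- cumprod(1 + annual) - 1

all.equal(cumulative, rebuilt)
```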
Disclaimer
The views, analysis and conclusions presented within this paper are the author’s alone and not those of any other person, organization or government entity. While I have made every reasonable effort to ensure that the information in this article is correct, it will nonetheless contain errors, inaccuracies and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.
Reproducibility
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.4.1 (2024-06-14)
os macOS Sonoma 14.4
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2024-10-10
pandoc 3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
anytime 0.3.9 2020-08-27 [1] CRAN (R 4.4.0)
blogdown * 1.19 2024-02-01 [1] CRAN (R 4.4.0)
bookdown 0.40 2024-07-02 [1] CRAN (R 4.4.0)
bslib 0.8.0 2024-07-29 [1] CRAN (R 4.4.0)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.4.1)
colorspace * 2.1-1 2024-07-26 [1] CRAN (R 4.4.0)
cowplot * 1.1.3 2024-01-22 [1] CRAN (R 4.4.0)
crayon 1.5.3 2024-06-20 [1] CRAN (R 4.4.0)
devtools * 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
distributional 0.5.0 2024-09-17 [1] CRAN (R 4.4.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.0)
fable * 0.4.0 2024-09-25 [1] CRAN (R 4.4.1)
fabletools * 0.5.0 2024-09-17 [1] CRAN (R 4.4.1)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.0)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
feasts * 0.4.1 2024-09-25 [1] CRAN (R 4.4.1)
fpp3 * 1.0.1 2024-09-18 [1] CRAN (R 4.4.1)
fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
ggthemes * 5.1.0 2024-02-10 [1] CRAN (R 4.4.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
highr 0.11 2024-05-26 [1] CRAN (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
knitr 1.48 2024-07-07 [1] CRAN (R 4.4.0)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.4.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.4.0)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.4.0)
Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.4.0)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.4.0)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
sass 0.4.9 2024-03-15 [1] CRAN (R 4.4.0)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
shiny 1.9.1 2024-08-01 [1] CRAN (R 4.4.0)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
tsibble * 1.1.5 2024-06-27 [1] CRAN (R 4.4.0)
tsibbledata * 0.4.1 2022-09-01 [1] CRAN (R 4.4.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
usethis * 3.0.0 2024-07-29 [1] CRAN (R 4.4.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
withr 3.0.1 2024-07-31 [1] CRAN (R 4.4.0)
xfun 0.47 2024-08-17 [1] CRAN (R 4.4.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────