8 min read

Monthly Percentage Change in Production with Dplyr


plot of aus_production
Plot of Australian Production

View raw source for this post

Summary

This post demonstrates how to calculate the monthly percentage change in production, or other indicator, using dplyr.

Table of Contents

Overview

This blog post demonstrates how to (1) calculate the annual percentage change in production and (2) the cumulative change in production. The example dataset is the aus_production data in the tstibbleData package.

Australian Production

The data are quarterly estimates of selected indicators of manufacturing production in Australia. The data are from the first quarter of 1956 to the second quarter of 2010. Note that the data are stored in wide format.

# A tsibble: 6 x 7 [1Q]
  Quarter  Beer Tobacco Bricks Cement Electricity   Gas
    <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
1 1956 Q1   284    5225    189    465        3923     5
2 1956 Q2   213    5178    204    532        4436     6
3 1956 Q3   227    5297    208    561        4806     7
4 1956 Q4   308    5681    197    570        4418     6
5 1957 Q1   262    5577    187    529        4339     5
6 1957 Q2   228    5651    214    604        4811     7

Wide to Long

The first objective is to convert the data from wide to long format. The pivot_longer() function from the tidyverse package is used to convert the data from wide to long format. Also, we’ll convert the data to an annual frequency and narrow the window to 11 years (1990-2000).

aus_production %>% 
pivot_longer(Beer:Gas, names_to = "product", values_to = "volume") %>% 
as_tsibble(key = product, index = Quarter) %>% 
filter_index("1990 Q1" ~ "2000 Q1") %>%
index_by(year = year(Quarter)) %>% 
group_by(product) %>% 
summarize(avg_volume = mean(volume)) -> ap_long

Cumulative Percentage Change

I’ve spent a lot of time confused over how to transfrom raw data to a cumulative annual percentage rate. This can make for a really effective plot. Turns out, it’s really easy to do using the dplyr family of functions: first and lag. For the cumulative, we’ll use the first function as the base year. In using the function, the data need to be in the correct order.

ap_long %>%
group_by(product) %>%
mutate(base = first(avg_volume),
	   diff = avg_volume - base,
	   pct = (diff / base)) -> ap_long_cum_pct
# A tsibble: 10 x 6 [1Y]
# Key:       product [1]
# Groups:    product [1]
   product  year avg_volume  base  diff     pct
   <chr>   <dbl>      <dbl> <dbl> <dbl>   <dbl>
 1 Beer     1990       488.  488.   0    0     
 2 Beer     1991       474.  488. -14   -0.0287
 3 Beer     1992       451.  488. -37.2 -0.0763
 4 Beer     1993       444   488. -44.5 -0.0911
 5 Beer     1994       446   488. -42.5 -0.0870
 6 Beer     1995       442.  488. -46   -0.0942
 7 Beer     1996       428   488. -60.5 -0.124 
 8 Beer     1997       440.  488. -48   -0.0983
 9 Beer     1998       436.  488. -52.2 -0.107 
10 Beer     1999       441.  488. -47.8 -0.0977

Then, a plot was generated showing the cumulative percentage change in production. I used the scales::percent_format() function to format the y-axis as a percentage and the scale_color_discrete_qualitative() function from the colorspace package for a qualitative color palette. Lastly, I used the theme_cowplot() function from the cowplot package to format the plot.

ap_long_cum_pct %>% 
ggplot() +
aes(x = year, y = pct, group = product, color = product) +
geom_line() +
scale_y_continuous(name = "", labels = scales::percent_format()) +
scale_x_continuous(name = "", breaks = seq(1990, 2000, 2)) +
scale_color_discrete_qualitative(name = "Product", palette = "Dark2") +
labs(title = "Cumulative Pct. Change in Australian Production",
	 subtitle = "1990-2000") +
theme_cowplot() 

Change in Annual Percentage Rate

Next, we’ll calculate the annual percentage change in production. This is a bit more complicated than the cumulative percentage change. We’ll use the lag function to calculate the annual percentage change in production. The replace_na function is used to replace the NA values with 0.

ap_long %>% 
arrange(product, year) %>% 
group_by(product) %>%
mutate(base = lag(avg_volume),
	   diff = avg_volume - base,
	   pct = (diff / base)) %>% 
replace_na(list(pct = 0)) -> ap_long_ann_pct
# A tibble: 10 × 6
# Groups:   product [1]
   product  year avg_volume  base   diff      pct
   <chr>   <dbl>      <dbl> <dbl>  <dbl>    <dbl>
 1 Beer     1990       488.   NA   NA     0      
 2 Beer     1991       474.  488. -14    -0.0287 
 3 Beer     1992       451.  474. -23.2  -0.0490 
 4 Beer     1993       444   451.  -7.25 -0.0161 
 5 Beer     1994       446   444    2     0.00450
 6 Beer     1995       442.  446   -3.5  -0.00785
 7 Beer     1996       428   442. -14.5  -0.0328 
 8 Beer     1997       440.  428   12.5   0.0292 
 9 Beer     1998       436.  440.  -4.25 -0.00965
10 Beer     1999       441.  436.   4.5   0.0103 

Finally, we’ll plot the annual percentage change in production to see the results.

ap_long_ann_pct %>% 
ggplot() +
aes(x = year, y = pct, group = product, color = product) +
geom_line() +
scale_y_continuous(name = "", labels = scales::percent_format()) +
scale_x_continuous(name = "", breaks = seq(1990, 2000, 2)) +
scale_color_discrete_qualitative(name = "Product", palette = "Dark2") +
labs(title = "Annual Pct. Change in Australian Production",
	 subtitle = "1990-2000") +
theme_cowplot()

Conclusion

The dplyr package makes it easy to calculate the annual percentage change in production and the cumulative change in production. The first and lag functions are particularly useful for these calculations.

Acknowledgements

This blog post was made possible thanks to:

  • the tsibbleData package

References

[1]
R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2024 [Online]. Available: https://www.R-project.org/
[2]
Y. Xie, C. Dervieux, and A. Presmanes Hill, Blogdown: Create blogs and websites with r markdown. 2024 [Online]. Available: https://github.com/rstudio/blogdown

Disclaimer

The views, analysis and conclusions presented within this paper represent the author’s alone and not of any other person, organization or government entity. While I have made every reasonable effort to ensure that the information in this article was correct, it will nonetheless contain errors, inaccuracies and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       macOS Sonoma 14.4
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-10-10
 pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package        * version date (UTC) lib source
 anytime          0.3.9   2020-08-27 [1] CRAN (R 4.4.0)
 blogdown       * 1.19    2024-02-01 [1] CRAN (R 4.4.0)
 bookdown         0.40    2024-07-02 [1] CRAN (R 4.4.0)
 bslib            0.8.0   2024-07-29 [1] CRAN (R 4.4.0)
 cachem           1.1.0   2024-05-16 [1] CRAN (R 4.4.0)
 cli              3.6.3   2024-06-21 [1] CRAN (R 4.4.0)
 codetools        0.2-20  2024-03-31 [1] CRAN (R 4.4.1)
 colorspace     * 2.1-1   2024-07-26 [1] CRAN (R 4.4.0)
 cowplot        * 1.1.3   2024-01-22 [1] CRAN (R 4.4.0)
 crayon           1.5.3   2024-06-20 [1] CRAN (R 4.4.0)
 devtools       * 2.4.5   2022-10-11 [1] CRAN (R 4.4.0)
 digest           0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
 distributional   0.5.0   2024-09-17 [1] CRAN (R 4.4.1)
 dplyr          * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
 ellipsis         0.3.2   2021-04-29 [1] CRAN (R 4.4.0)
 evaluate         0.24.0  2024-06-10 [1] CRAN (R 4.4.0)
 fable          * 0.4.0   2024-09-25 [1] CRAN (R 4.4.1)
 fabletools     * 0.5.0   2024-09-17 [1] CRAN (R 4.4.1)
 fansi            1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
 farver           2.1.2   2024-05-13 [1] CRAN (R 4.4.0)
 fastmap          1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
 feasts         * 0.4.1   2024-09-25 [1] CRAN (R 4.4.1)
 fpp3           * 1.0.1   2024-09-18 [1] CRAN (R 4.4.1)
 fs               1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
 generics         0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
 ggplot2        * 3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
 ggthemes       * 5.1.0   2024-02-10 [1] CRAN (R 4.4.0)
 glue             1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
 gtable           0.3.5   2024-04-22 [1] CRAN (R 4.4.0)
 highr            0.11    2024-05-26 [1] CRAN (R 4.4.0)
 htmltools        0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
 htmlwidgets      1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
 httpuv           1.6.15  2024-03-26 [1] CRAN (R 4.4.0)
 jquerylib        0.1.4   2021-04-26 [1] CRAN (R 4.4.0)
 jsonlite         1.8.8   2023-12-04 [1] CRAN (R 4.4.0)
 knitr            1.48    2024-07-07 [1] CRAN (R 4.4.0)
 labeling         0.4.3   2023-08-29 [1] CRAN (R 4.4.0)
 later            1.3.2   2023-12-06 [1] CRAN (R 4.4.0)
 lifecycle        1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
 lubridate      * 1.9.3   2023-09-27 [1] CRAN (R 4.4.0)
 magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
 memoise          2.0.1   2021-11-26 [1] CRAN (R 4.4.0)
 mime             0.12    2021-09-28 [1] CRAN (R 4.4.0)
 miniUI           0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
 munsell          0.5.1   2024-04-01 [1] CRAN (R 4.4.0)
 pillar           1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
 pkgbuild         1.4.4   2024-03-17 [1] CRAN (R 4.4.0)
 pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
 pkgload          1.4.0   2024-06-28 [1] CRAN (R 4.4.0)
 profvis          0.3.8   2023-05-02 [1] CRAN (R 4.4.0)
 promises         1.3.0   2024-04-05 [1] CRAN (R 4.4.0)
 purrr            1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
 R6               2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
 rappdirs         0.3.3   2021-01-31 [1] CRAN (R 4.4.0)
 Rcpp             1.0.13  2024-07-17 [1] CRAN (R 4.4.0)
 remotes          2.5.0   2024-03-17 [1] CRAN (R 4.4.0)
 rlang            1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
 rmarkdown        2.28    2024-08-17 [1] CRAN (R 4.4.0)
 rstudioapi       0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
 sass             0.4.9   2024-03-15 [1] CRAN (R 4.4.0)
 scales           1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
 sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
 shiny            1.9.1   2024-08-01 [1] CRAN (R 4.4.0)
 stringi          1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
 stringr          1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
 tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
 tidyr          * 1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
 tidyselect       1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
 timechange       0.3.0   2024-01-18 [1] CRAN (R 4.4.0)
 tsibble        * 1.1.5   2024-06-27 [1] CRAN (R 4.4.0)
 tsibbledata    * 0.4.1   2022-09-01 [1] CRAN (R 4.4.0)
 urlchecker       1.0.1   2021-11-30 [1] CRAN (R 4.4.0)
 usethis        * 3.0.0   2024-07-29 [1] CRAN (R 4.4.0)
 utf8             1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
 vctrs            0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
 withr            3.0.1   2024-07-31 [1] CRAN (R 4.4.0)
 xfun             0.47    2024-08-17 [1] CRAN (R 4.4.0)
 xtable           1.8-4   2019-04-21 [1] CRAN (R 4.4.0)
 yaml             2.3.10  2024-07-26 [1] CRAN (R 4.4.0)

 [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────