10 min read

Simple Features Tutorial in R


sf_package
logo for the sf package

View raw source for this post

Summary

This post is an introduction to working with the sf package. “sf” is short for “simple features” and is an open source standard to interact with vector geometries. Downloads of the sf package now exceed other R packages dealing with spatial data.

Table of Contents

Introduction

This post is an introduction to working with the sf package. “sf” is short for “simple features” and is an open-source model and standard to interact with vector geometries. Downloads of the sf package now exceed other R packages dealing with spatial data. And going forward, sf is likely to become the primary package in spatial analysis.

According to Roger Bivand, an early pioneer in the use of R for spatial analysis, “the transition to sf from sp, rgdal, and rgeos should be unproblematic. Shifting to new visualization packages like tmap, cartography and mapview should also be relatively easy, and use of sf and the new visualization packages should certainly become standard for new research and teaching.”[1]

The simple features package was first released in January 2017. Its design and philosophy mirror other packages included in the “tidyverse”. Thus, manipulating data in sf classed objects is similar to how it would be accomplished in dplyr.

Robin Lovelace is an authority in the field who has been immensely active in geospatial development for R. He is a contributor to the sf package. His book, “Geocomputation with R” is recommended. My first exposure to his work was a tutorial that used a previous package and his work was consulted every time I had to do a merge/join.

This post extracts portions of his book that are most commonly performed in everyday spatial analysis. An excellent cheat sheet is also available. Two sources of data are used: the US Census Bureau cartographic boundaries for counties and USDA county unemployment files.

US Census Bureau

The county geospatial data come from the US Census.

#read in file
library(sf)
#all counties
counties <- 
  sf::st_read(
    dsn = "cb_2020_us_county_20m",
    #do not put file extentsion .shp on end!
    layer = "cb_2020_us_county_20m"
    ) |>
  # convert coordinates to simple feature
  sf::st_transform('+proj=longlat +datum=WGS84') |>
  # reduce file size
    sf::st_simplify() |> 
  # convert variables to lowercase
  dplyr::rename_all(~janitor::make_clean_names(.x))
Reading layer `cb_2020_us_county_20m' from data source 
  `/Users/robwiederstein/Dropbox/coding/rproj/my-hugo-blog/content/post/2022-03-28-simple-features-tutorial-in-r-number-2/cb_2020_us_county_20m' 
  using driver `ESRI Shapefile'
Simple feature collection with 3221 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83

Subset to State

Filter the dataframe to the state of interest. The result can be reached using a dplyr pipe.

#california fips is "06"
ca <- 
  counties |> 
  dplyr::filter(grepl("06", statefp))

Sanity Check

Before proceeding further, efficiency requires the collected data represents what we intended; namely, the state of California. After inspection, we appear headed on the right track.

plot(ca |> dplyr::select(countyfp))

Subset to Counties

Next, let’s choose three random California counties and plot them using Leaflet.

ca_counties <-
  ca |> 
  dplyr::filter(name %in% c("Los Angeles", "San Francisco", "Napa")) 

library(leaflet)
  leaflet() |> 
    fitBounds(-122.6, 32.8, -117.6, 38.8) |> 
    addTiles() |> 
    addPolygons(data = ca_counties,
                popup = ~name)

Group Counties by Dplyr

counties_join_by_group <-
    ca |>
    dplyr::mutate(number = 1) |>
    dplyr::group_by(statefp) |>
    dplyr::summarize(tot = sum(number))
#map 
leaflet() |> 
    #fitBounds(-124.4, 32.5, -114.1,  42) |> 
        setView(-120.52462403790528, 37.74238197955183, zoom = 5) |>
    addTiles() |> 
    addPolygons(data = counties_join_by_group)

Counties Join by Union

The other way to join geographic subdivisions together is through the use of sf::st_union().

# join counties into California
counties_join_by_union <- sf::st_union(ca)
# map
leaflet() %>%
    fitBounds(-124.4, 32.5, -114.1,  42) |> 
    #base groups
    addTiles(group = "OSM (default)") |>
    addPolygons(
        data = counties_join_by_union
    )

Calculate Centroids

centroids <- sf::st_centroid(ca)
leaflet() |>
    fitBounds(-122.6, 32.8, -117.6, 38.8) |> 
    #base groups
    addTiles(group = "OSM (default)") |>
    #layer groups
    addPolygons(data = sf::st_union(ca),
            group = "State") |>
    addPolygons(data = ca,
            group = "Counties") |>
    addMarkers(lat = unlist(lapply(centroids$geometry, "[[", 2)),
           lng = unlist(lapply(centroids$geometry, "[[", 1)),
           group = "Centroids"
    ) |>
    #layer control
    addLayersControl(
        overlayGroups = c("State", "Counties", "Centroids"),
        position = "topright",
        options = layersControlOptions(collapsed = FALSE)
    ) |>
    # hide
    hideGroup(c("State", "Counties", "Centroids"))

Merge

Maps are commonly used as choropleths, i.e. the polygon is colored based on another variable. Thus, spatial data are often merged with other data. Here, the USDA compiles some data by county in the US. The data are winnowed to identification variables and a single numeric named “unemployment_rate_2020”. Then the USDA data is merged with the California counties’ data.

file <- 'https://www.ers.usda.gov/webdocs/DataFiles/48747/Unemployment.xlsx'
usda <- rio::import(
        file = file,
        skip = 4
) |> 
        dplyr::rename_all(~janitor::make_clean_names(.x)) |> 
        dplyr::select(fips_code:area_name, unemployment_rate_2020) |> 
        dplyr::filter(state == "CA") |> 
        tidyr::separate(fips_code, into = c("statefp", "countyfp"), sep = 2) |> 
        dplyr::select(statefp, countyfp, unemployment_rate_2020)

The files’ order matters when performing the merge. By putting the “ca” simple feature class dataframe first, the subsequent “usda” dataframe morphs into the same simple feature class and is now combined. See Professor Lovelace’s Geospatial in R 3.2.4 vector attribute joining for more details. “The most common type of attribute join on spatial data takes an sf object as the first argument and adds columns to it from a data.frame specified as the second argument.”[2]

ca_choropleth <-
        dplyr::left_join(ca, usda, by = c("statefp", "countyfp"))

Choropleth Map

California counties are shaded by the 2020 unemployment rate.

Acknowledgements

This blog post was made possible thanks to:

Conclusion

This tutorial demonstrated how to: (1) read in, transform, and minimize a shapefile; (2) select a single state’s counties from the Census Bureau cartographic boundaries file; (3) plot a shapefile using the simple plot() function; (4) dissolve internal boundaries of a state using the dplyr::group_by() function and the sf::st_union() function; (5) calculate the geographic centers of polygons with the sf::st_centroid() function; (6) use the layers feature in leaflet to place multiple geographic datasets onto a map; (7) merge a dataframe/tibble/datatable with a simple feature collection; and (8) create a choropleth map in leaflet using a divergent color palette from RColorBrewer.

References

[1]
R. S. Bivand, “Progress in the R ecosystem for representing and handling spatial data,” Journal of Geographical Systems, vol. 23, no. 4, pp. 515–546, 2021.
[2]
R. Lovelace, J. Nowosad, and J. Muenchow, Geocomputation with R. Chapman and Hall/CRC, 2019.
[3]
R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2021 [Online]. Available: https://www.R-project.org/
[4]
Y. Xie, C. Dervieux, and A. Presmanes Hill, Blogdown: Create blogs and websites with r markdown. 2022 [Online]. Available: https://CRAN.R-project.org/package=blogdown
[5]
S. Firke, Janitor: Simple tools for examining and cleaning dirty data. 2021 [Online]. Available: https://github.com/sfirke/janitor
[6]
J. Cheng, B. Karambelkar, and Y. Xie, Leaflet: Create interactive web maps with the JavaScript leaflet library. 2022 [Online]. Available: https://rstudio.github.io/leaflet/
[7]
E. Neuwirth, RColorBrewer: ColorBrewer palettes. 2014 [Online]. Available: https://CRAN.R-project.org/package=RColorBrewer
[8]
C. Chan and T. J. Leeper, Rio: A swiss-army knife for data i/o. 2021 [Online]. Available: https://github.com/leeper/rio
[9]
E. Pebesma, Sf: Simple features for r. 2022 [Online]. Available: https://CRAN.R-project.org/package=sf

Disclaimer

The views, analysis, and conclusions presented within this post represent the author alone and not of any other person, organization or government entity. While I have made every reasonable effort to ensure that the information in this article was correct, it will nonetheless contain errors, inaccuracies, and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.0 (2021-05-18)
 os       macOS Big Sur 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2022-03-28
 pandoc   2.14.1 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version    date (UTC) lib source
 assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 blogdown     * 1.8        2022-02-16 [1] CRAN (R 4.1.2)
 bookdown       0.25       2022-03-16 [1] CRAN (R 4.1.2)
 brio           1.1.3      2021-11-30 [1] CRAN (R 4.1.0)
 bslib          0.3.1.9000 2022-03-04 [1] Github (rstudio/bslib@888fbe0)
 cachem         1.0.6      2021-08-19 [1] CRAN (R 4.1.0)
 callr          3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
 cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.1.0)
 class          7.3-20     2022-01-13 [1] CRAN (R 4.1.2)
 classInt       0.4-3      2020-04-07 [1] CRAN (R 4.1.0)
 cli            3.2.0      2022-02-14 [1] CRAN (R 4.1.2)
 colorspace     2.0-3      2022-02-21 [1] CRAN (R 4.1.2)
 crayon         1.5.1      2022-03-26 [1] CRAN (R 4.1.0)
 crosstalk      1.2.0      2021-11-04 [1] CRAN (R 4.1.0)
 curl           4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
 data.table     1.14.2     2021-09-27 [1] CRAN (R 4.1.0)
 DBI            1.1.2      2021-12-20 [1] CRAN (R 4.1.0)
 desc           1.4.1      2022-03-06 [1] CRAN (R 4.1.2)
 devtools     * 2.4.3      2021-11-30 [1] CRAN (R 4.1.0)
 digest         0.6.29     2021-12-01 [1] CRAN (R 4.1.0)
 dplyr          1.0.8      2022-02-08 [1] CRAN (R 4.1.2)
 e1071          1.7-9      2021-09-16 [1] CRAN (R 4.1.0)
 ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 evaluate       0.15       2022-02-18 [1] CRAN (R 4.1.2)
 fansi          1.0.3      2022-03-24 [1] CRAN (R 4.1.2)
 farver         2.1.0      2021-02-28 [1] CRAN (R 4.1.0)
 fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 forcats        0.5.1      2021-01-27 [1] CRAN (R 4.1.0)
 foreign        0.8-82     2022-01-13 [1] CRAN (R 4.1.2)
 fs             1.5.2      2021-12-08 [1] CRAN (R 4.1.0)
 generics       0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
 glue           1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
 haven          2.4.3      2021-08-04 [1] CRAN (R 4.1.0)
 highr          0.9        2021-04-16 [1] CRAN (R 4.1.0)
 hms            1.1.1      2021-09-26 [1] CRAN (R 4.1.0)
 htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
 htmlwidgets    1.5.4      2021-09-08 [1] CRAN (R 4.1.0)
 janitor      * 2.1.0      2021-01-05 [1] CRAN (R 4.1.0)
 jquerylib      0.1.4      2021-04-26 [1] CRAN (R 4.1.0)
 jsonlite       1.8.0      2022-02-22 [1] CRAN (R 4.1.2)
 KernSmooth     2.23-20    2021-05-03 [1] CRAN (R 4.1.0)
 knitr          1.38       2022-03-25 [1] CRAN (R 4.1.0)
 leaflet      * 2.1.1      2022-03-23 [1] CRAN (R 4.1.2)
 lifecycle      1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
 lubridate      1.8.0      2021-10-07 [1] CRAN (R 4.1.0)
 magrittr       2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
 memoise        2.0.1      2021-11-26 [1] CRAN (R 4.1.0)
 munsell        0.5.0.9000 2021-10-19 [1] Github (cwickham/munsell@e539541)
 openxlsx       4.2.5      2021-12-14 [1] CRAN (R 4.1.0)
 pillar         1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
 pkgbuild       1.3.1      2021-12-20 [1] CRAN (R 4.1.0)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload        1.2.4      2021-11-30 [1] CRAN (R 4.1.0)
 prettyunits    1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx       3.5.3      2022-03-25 [1] CRAN (R 4.1.0)
 proxy          0.4-26     2021-06-07 [1] CRAN (R 4.1.0)
 ps             1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
 purrr          0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R6             2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
 RColorBrewer * 1.1-2      2014-12-07 [1] CRAN (R 4.1.0)
 Rcpp           1.0.8.3    2022-03-17 [1] CRAN (R 4.1.2)
 readxl         1.3.1      2019-03-13 [1] CRAN (R 4.1.0)
 remotes        2.4.2      2021-11-30 [1] CRAN (R 4.1.0)
 rio          * 0.5.29     2021-11-22 [1] CRAN (R 4.1.0)
 rlang          1.0.2      2022-03-04 [1] CRAN (R 4.1.2)
 rmarkdown      2.13       2022-03-10 [1] CRAN (R 4.1.2)
 rprojroot      2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
 rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.1.0)
 s2             1.0.7      2021-09-28 [1] CRAN (R 4.1.0)
 sass           0.4.1      2022-03-23 [1] CRAN (R 4.1.2)
 scales         1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
 sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.1.0)
 sf           * 1.0-7      2022-03-07 [1] CRAN (R 4.1.2)
 snakecase      0.11.0     2019-05-25 [1] CRAN (R 4.1.0)
 stringi        1.7.6      2021-11-29 [1] CRAN (R 4.1.0)
 stringr        1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
 testthat       3.1.2      2022-01-20 [1] CRAN (R 4.1.2)
 tibble         3.1.6      2021-11-07 [1] CRAN (R 4.1.0)
 tidyselect     1.1.2      2022-02-21 [1] CRAN (R 4.1.2)
 units          0.8-0      2022-02-05 [1] CRAN (R 4.1.2)
 usethis      * 2.1.5      2021-12-09 [1] CRAN (R 4.1.0)
 utf8           1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
 vctrs          0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
 withr          2.5.0      2022-03-03 [1] CRAN (R 4.1.0)
 wk             0.6.0      2022-01-03 [1] CRAN (R 4.1.2)
 xfun           0.30       2022-03-02 [1] CRAN (R 4.1.2)
 yaml           2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
 zip            2.2.0      2021-05-31 [1] CRAN (R 4.1.0)

 [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────