Summary
This post is an introduction to working with thesf
package. “sf” is short for “simple features” and is an open source standard to interact with vector geometries. Downloads of the sf
package now exceed other R packages dealing with spatial data.
Table of Contents
Introduction
This post is an introduction to working with the sf
package. “sf” is short for “simple features” and is an open-source model and standard to interact with vector geometries. Downloads of the sf
package now exceed other R
packages dealing with spatial data. And going forward, sf
is likely to become the primary package in spatial analysis.
According to Roger Bivand, an early pioneer in the use of R for spatial analysis, “the transition to sf
from sp
, rgdal
, and rgeos
should be unproblematic. Shifting to new visualization packages like tmap
, cartography
and mapview
should also be relatively easy, and use of sf
and the new visualization packages should certainly become standard for new research and teaching.”[1]
The simple features package was first released in January 2017. Its design and philosophy mirror other packages included in the “tidyverse”. Thus, manipulating data in sf
classed objects is similar to how it would be accomplished in dplyr
.
Robin Lovelace is an authority in the field who has been immensely active in geospatial development for R. He is a contributor to the sf
package. His book, “Geocomputation with R” is recommended. My first exposure to his work was a tutorial that used a previous package and his work was consulted every time I had to do a merge/join.
This post extracts portions of his book that are most commonly performed in everyday spatial analysis. An excellent cheat sheet is also available. Two sources of data are used: the US Census Bureau cartographic boundaries for counties and USDA county unemployment files.
US Census Bureau
The county geospatial data come from the US Census.
#read in file
library(sf)
#all counties
counties <-
sf::st_read(
dsn = "cb_2020_us_county_20m",
#do not put file extentsion .shp on end!
layer = "cb_2020_us_county_20m"
) |>
# convert coordinates to simple feature
sf::st_transform('+proj=longlat +datum=WGS84') |>
# reduce file size
sf::st_simplify() |>
# convert variables to lowercase
dplyr::rename_all(~janitor::make_clean_names(.x))
Reading layer `cb_2020_us_county_20m' from data source
`/Users/robwiederstein/Dropbox/coding/rproj/my-hugo-blog/content/post/2022-03-28-simple-features-tutorial-in-r-number-2/cb_2020_us_county_20m'
using driver `ESRI Shapefile'
Simple feature collection with 3221 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS: NAD83
Subset to State
Filter the dataframe to the state of interest. The result can be reached using a dplyr
pipe.
#california fips is "06"
ca <-
counties |>
dplyr::filter(grepl("06", statefp))
Sanity Check
Before proceeding further, efficiency requires the collected data represents what we intended; namely, the state of California. After inspection, we appear headed on the right track.
plot(ca |> dplyr::select(countyfp))
Subset to Counties
Next, let’s choose three random California counties and plot them using Leaflet.
ca_counties <-
ca |>
dplyr::filter(name %in% c("Los Angeles", "San Francisco", "Napa"))
library(leaflet)
leaflet() |>
fitBounds(-122.6, 32.8, -117.6, 38.8) |>
addTiles() |>
addPolygons(data = ca_counties,
popup = ~name)
Group Counties by Dplyr
counties_join_by_group <-
ca |>
dplyr::mutate(number = 1) |>
dplyr::group_by(statefp) |>
dplyr::summarize(tot = sum(number))
#map
leaflet() |>
#fitBounds(-124.4, 32.5, -114.1, 42) |>
setView(-120.52462403790528, 37.74238197955183, zoom = 5) |>
addTiles() |>
addPolygons(data = counties_join_by_group)
Counties Join by Union
The other way to join geographic subdivisions together is through the use of sf::st_union()
.
# join counties into California
counties_join_by_union <- sf::st_union(ca)
# map
leaflet() %>%
fitBounds(-124.4, 32.5, -114.1, 42) |>
#base groups
addTiles(group = "OSM (default)") |>
addPolygons(
data = counties_join_by_union
)
Calculate Centroids
centroids <- sf::st_centroid(ca)
leaflet() |>
fitBounds(-122.6, 32.8, -117.6, 38.8) |>
#base groups
addTiles(group = "OSM (default)") |>
#layer groups
addPolygons(data = sf::st_union(ca),
group = "State") |>
addPolygons(data = ca,
group = "Counties") |>
addMarkers(lat = unlist(lapply(centroids$geometry, "[[", 2)),
lng = unlist(lapply(centroids$geometry, "[[", 1)),
group = "Centroids"
) |>
#layer control
addLayersControl(
overlayGroups = c("State", "Counties", "Centroids"),
position = "topright",
options = layersControlOptions(collapsed = FALSE)
) |>
# hide
hideGroup(c("State", "Counties", "Centroids"))
Merge
Maps are commonly used as choropleths, i.e. the polygon is colored based on another variable. Thus, spatial data are often merged with other data. Here, the USDA compiles some data by county in the US. The data are winnowed to identification variables and a single numeric named “unemployment_rate_2020”. Then the USDA data is merged with the California counties’ data.
file <- 'https://www.ers.usda.gov/webdocs/DataFiles/48747/Unemployment.xlsx'
usda <- rio::import(
file = file,
skip = 4
) |>
dplyr::rename_all(~janitor::make_clean_names(.x)) |>
dplyr::select(fips_code:area_name, unemployment_rate_2020) |>
dplyr::filter(state == "CA") |>
tidyr::separate(fips_code, into = c("statefp", "countyfp"), sep = 2) |>
dplyr::select(statefp, countyfp, unemployment_rate_2020)
The files’ order matters when performing the merge. By putting the “ca” simple feature class dataframe first, the subsequent “usda” dataframe morphs into the same simple feature class and is now combined. See Professor Lovelace’s Geospatial in R 3.2.4 vector attribute joining for more details. “The most common type of attribute join on spatial data takes an sf object as the first argument and adds columns to it from a data.frame specified as the second argument.”[2]
ca_choropleth <-
dplyr::left_join(ca, usda, by = c("statefp", "countyfp"))
Choropleth Map
California counties are shaded by the 2020 unemployment rate.
Conclusion
This tutorial demonstrated how to: (1) read in, transform, and minimize a shapefile; (2) select a single state’s counties from the Census Bureau cartographic boundaries file; (3) plot a shapefile using the simple plot()
function; (4) dissolve internal boundaries of a state using the dplyr::group_by()
function and the sf::st_union()
function; (5) calculate the geographic centers of polygons with the sf::st_centroid()
function; (6) use the layers feature in leaflet
to place multiple geographic datasets onto a map; (7) merge a dataframe/tibble/datatable with a simple feature collection; and (8) create a choropleth map in leaflet using a divergent color palette from RColorBrewer
.
References
Disclaimer
The views, analysis, and conclusions presented within this post represent the author alone and not of any other person, organization or government entity. While I have made every reasonable effort to ensure that the information in this article was correct, it will nonetheless contain errors, inaccuracies, and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.
Reproducibility
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.1.0 (2021-05-18)
os macOS Big Sur 10.16
system x86_64, darwin17.0
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Chicago
date 2022-03-28
pandoc 2.14.1 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
blogdown * 1.8 2022-02-16 [1] CRAN (R 4.1.2)
bookdown 0.25 2022-03-16 [1] CRAN (R 4.1.2)
brio 1.1.3 2021-11-30 [1] CRAN (R 4.1.0)
bslib 0.3.1.9000 2022-03-04 [1] Github (rstudio/bslib@888fbe0)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.0)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.0)
class 7.3-20 2022-01-13 [1] CRAN (R 4.1.2)
classInt 0.4-3 2020-04-07 [1] CRAN (R 4.1.0)
cli 3.2.0 2022-02-14 [1] CRAN (R 4.1.2)
colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.1.2)
crayon 1.5.1 2022-03-26 [1] CRAN (R 4.1.0)
crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.1.0)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.0)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.1.0)
DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.0)
desc 1.4.1 2022-03-06 [1] CRAN (R 4.1.2)
devtools * 2.4.3 2021-11-30 [1] CRAN (R 4.1.0)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.0)
dplyr 1.0.8 2022-02-08 [1] CRAN (R 4.1.2)
e1071 1.7-9 2021-09-16 [1] CRAN (R 4.1.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.2)
fansi 1.0.3 2022-03-24 [1] CRAN (R 4.1.2)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
forcats 0.5.1 2021-01-27 [1] CRAN (R 4.1.0)
foreign 0.8-82 2022-01-13 [1] CRAN (R 4.1.2)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.0)
generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
haven 2.4.3 2021-08-04 [1] CRAN (R 4.1.0)
highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.0)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.0)
janitor * 2.1.0 2021-01-05 [1] CRAN (R 4.1.0)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.1.2)
KernSmooth 2.23-20 2021-05-03 [1] CRAN (R 4.1.0)
knitr 1.38 2022-03-25 [1] CRAN (R 4.1.0)
leaflet * 2.1.1 2022-03-23 [1] CRAN (R 4.1.2)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.0)
magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.0)
munsell 0.5.0.9000 2021-10-19 [1] Github (cwickham/munsell@e539541)
openxlsx 4.2.5 2021-12-14 [1] CRAN (R 4.1.0)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
processx 3.5.3 2022-03-25 [1] CRAN (R 4.1.0)
proxy 0.4-26 2021-06-07 [1] CRAN (R 4.1.0)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
RColorBrewer * 1.1-2 2014-12-07 [1] CRAN (R 4.1.0)
Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.1.2)
readxl 1.3.1 2019-03-13 [1] CRAN (R 4.1.0)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.0)
rio * 0.5.29 2021-11-22 [1] CRAN (R 4.1.0)
rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.2)
rmarkdown 2.13 2022-03-10 [1] CRAN (R 4.1.2)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
s2 1.0.7 2021-09-28 [1] CRAN (R 4.1.0)
sass 0.4.1 2022-03-23 [1] CRAN (R 4.1.2)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.0)
sf * 1.0-7 2022-03-07 [1] CRAN (R 4.1.2)
snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.1.0)
stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
testthat 3.1.2 2022-01-20 [1] CRAN (R 4.1.2)
tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2)
units 0.8-0 2022-02-05 [1] CRAN (R 4.1.2)
usethis * 2.1.5 2021-12-09 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.0)
wk 0.6.0 2022-01-03 [1] CRAN (R 4.1.2)
xfun 0.30 2022-03-02 [1] CRAN (R 4.1.2)
yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
zip 2.2.0 2021-05-31 [1] CRAN (R 4.1.0)
[1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────