class: center, middle, inverse, title-slide # Geospatial Data in R ## A 10,000ft View ### Matt Hellmann ### 2019-08-27 --- # Packages -- - sf - simple features spatial data frames -- - mapview & mapedit - interactive mapping -- - mapdeck - 2.5d maps using Mapbox & Deck.gl --- class: inverse, center, middle # sf --- # sf - simple features -- - Works like a dataframe (or tibble) -- - Additional 'sticky' $geometry list-column with POINTS, LINES, POLYGONS... (and a couple of others) -- - Geographic datum -- # Why sf? -- - Data structure we're all used to working with -- - Works well with the tidyverse ... filter, joins, mutate, ... -- - Makes spatial operations easy. --- # Reading shapefiles Reading pre-packaged shapefiles: ```r list.files('data/hcpafl/') ``` ``` ## [1] "parcel_08_23_2019.zip" "parcel_dor_names.dbf" ## [3] "parcel_sub_names.dbf" "parcel.cpg" ## [5] "parcel.dbf" "parcel.prj" ## [7] "parcel.sbn" "parcel.sbx" ## [9] "parcel.shp" "parcel.shp.xml" ## [11] "parcel.shx" ``` -- Is easy: ```r tpa_parcels <- sf::read_sf('data/hcpafl/parcel.shp') ``` .footnote[parcel data from hcpafl.org] --- ```r tpa_parcels[1:6,] ``` ``` ## Simple feature collection with 6 features and 47 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 408532.4 ymin: 1179025 xmax: 478385.6 ymax: 1395939 ## epsg (SRID): NA ## proj4string: +proj=tmerc +lat_0=24.33333333333333 +lon_0=-82 +k=0.9999411764705882 +x_0=199999.9999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs ## # A tibble: 6 x 48 ## FOLIO TYPE PIN DOR_C OWNER ADDR_1 ADDR_2 CITY STATE ZIP COUNTRY ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 <NA> dummy <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> ## 2 0000… <NA> A-23… 8800 UNIT… EGMON… 4905 … SAIN… FL 3371… <NA> ## 3 0000… <NA> A-23… 8600 HILL… REAL … PO BO… TAMPA FL 3360… <NA> ## 4 0000… <NA> U-01… 0000 STEP… 19859… <NA> ODES… FL 3355… <NA> ## 5 0000… <NA> U-01… 0100 JEFF… 19859… <NA> ODES… FL 3355… <NA> ## 6 0000… <NA> U-01… 0100 MARI… 19901… <NA> ODES… FL 3355… <NA> ## # … with 37 more variables: SUB <chr>, SITE_ADDR <chr>, SITE_CITY <chr>, ## # SITE_ZIP <chr>, LEGAL1 <chr>, LEGAL2 <chr>, LEGAL3 <chr>, ## # LEGAL4 <chr>, DBA <chr>, STRAP <chr>, tBEDS <dbl>, tBATHS <dbl>, ## # tSTORIES <dbl>, tUNITS <dbl>, tBLDGS <dbl>, TAXDIST <chr>, JUST <dbl>, ## # LAND <dbl>, BLDG <dbl>, EXF <dbl>, ACT <int>, EFF <int>, ## # HEAT_AR <dbl>, ASD_VAL <dbl>, TAX_VAL <dbl>, MUNI <chr>, SD1 <chr>, ## # SD2 <chr>, TIF <chr>, BASE <int>, S_DATE <date>, VI <chr>, ## # S_AMT <dbl>, ACREAGE <dbl>, NBHC <dbl>, Edit_dt <date>, ## # geometry <MULTIPOLYGON [US_survey_foot]> ``` --- ### Too much data, filter & select ```r tpa_33603 <- tpa_parcels %>% filter(SITE_ZIP == '33603') %>% select(PIN, HEAT_AR, DOR_C) tpa_33603 ``` ``` ## Simple feature collection with 7696 features and 3 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 499071.9 ymin: 1319071 xmax: 520754.9 ymax: 1333718 ## epsg (SRID): NA ## proj4string: +proj=tmerc +lat_0=24.33333333333333 +lon_0=-82 +k=0.9999411764705882 +x_0=199999.9999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs ## # A tibble: 7,696 x 4 ## PIN HEAT_AR DOR_C geometry ## * <chr> <dbl> <chr> <MULTIPOLYGON [US_survey_foot]> ## 1 A-34-28-18-ZZZ… 2907 7100 (((499571.2 1332941, 499495.7 1332941, 49… ## 2 A-34-28-18-ZZZ… 1976 7100 (((499681.5 1333150, 499680.5 1332994, 49… ## 3 A-34-28-18-ZZZ… 9000 1630 (((499813.5 1333149, 499812.2 1332938, 49… ## 4 A-34-28-18-3GF… 17626 1630 (((499764.7 1331695, 499525.8 1331695, 49… ## 5 A-34-28-18-3GF… 2940 1410 (((499805.7 1331975, 499804.9 1331899, 49… ## 6 A-34-28-18-3GF… 1750 1730 (((499806.2 1332024, 499805.7 1331975, 49… ## 7 A-34-28-18-3GF… 1000 1730 (((499807.4 1332055, 499806.2 1332024, 49… ## 8 A-34-28-18-3GF… 2720 4870 (((499807.4 1332055, 499697.2 1332055, 49… ## 9 A-34-28-18-3GF… 1208 2104 (((499698.4 1332176, 499698.5 1332228, 49… ## 10 A-34-28-18-3GF… 4325 4840 (((499812.6 1332302, 499812.1 1332228, 49… ## # … with 7,686 more rows ``` --- ## Unique to sf objects ```r st_geometry(tpa_33603) ``` ``` ## Geometry set for 7696 features ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 499071.9 ymin: 1319071 xmax: 520754.9 ymax: 1333718 ## epsg (SRID): NA ## proj4string: +proj=tmerc +lat_0=24.33333333333333 +lon_0=-82 +k=0.9999411764705882 +x_0=199999.9999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs ## First 5 geometries: ``` ``` ## MULTIPOLYGON (((499571.2 1332941, 499495.7 1332... ``` ``` ## MULTIPOLYGON (((499681.5 1333150, 499680.5 1332... ``` ``` ## MULTIPOLYGON (((499813.5 1333149, 499812.2 1332... ``` ``` ## MULTIPOLYGON (((499764.7 1331695, 499525.8 1331... ``` ``` ## MULTIPOLYGON (((499805.7 1331975, 499804.9 1331... ``` --- ## sf & ggplot2 ```r ggplot(tpa_33603) + geom_sf(aes(fill = HEAT_AR)) ``` <!-- --> --- # sf objects from gps (*.gpx) ```r bike_spatial <- read_sf('data/bike_data/2018 Q1/trips_coast_bike_share_01_01_2018-03_31_2018.gpx', layer = 'tracks') ``` ```r bike_spatial %>% select(name, cmt, type) %>% head() ``` ``` ## Simple feature collection with 6 features and 3 fields ## geometry type: MULTILINESTRING ## dimension: XY ## bbox: xmin: -82.64669 ymin: 27.76822 xmax: -82.45495 ymax: 27.95079 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## # A tibble: 6 x 4 ## name cmt type geometry ## <chr> <chr> <chr> <MULTILINESTRING [°]> ## 1 a3127db901743… 2018-01-01… <NA> ((-82.64669 27.76822, -82.64669 27.7682… ## 2 eef7e4fff5f1a… 2018-01-01… <NA> ((-82.63412 27.77464, -82.63411 27.7746… ## 3 ed943d754ef9b… 2018-01-01… <NA> ((-82.63423 27.7747, -82.63417 27.77471… ## 4 570e802090654… 2018-01-01… <NA> ((-82.6315 27.77684, -82.6315 27.77684,… ## 5 41ec36e114e1c… 2018-01-01… <NA> ((-82.63159 27.77691, -82.63159 27.7769… ## 6 0d58a877d5790… 2018-01-01… <NA> ((-82.45495 27.94124, -82.45495 27.9412… ``` --- ### Not much there. ### Look at associated .csv for more data. ```r bike_csv <- read_csv('data/bike_data/2018 Q1/trips_coast_bike_share_01_01_2018-03_31_2018.csv') names(bike_csv) ``` ``` ## [1] "Route ID" "Start Hub" "Start Latitude" ## [4] "Start Longitude" "Start Time" "End Hub" ## [7] "End Latitude" "End Longitude" "End Time" ## [10] "Bike ID" "Distance [Miles]" "Duration" ## [13] "Rental Access Path" "Overtime Y/N" "Out Of Hub Y/N" ## [16] "Out Of Area Y/N" ``` --- ### clean some things up & join the files ```r bike_spatial[,apply(X = bike_spatial, MARGIN = 2, FUN = function(x) sum(is.na(x))) > dim(bike_spatial)[1] -1] <- NULL names(bike_spatial)[1] <- names(bike_csv)[1] ## Remove the problem observations from the bike_spatial_2018q1 bike_spatial <- bike_spatial[-c(which(st_is_empty(bike_spatial))),] # Join the two bikes <- left_join(bike_spatial, bike_csv) %>% janitor::clean_names() ``` ``` ## Joining, by = "Route ID" ``` --- ### quick look ```r head(bikes) ``` ``` ## Simple feature collection with 6 features and 18 fields ## geometry type: MULTILINESTRING ## dimension: XY ## bbox: xmin: -82.64669 ymin: 27.76822 xmax: -82.45495 ymax: 27.95079 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## # A tibble: 6 x 19 ## route_id cmt desc geometry start_hub start_latitude ## <chr> <chr> <chr> <MULTILINESTRING [°]> <chr> <chr> ## 1 a3127db… 2018… 858c… ((-82.64669 27.76822, -8… SP - MLK… 27.76822333 ## 2 eef7e4f… 2018… 0e0b… ((-82.63412 27.77464, -8… SP - 1st… 27.77464167 ## 3 ed943d7… 2018… 0e0b… ((-82.63423 27.7747, -82… SP - 1st… 27.77470167 ## 4 570e802… 2018… 2f04… ((-82.6315 27.77684, -82… SP - Bea… 27.77684167 ## 5 41ec36e… 2018… 2f04… ((-82.63159 27.77691, -8… <NA> 27.77691167 ## 6 0d58a87… 2018… 4473… ((-82.45495 27.94124, -8… TPA - S … 27.941245 ## # … with 13 more variables: start_longitude <chr>, start_time <chr>, ## # end_hub <chr>, end_latitude <chr>, end_longitude <chr>, ## # end_time <chr>, bike_id <dbl>, distance_miles <dbl>, duration <chr>, ## # rental_access_path <chr>, overtime_y_n <chr>, out_of_hub_y_n <chr>, ## # `Out Of Area Y/N` <chr> ``` --- ## mapview -- ```r library(mapview) bikes %>% filter(str_detect(start_hub, 'TPA -')) %>% sample_n(100) %>% mapview(alpha = .2, lwd = .5) ```
--- class: inverse, center, middle # mapdeck --- ## mapdeck R interface to Mapbox GL & Deck.gl. Iinteractive spatial mapping with 2.5D visuals. - **Grids** - Points - **Arcs** - Lines - Paths - Polygons - **Screengrid** - hexagons *Requires Mapbox account and API key*. .footnote[more @ symbolixau.github.io] --- ## mapdeck examles Some mapdeck features require particular geometry types. Change geometry from 'MULTILINESTRING' to 'MULTIPOINT' with st_cast(). ```r bikes <- bikes %>% st_cast('MULTIPOINT') ``` --- ### grid ```r mapdeck( token = key, style = mapdeck_style('dark'), pitch = 45 ) %>% mapdeck(token = key, style = mapdeck_style('dark'), pitch = 45) %>% add_grid(data = bikes_tpa_multipoint[1:3000,], cell_size = 20) ``` [In Browser](http://mrhellmann.com/tampa_r_pres/bike_grid.html) --- ### screengrid Mapdeck using an sf object ```r mapdeck( token = key, style = mapdeck_style('dark'), pitch = 45 ) %>% add_screengrid(data = bikes[1:11000,], cell_size = 10, opacity = .4) ``` [In Browser](http://mrhellmann.com/tampa_r_pres/screengrid.html) -- --- ###arcs Mapdeck using non-spatial tibble: ```r arcs <- bike_csv %>% group_by(`Start Hub`, `End Hub`) %>% summarise( startlat = mean(as.numeric(`Start Latitude`), na.rm = T), endlat = mean(as.numeric(`End Latitude`), na.rm = T), startlon = mean(as.numeric(`Start Longitude`), na.rm = T), endlon = mean(as.numeric(`End Longitude`), na.rm = T), n = n()) %>% filter(!is.na(`Start Hub`), !is.na(`End Hub`)) %>% filter(str_detect(`Start Hub`, 'TPA - ')) %>% mutate(height = n/10) %>% janitor::clean_names() ) ``` [In Browser](http://mrhellmann.com/tampa_r_pres/arcs.html) --- ## Mapedit Defining spatial dataframes by pointing & clicking ```r sobrew <- mapedit::editMap() ``` -- - Opens in RStudio, allowing you to create points, lines, and polygons by clicking a map. - Returns an sf object. - This object created by clicking on a map in RStudio. -- ```r sobrew <- readRDS('data/sobrew.R') sobrew ``` ``` ## Simple feature collection with 1 feature and 2 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -82.45142 ymin: 27.98624 xmax: -82.45142 ymax: 27.98624 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## X_leaflet_id feature_type geometry ## 1 593 marker POINT (-82.45142 27.98624) ``` --- # Some spatial operations What's the closest a bike came to Southern Brewing? Find the closest bike: ```r st_nearest_feature(sobrew, bikes_tpa) ``` ``` ## although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar ``` ``` ## [1] 14015 ``` -- Get the distance: ```r st_distance(sobrew, bikes_tpa[14015,]) ``` ``` ## Units: [m] ## [,1] ## [1,] 277.5064 ``` --- ## Map 'em ```r mapview(sobrew, color='red') %>% addFeatures(bikes_tpa[14015,]) ```
--- ## Changing coordinate reference systems (CRS) Lat & Lon projections sometimes aren't enough for geographic operations. Transforming the object with st_transform() fixes that. ```r sobrew_6439 <- st_transform(sobrew, 6439) st_crs(sobrew_6439) ``` ``` ## Coordinate Reference System: ## EPSG: 6439 ## proj4string: "+proj=aea +lat_1=24 +lat_2=31.5 +lat_0=24 +lon_0=-84 +x_0=400000 +y_0=0 +ellps=GRS80 +units=m +no_defs" ``` ```r st_crs(sobrew) ``` ``` ## Coordinate Reference System: ## EPSG: 4326 ## proj4string: "+proj=longlat +datum=WGS84 +no_defs" ``` --- ## And changing the CRS for the housing data to match: ```r st_crs(tpa_33603) ``` ``` ## Coordinate Reference System: ## No EPSG code ## proj4string: "+proj=tmerc +lat_0=24.33333333333333 +lon_0=-82 +k=0.9999411764705882 +x_0=199999.9999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs" ``` ```r tpa_33603 <- st_transform(tpa_33603, st_crs(sobrew_6439)) st_crs(tpa_33603) ``` ``` ## Coordinate Reference System: ## EPSG: 6439 ## proj4string: "+proj=aea +lat_1=24 +lat_2=31.5 +lat_0=24 +lon_0=-84 +x_0=400000 +y_0=0 +ellps=GRS80 +units=m +no_defs" ``` --- ## Finding housing parcels within a radius: Make a buffer of 1000 meters *(units are intrinsic to the projection chosen)* ```r sobrew_buff <- st_buffer(sobrew_6439, 1000) ``` Find the intersection with earlier tpa_33603 data that we transformed: ```r st_intersects(sobrew_buff, tpa_33603, sparse = FALSE) ``` Use it to subset the data: ```r tpa_33603[st_intersects(sobrew_buff, tpa_33603, sparse = FALSE),] ``` --- ## And finally throw that at mapview: ```r mapview(tpa_33603[st_intersects(sobrew_buff, tpa_33603, sparse = FALSE),]) ```