--- title: "geohashTools" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{geohashTools} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, eval = TRUE, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = '#>', warning = FALSE, message = FALSE ) ``` # What is a Geohash? Developed by [Gustavo](https://github.com/niemeyer) [Niemeyer](https://twitter.com/gniemeyer), the [geohash](https://en.wikipedia.org/wiki/Geohash) is system of nestable, compact global coordinates based on [Z-order curves](https://en.wikipedia.org/wiki/Z-order_curve). The system consists of carving the earth into equally-sized rectangles (when projected into latitude/longitude space) and nesting this process recursively. Geohashes are a grid-like hierarchical spatial indexing system. The precision of a geohash is dictated by the length of the character string that encodes it. The geohash system is useful for indexing or aggregating point data with latitude and longitude coordinates. Each geohash of a given length uniquely identifies a section of the globe. ### Precision of geohashes ```{r precision_table, message=FALSE, warning=FALSE, echo=FALSE} data.frame( `Geohash Length` = 1:8, `KM error`= c(2500.0, 630.0, 78.0, 20.0, 2.4, 0.61, 0.076, 0.019) ) ``` ## Encoding geohashes Encoding is the process of turning latitude/longitude coordinates into geohash strings. For example, Parque Nacional Tayrona in Colombia is located at roughly 11.3113917 degrees of latitude, -74.0779006 degrees of longitude. This can be expressed more compactly as: ```{r tayrona} library(geohashTools) gh_encode(11.3113917, -74.0779006) ``` These 6 characters identify this point on the globe to within 1.2 kilometers (east-west) and .6 kilometers (north-south). The park is quite large, and this is too precise to cover the park; we can "zoom out" by reducing the precision (which is the number of characters in the output, `6` by default): ```{r tayrona_zoom_out} gh_encode(11.3113917, -74.0779006, precision = 5L) ``` ### Example: Encoding many points We can use this as a simple, regular level of spatial aggregation for spatial points data. Here with randomly-selected coordinates: ```{r} coords = data.frame( x=rnorm(20L), y=rnorm(20L) ) gh <- gh_encode(coords$x, coords$y) gh ``` ## Decoding geohashes The reverse of encoding geohashes is of course decoding them -- taking a given geohash string and converting it into global coordinates. For example, the Ethiopian coffee growing region of Yirgacheffe is roughly at `sc54v`: ```{r yirgacheffe} gh_decode('sc54v') ``` It can also be helpful to know just how precisely we've identified these coordinates; the `include_delta` argument gives the cell half-widths in both directions in addition to the cell centroid: ```{r yirgacheffe_delta} gh_decode('sc54v', include_delta = TRUE) ``` For more detail on precision, see the table earlier on this vignette which shows the approximate level potential delta at different precision levels. In terms of latitude and longitude, all geohashes with the same precision have the same dimensions (though the physical size of the "rectangle" changes depending on the latitude); as such it's easy to figure out thecell half-widths from the precision alone using `gh_delta`: ```{r gh_delta} gh_delta(5L) ``` You can also pass entire vectors into `gh_decode` to decode multiple geohashes at once. ```{r decode_vector} gh_decode(gh) ``` ## Geohash neighborhoods One unfortunate consequence of the geohash system is that, while geohashes that are lexicographically similar (e.g. `wxyz01` and `wxyz12`) are certainly close to one another, the converse is not true -- for example, `7gxyru` and `k58n2h` are neighbors! Put another way, small movements on the globe occasionally have visually huge jumps in the geohash-encoded output. The `gh_neighbors` function is designed to address this. Calling `gh_neighbours` will return all of the geohashes adjacent to a given geohash (or vector of geohashes) at the same level of precision. For example, the Merlion statue in Singapore is roughly at `w21z74nz`, but this level of precision zooms in a bit too far. The geohash neighborhood thereof can be found with: ```{r neighbors} gh_neighbors('w21z74nz') ``` ## Working with spatial formats There are several `geohashTools` helper functions for converting geohashes into `sp` and `sf` class objects. ### gh_to_* The `gh_to_sp`, `gh_to_spdf` and `gh_to_sf` functions convert geohash or vector of geohashes into spatial objects of `sp` and `sf` class respectively. Consider the previous example with the Singapore Merlion and some random data. ```{r merlion_nbhd, fig.width = 3, fig.height = 3, out.width = '\\textwidth'} library(sf) merlion_ghs <- gh_neighbors('w21z74') merlion_nbhd <- gh_to_sf(merlion_ghs) # Example plot of geohashes neighbouring w21z74 plot(merlion_nbhd, col = NA, reset = FALSE, key.pos = NULL) text( st_coordinates(st_centroid(merlion_nbhd)), labels = row.names(merlion_nbhd) ) ``` ### gh_covering Sometimes we have a set of spatial datapoints that we want to aggregate or index using geohashes. The `gh_covering` function produces a grid of geohashes that overlap with the spatial points. For this function to work, the spatial object has to be in the [WGS84 (EPSG 4326)](https://epsg.io/4326) coordinate reference system that the geohashes use. Let's use the included `meuse` dataset in the `sp` package which shows point coordinates for metal deposits. The first element in this object shows values for cadmium deposits. ```{r meuse, fig.width = 4, fig.height = 4, out.width = '\\textwidth'} if (!requireNamespace('ggplot2', quietly = TRUE)) { install.packages('ggplot2') } library(ggplot2) data(meuse, package = 'sp') meuse_sf = st_as_sf(meuse, coords = c('x', 'y'), crs = 28992L, agr = 'constant') meuse_sf <- st_transform(meuse_sf, crs = 4326L) ggplot() + geom_sf(data = meuse_sf, aes(colour = cadmium)) + theme_void() ``` By default, `gh_covering` creates a grid that covers the extent of the bounding box for the spatial object. ```{r meuse_covering} meuse_gh <- gh_covering(meuse_sf) meuse_gh ``` We can visualize what this looks like with geohashes overlayed on top. ```{r meuse_covering_plot, fig.width = 4, fig.height = 4, out.width = '\\textwidth'} ggplot() + geom_sf(data = meuse_sf) + geom_sf(data = meuse_gh, fill = NA) + geom_sf_text(data = meuse_gh, aes(label = rownames(meuse_gh))) + theme_void() ``` Alternatively, using `gh_covering` with the parameter `minimal = TRUE` will only create geohashes for intersecting objects. ```{r meuse_minimal, fig.width = 4, fig.height = 4, out.width = '\\textwidth'} meuse_gh <- gh_covering(meuse_sf, minimal = TRUE) ggplot() + geom_sf(data = meuse_sf) + geom_sf(data = meuse_gh, fill = NA) + geom_sf_text(data = meuse_gh, aes(label = rownames(meuse_gh))) + theme_void() ```