After about four years (where did that time go?), I have circled back to Neo4j and H3 geospatial data processing.
Since version 3.4, Neo4j has a native geospatial datatype. Neo4j uses the WGS-84 and WGS-84 3D coordinate reference system. Within Neo4j, we can index these point properties and query using our distance function or you query within a bounding box.
In the next few posts, we’ll discuss the new plugin and walk through some examples of how to use the plugin within Neo4j for geospatial usecases.
What is H3
From the H3Geo.org website, “H3 is a geospatial indexing system that partitions the world into hexagonal cells. H3 is open source under the Apache 2 license.
The H3 Core Library implements the H3 grid system. It includes functions for converting from latitude and longitude coordinates to the containing H3 cell, finding the center of H3 cells, finding the boundary geometry of H3 cells, finding neighbors of H3 cells, and more.”
H3 leverages the power of hexagons to create a global grid system with 16 resolutions, from 0 (the size of a small continent) to 15 (less than a square meter). While the H3 library itself doesn’t perform geospatial analysis, it provides the core building blocks for a range of analytical functions. One of the most common uses is to bin data such as places or events that occur with a specific location. Once the data is binned, we can then compare the count or sum of occurrences in that hex location with data in other bins at the same resolution.
The H3 library helps us make sense of large sets of data. It allows the user to efficiently work with the data through optimized functions such as indexing points to cells, traversing through the hexes and by effectively moving through a hierarchy of hex cells. These functions fit well with Neo4j due to the relationships among the cells.
H3 cell IDs are also perfect for joining disparate datasets. That is, you can perform a spatial join easily without complex geospatial functions. It is straightforward to join datasets by cell ID and start answering location-driven questions. In this blog series, we are going to join Points of Interest data with NYC Neighborhood definitions with NYC Taxi Pickup Zones and with the NYC Taxi dataset.
Databricks and H3
You might be wondering why we are talking about Databricks in this blogpost. On September 14, 2022, Databricks announced built-in H3 expressions for “efficient geospatial processing and analytics”. They announced 28 built-in functions for working with geospatial data. I had recently seen that posting and decided to adopt the Databricks naming convention for the Neo4j plugin.
The Neo4jH3 library is available here. Besides the README.md for installation instructions, the Documentation is a great resource for the different functions and procedures. I’ve tried my best to provide sufficient examples along with error codes for the functions and procedures.
For this blog series, we will use the 5.3.0 release as it has the new function/procedure naming and works with Neo4j 5.
If you have questions at any point in this blog series, leave me a comment and I’ll be glad to get back to you.