Geospatial Pre-filtering for POI Recommendations

Background

Place, or Point of Interest (POI), recommendations suggest locations to a person looking for somewhere interesting to visit within a geographic area. Recent papers propose different approaches to this problem. One looks at Graph Neural Networks. A second looks at using Large Language Models (LLMs) and generative AI (GenAI) to parse a query and pre-filter on a geographic region before searching across the list of POIs.

This blog post series is not focused on LLMs or GenAI. Rather, we will look at using vector indexes to pre-filter on a region and a location category so that results can be retrieved quickly. The POI data and geospatial data will be stored in a graph database as a Knowledge Graph, with embeddings created for the POI name, the category name and the geographic location name. The series will walk through the approach and discuss its pros and cons.

Neo4j writes: “Graph databases like Neo4j are built on the concept of a graph: a collection of nodes and relationships. Nodes represent individual data points, while relationships define the connections between them. Each node can possess properties, which are key-value pairs providing additional context or attributes about the node. This approach offers a flexible and intuitive way to model complex relationships and dependencies within data.” We will model our data as two Knowledge Graphs, one for the POI data and one for the geospatial reference data. We will then search across both graphs, using the geospatial reference data to narrow the search of the POI data.
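To make the node, relationship and property vocabulary concrete, here is a minimal in-memory sketch in Python. It is not the data model used in this series (that is covered in the next post). The labels (Region, Place, Category), the relationship types (LOCATED_IN, HAS_CATEGORY) and all of the names are hypothetical, chosen only to illustrate how a region and a category can narrow the set of POIs before any further search.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                       # e.g. "Region", "Place" (hypothetical labels)
    properties: dict = field(default_factory=dict)   # key-value pairs on the node


@dataclass
class Relationship:
    start: Node
    type: str                                        # e.g. "LOCATED_IN" (hypothetical type)
    end: Node


# Geospatial reference graph (hypothetical regions)
region_a = Node("Region", {"name": "Region A"})
region_b = Node("Region", {"name": "Region B"})

# POI graph (hypothetical places and categories)
cafe_cat = Node("Category", {"name": "Cafe"})
museum_cat = Node("Category", {"name": "Museum"})
cafe_1 = Node("Place", {"name": "Example Cafe"})
cafe_2 = Node("Place", {"name": "Elsewhere Cafe"})
museum_1 = Node("Place", {"name": "Example Museum"})

relationships = [
    Relationship(cafe_1, "LOCATED_IN", region_a),
    Relationship(museum_1, "LOCATED_IN", region_a),
    Relationship(cafe_2, "LOCATED_IN", region_b),
    Relationship(cafe_1, "HAS_CATEGORY", cafe_cat),
    Relationship(cafe_2, "HAS_CATEGORY", cafe_cat),
    Relationship(museum_1, "HAS_CATEGORY", museum_cat),
]


def sources_of(target, rel_type):
    """Return the start nodes of every rel_type relationship ending at target."""
    return [r.start for r in relationships if r.type == rel_type and r.end is target]


def places_in_region_with_category(region, category):
    """Pre-filter: keep only places linked to both the region and the category."""
    in_category = sources_of(category, "HAS_CATEGORY")
    return [p for p in sources_of(region, "LOCATED_IN") if p in in_category]


matches = places_in_region_with_category(region_a, cafe_cat)
print([p.properties["name"] for p in matches])
```

In a graph database the same idea would be expressed as a pattern match rather than list filtering, but the shape of the question is the same: start from the region and the category, then follow relationships to the candidate places.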

Vector Indexing

Neo4j supports vector indexes for fast approximate k-nearest neighbor (KNN) searches using either cosine or Euclidean similarity. Neo4j’s vector indexing capability uses an algorithm called Hierarchical Navigable Small World (HNSW) to identify similar vectors efficiently. The more similar two vectors are, the higher the similarity score and the more relevant the match. In our example, we will create embeddings from node properties, build vector indexes over those embeddings, and then query the indexes to identify results.
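As a point of reference for what a KNN search computes, the sketch below does an exact, brute-force nearest-neighbor search in plain Python. This is not HNSW and not Neo4j's implementation: an HNSW index approximates this ranking without comparing the query against every vector. The three-dimensional vectors and category names are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, items, k=2):
    """Exact KNN: score every item against the query, return the k best by cosine."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" keyed by category name (hypothetical values)
category_vectors = {
    "Coffee Shop": [0.9, 0.1, 0.0],
    "Tea Room": [0.8, 0.3, 0.1],
    "Hardware Store": [0.0, 0.2, 0.9],
}

query_vector = [1.0, 0.0, 0.0]  # stands in for the embedding of a search term
results = top_k(query_vector, category_vectors, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The brute-force version costs one comparison per stored vector, which is why an approximate index becomes attractive once the number of vectors is large.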

Places Of Interest Data

For this investigation, we chose to use the Foursquare Open Source Places (FSQ OS Places) dataset. This dataset was recently open-sourced under the Apache 2.0 license. FSQ OS Places contains 100 million+ global places of interest (POIs), each with 22 core attributes (see schema here), and will be updated monthly. The background on open sourcing FSQ OS Places is here. Release notes for the December 2024 release are available as well. Our research uses the November 2024 data set.

Architecture

For this investigation, we are using a relatively simple architecture as shown below.

The next post will focus on the data, data modeling and data sources that we used for this work.
