Planetary-scale answers, unlocked.
A Hands-On Guide for Working with Large-Scale Spatial Data. Learn more.
Authors
GPS data is inherently noisy and often lacks precision, which can make it challenging to extract accurate insights. This imprecision means that the GPS points logged may not accurately represent the actual locations where a device was. For example, GPS data from a drive around a lake may incorrectly include points that are over the water!
To address these inaccuracies, teams commonly use two approaches:
WherobotsAI Map Matching offers an advanced solution for this problem that’s significantly more performant and 99% lower cost than Mapbox, Google Maps, or HERE maps alternatives. It performs map matching with high scale on millions or even billions of trips with ease and performance, ensuring that the GPS data aligns accurately with the actual paths most likely taken.
An illustration of map matching. Blue dots: GPS samples, Green line: matched trajectory.
Map matching is a common solution for preparing GPS data for use in a wide range of applications including:
The objective of map matching is to accurately determine which road or path in the digital map corresponds to the observed geographic coordinates, considering factors such as the accuracy of the location data, the density and layout of the road network, and the speed and direction of travel.
Most map matching implementations are variants of the Hidden Markov Model (HMM)-based algorithm described by Newson and Krumm in their seminal paper, “Hidden Markov Map Matching through Noise and Sparseness.” This foundational research has influenced a variety of map matching solutions available today.
However, traditional HMM-based approaches have notable downsides when working with large-scale GPS datasets:
These challenges highlight the need for more efficient and cost-effective solutions capable of handling large-scale GPS datasets without compromising on performance.
The Mapbox Map Matching API, HERE Maps Route Matching API, and Google Roads API are powerful RESTful APIs for performing map matching. These solutions are particularly effective for small-scale applications. However, for large-scale applications, such as population-level analysis involving millions of trajectories, the costs can become prohibitively high.
For example, as of July 2024, the approximate costs for matching 1 million trips are:
These prices are based on public pricing pages and do not consider any potential volume-based discounts that may be available.
While these APIs provide robust and accurate map matching capabilities, organizations seeking to perform extensive analyses often must explore more cost-effective alternatives.
Open-source software such as such as Valhalla or GraphHopper can also be used for map matching. However, these solutions are designed for use on a single-machine. If your map matching workload exceeds the capacity that machine, your workload will suffer from extended processing times. Furthermore, you will end up running out of headroom if you are vertically scaling up the ladder of VM sizes.
WherobotsAI Map Matching is a high performance, low cost, and planetary scale map matching capability for your telematics pipelines.
WherobotsAI provides a scalable map matching feature designed for small to very large scale trajectory datasets. It works seamlessly with other Wherobots capabilities, which means you can implement data cleaning, data transformations, and map matching in one single (serverless) data processing pipeline. We’ll see how it works in the following sections.
WherobotsAI Map Matching takes a DataFrame containing trajectories and another DataFrame containing road segments, and produces a DataFrame containing map matched results. Here is a walk-through of using WherobotsAI Map Matching to match trajectories in the VED dataset to the OpenStreetMap (OSM) road network.
First, we load the trajectory data. We’ll use the preprocessed VED dataset stored as GeoParquet files for demonstration.
dfPath = sedona.read.format("geoparquet").load("s3://wherobots-benchmark-prod/data/mm/ved/VED_traj/")
The trajectory dataset should contain the following attributes:
ids
geometry
The rows in the trajectory DataFrame look like this:
+---+-----+----+--------------------+--------------------+ |ids|VehId|Trip| coords| geometry| +---+-----+----+--------------------+--------------------+ | 0| 8| 706|[{0, 42.277558333...|LINESTRING (-83.6...| | 1| 8| 707|[{0, 42.277681388...|LINESTRING (-83.6...| | 2| 8| 708|[{0, 42.261997222...|LINESTRING (-83.7...| | 3| 10|1558|[{0, 42.277065833...|LINESTRING (-83.7...| | 4| 10|1561|[{0, 42.286599722...|LINESTRING (-83.7...| +---+-----+----+--------------------+--------------------+ only showing top 5 rows
We’ll use the OpenStreetMap (OSM) data specific to the Ann Arbor, Michigan region to map match our trip data with. Wherobots provides an API for loading road network data from OSM XML files.
from wherobots import matcher dfEdge = matcher.load_osm("s3://wherobots-examples/data/osm_AnnArbor_large.xml", "[car]") dfEdge.show(5)
The loaded road network DataFrame looks like this:
+--------------------+----------+--------+----------+-----------+----------+-----------+ | geometry| src| dst| src_lat| src_lon| dst_lat| dst_lon| +--------------------+----------+--------+----------+-----------+----------+-----------+ |LINESTRING (-83.7...| 68133325|27254523| 42.238819|-83.7390142|42.2386159|-83.7390153| |LINESTRING (-83.7...|9405840276|27254523|42.2386058|-83.7388915|42.2386159|-83.7390153| |LINESTRING (-83.7...| 68133353|27254523|42.2385675|-83.7390856|42.2386159|-83.7390153| |LINESTRING (-83.7...|2262917109|27254523|42.2384552|-83.7390313|42.2386159|-83.7390153| |LINESTRING (-83.7...|9979197063|27489080|42.3200426|-83.7272283|42.3200887|-83.7273003| +--------------------+----------+--------+----------+-----------+----------+-----------+ only showing top 5 rows
Users can also prepare the road network data from any data source using any data processing procedures, as long as the schema of the road network DataFrame conforms to the requirement of the Map Matching API.
Once the trajectories and road network data is ready, we can run matcher.match to match trajectories to the road network.
matcher.match
dfMmResult = matcher.match(dfEdge, dfPath, "geometry", "geometry")
The dfMmResult contains the trajectories snapped to the roads in matched_points attribute:
dfMmResult
matched_points
+---+--------------------+--------------------+--------------------+ |ids| observed_points| matched_points| matched_nodes| +---+--------------------+--------------------+--------------------+ |275|LINESTRING (-83.6...|LINESTRING (-83.6...|[62574078, 773611...| |253|LINESTRING (-83.6...|LINESTRING (-83.6...|[5930199197, 6252...| | 88|LINESTRING (-83.7...|LINESTRING (-83.7...|[4931645364, 6249...| |561|LINESTRING (-83.6...|LINESTRING (-83.6...|[29314519, 773612...| |154|LINESTRING (-83.7...|LINESTRING (-83.7...|[5284529433, 6252...| +---+--------------------+--------------------+--------------------+ only showing top 5 rows
We can visualize the map matching result using SedonaKepler to see what the matched trajectories look like:
mapAll = SedonaKepler.create_map() SedonaKepler.add_df(mapAll, dfEdge, name="Road Network") SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("observed_points AS geometry"), name="Observed Points") SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("matched_points AS geometry"), name="Matched Points") mapAll
The following figure shows the map matching results. The red lines are original trajectories, and the green lines are matched trajectories. We can see that the noisy original trajectories are all snapped to the road network.
We used WherobotsAI Map Matching to match 90 million trips across the entire US in just 1.5 hours on the Wherobots 2XL runtime, which equates to approximately 1 million trips per minute, at a cost of $873 (99% less expensive than Mapbox, while completing in a small fraction of the time). The average cost of matching 1 million trips is $9.70. Here’s how the cost of Map Matching 1 million trips in Wherobots compares to alternatives.
The “optimization magic” behind WherobotsAI Map Matching lies in how Wherobots intelligently and automatically co-partitions trajectory and road network datasets based on the spatial proximity of their elements, ensuring a balanced distribution of work. As a result, the computational load is balanced evenly through this partitioning strategy, and makes map matching with Wherobots highly efficient, scalable, and affordable compared to alternatives.
You can try out WherobotsAI Map Matching by starting a notebook environment in Wherobots Cloud and running our example notebook within Wherobots Cloud.
notebook_example/python/wherobots-ai/mapmatching_example.ipynb
You can also check out the WherobotsAI Map Matching tutorial and reference documentation for more information!
Introducing RasterFlow: a planetary scale inference engine for Earth Intelligence
RasterFlow takes insights and embeddings from satellite and overhead imagery datasets into Apache Iceberg tables, with ease and efficiency at any scale.
Wherobots brought modern infrastructure to spatial data in 2025
We’re bridging the gap between AI and data from the physical world in 2026
The Medallion Architecture for Geospatial Data: Why Spatial Intelligence Demands a Different Approach
When most data engineers hear “medallion architecture,” they think of the traditional multi-hop layering pattern that powers countless analytics pipelines. The concept is sound: progressively refine raw data into analytical data and products. But geospatial data breaks conventional data engineering in ways that demand we rethink the entire pipeline. This isn’t about just storing location […]
share this article
Awesome that you’d like to share our articles. Where would you like to share it to: