6 Mins Read

2 Jul 2024

WherobotsAI Map Matching: Turn GPS Tracks into Road-Aligned Trip Insights at Scale

Authors

Jia Yu

What is Map Matching?

GPS data is inherently noisy and often lacks precision, which can make it challenging to extract accurate insights. This imprecision means that the GPS points logged may not accurately represent the actual locations where a device was. For example, GPS data from a drive around a lake may incorrectly include points that are over the water!

Click here to launch this interactive notebook

Launch Notebook

To address these inaccuracies, teams commonly use two approaches:

Identifying and Dropping Erroneous Points: This method involves manually or algorithmically filtering out GPS points that are clearly incorrect. However, this approach can reduce analytical accuracy, be costly, and is time-intensive.
Map Matching Techniques: A smarter and more effective approach involves using map matching techniques. These techniques take the noisy GPS data points and compute the most likely path taken based on known transportation segments such as roadways or trails.

WherobotsAI Map Matching offers an advanced solution for this problem that’s significantly more performant and 99% lower cost than Mapbox, Google Maps, or HERE maps alternatives. It performs map matching with high scale on millions or even billions of trips with ease and performance, ensuring that the GPS data aligns accurately with the actual paths most likely taken.

An illustration of map matching. Blue dots: GPS samples, Green line: matched trajectory.

Map matching is a common solution for preparing GPS data for use in a wide range of applications including:

Sattelite & GPS based navigation
GPS tracking of freight
Assessing risk of driving behavior for improved insurance pricing
Post hoc analysis of self driving car trips for telematics teams
Transportation engineering and urban planning

The objective of map matching is to accurately determine which road or path in the digital map corresponds to the observed geographic coordinates, considering factors such as the accuracy of the location data, the density and layout of the road network, and the speed and direction of travel.

Existing Solutions for Map Matching

Most map matching implementations are variants of the Hidden Markov Model (HMM)-based algorithm described by Newson and Krumm in their seminal paper, “Hidden Markov Map Matching through Noise and Sparseness.” This foundational research has influenced a variety of map matching solutions available today.

However, traditional HMM-based approaches have notable downsides when working with large-scale GPS datasets:

Significant Costs: Many commercially available map matching APIs charge substantial fees for large-scale usage.
Performance Issues: Traditional map matching algorithms, while accurate, are often not optimized for large-scale computation. They can be prohibitively slow, especially when dealing with extensive GPS data, as the underlying computation struggles to handle the data scale efficiently.

These challenges highlight the need for more efficient and cost-effective solutions capable of handling large-scale GPS datasets without compromising on performance.

RESTful API Map Matching Options

The Mapbox Map Matching API, HERE Maps Route Matching API, and Google Roads API are powerful RESTful APIs for performing map matching. These solutions are particularly effective for small-scale applications. However, for large-scale applications, such as population-level analysis involving millions of trajectories, the costs can become prohibitively high.

For example, as of July 2024, the approximate costs for matching 1 million trips are:

Mapbox: $1,600
HERE Maps: $4,400
Google Maps Platform: $8,000
WherobotsAI Map Matching: ~$10 (see below)

These prices are based on public pricing pages and do not consider any potential volume-based discounts that may be available.

While these APIs provide robust and accurate map matching capabilities, organizations seeking to perform extensive analyses often must explore more cost-effective alternatives.

Open-Source Map Matching Solutions

Open-source software such as such as Valhalla or GraphHopper can also be used for map matching. However, these solutions are designed for use on a single-machine. If your map matching workload exceeds the capacity that machine, your workload will suffer from extended processing times. Furthermore, you will end up running out of headroom if you are vertically scaling up the ladder of VM sizes.

Meet WherobotsAI Map Matching

Map Matching is a high performance, low cost, and planetary scale map matching capability for your telematics pipelines.

WherobotsAI provides a scalable map matching feature designed for small to very large scale trajectory datasets. It works seamlessly with other Wherobots capabilities, which means you can implement data cleaning, data transformations, and map matching in one single (serverless) data processing pipeline. We’ll see how it works in the following sections.

How it works

WherobotsAI Map Matching takes a DataFrame containing trajectories and another DataFrame containing road segments, and produces a DataFrame containing map matched results. Here is a walk-through of using WherobotsAI Map Matching to match trajectories in the VED dataset to the OpenStreetMap (OSM) road network.

1. Preparing the Trajectory Data

First, we load the trajectory data. We’ll use the preprocessed VED dataset stored as GeoParquet files for demonstration.

dfPath = sedona.read.format("geoparquet").load("s3://wherobots-benchmark-prod/data/mm/ved/VED_traj/")

The trajectory dataset should contain the following attributes:

A unique ID for trips. In this example the ids attribute is the unique ID of each trip.
A geometry attribute containing LineStrings, in this case the geometry attribute is for trip data.

The rows in the trajectory DataFrame look like this:

+---+-----+----+--------------------+--------------------+
|ids|VehId|Trip|              coords|            geometry|
+---+-----+----+--------------------+--------------------+
|  0|    8| 706|[{0, 42.277558333...|LINESTRING (-83.6...|
|  1|    8| 707|[{0, 42.277681388...|LINESTRING (-83.6...|
|  2|    8| 708|[{0, 42.261997222...|LINESTRING (-83.7...|
|  3|   10|1558|[{0, 42.277065833...|LINESTRING (-83.7...|
|  4|   10|1561|[{0, 42.286599722...|LINESTRING (-83.7...|
+---+-----+----+--------------------+--------------------+
only showing top 5 rows

2. Preparing the Road Network Data

We’ll use the OpenStreetMap (OSM) data specific to the Ann Arbor, Michigan region to map match our trip data with. Wherobots provides an API for loading road network data from OSM XML files.

from wherobots import matcher
dfEdge = matcher.load_osm("s3://wherobots-examples/data/osm_AnnArbor_large.xml", "[car]")
dfEdge.show(5)

The loaded road network DataFrame looks like this:

+--------------------+----------+--------+----------+-----------+----------+-----------+
|            geometry|       src|     dst|   src_lat|    src_lon|   dst_lat|    dst_lon|
+--------------------+----------+--------+----------+-----------+----------+-----------+
|LINESTRING (-83.7...|  68133325|27254523| 42.238819|-83.7390142|42.2386159|-83.7390153|
|LINESTRING (-83.7...|9405840276|27254523|42.2386058|-83.7388915|42.2386159|-83.7390153|
|LINESTRING (-83.7...|  68133353|27254523|42.2385675|-83.7390856|42.2386159|-83.7390153|
|LINESTRING (-83.7...|2262917109|27254523|42.2384552|-83.7390313|42.2386159|-83.7390153|
|LINESTRING (-83.7...|9979197063|27489080|42.3200426|-83.7272283|42.3200887|-83.7273003|
+--------------------+----------+--------+----------+-----------+----------+-----------+
only showing top 5 rows

Users can also prepare the road network data from any data source using any data processing procedures, as long as the schema of the road network DataFrame conforms to the requirement of the Map Matching API.

3. Run Map Matching

Once the trajectories and road network data is ready, we can run matcher.match to match trajectories to the road network.

dfMmResult = matcher.match(dfEdge, dfPath, "geometry", "geometry")

The dfMmResult contains the trajectories snapped to the roads in matched_points attribute:

+---+--------------------+--------------------+--------------------+
|ids|     observed_points|      matched_points|       matched_nodes|
+---+--------------------+--------------------+--------------------+
|275|LINESTRING (-83.6...|LINESTRING (-83.6...|[62574078, 773611...|
|253|LINESTRING (-83.6...|LINESTRING (-83.6...|[5930199197, 6252...|
| 88|LINESTRING (-83.7...|LINESTRING (-83.7...|[4931645364, 6249...|
|561|LINESTRING (-83.6...|LINESTRING (-83.6...|[29314519, 773612...|
|154|LINESTRING (-83.7...|LINESTRING (-83.7...|[5284529433, 6252...|
+---+--------------------+--------------------+--------------------+
only showing top 5 rows

We can visualize the map matching result using SedonaKepler to see what the matched trajectories look like:

mapAll = SedonaKepler.create_map()
SedonaKepler.add_df(mapAll, dfEdge, name="Road Network")
SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("observed_points AS geometry"), name="Observed Points")
SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("matched_points AS geometry"), name="Matched Points")
mapAll

The following figure shows the map matching results. The red lines are original trajectories, and the green lines are matched trajectories. We can see that the noisy original trajectories are all snapped to the road network.

Performance

We used WherobotsAI Map Matching to match 90 million trips across the entire US in just 1.5 hours on the Wherobots 2XL runtime, which equates to approximately 1 million trips per minute, at a cost of $873 (99% less expensive than Mapbox, while completing in a small fraction of the time). The average cost of matching 1 million trips is $9.70. Here’s how the cost of Map Matching 1 million trips in Wherobots compares to alternatives.

Mapbox: $1,600
HERE Maps: $4,400
Google Maps Platform: $8,000
WherobotsAI Map Matching: $9.70

The “optimization magic” behind WherobotsAI Map Matching lies in how Wherobots intelligently and automatically co-partitions trajectory and road network datasets based on the spatial proximity of their elements, ensuring a balanced distribution of work. As a result, the computational load is balanced evenly through this partitioning strategy, and makes map matching with Wherobots highly efficient, scalable, and affordable compared to alternatives.

Try It Out!

You can try out Map Matching by starting a notebook environment in Wherobots Cloud and running our example notebook within Wherobots Cloud.

notebook_example/python/wherobots-ai/mapmatching_example.ipynb

You can also check out the Map Matching tutorial and reference documentation for more information!

Try Map Matching

Get Started

7 Mins Read 8 Jul 2026

How Bad Telemetry Data Sabotages Modern Fleets

By the Teams at Action Engine & Wherobots Fleet monitoring is undergoing a generational shift. Fleet monitoring, the systems that ingest and analyze vehicle telemetry to track fleet health, performance, and safety, has become the foundation of how operators run vehicles, not just track them. Modern vehicles generate orders of magnitude more telemetry than even […]

Stories + 2

3 Mins Read 8 Jul 2026

Introducing the Wherobots Innovation Edition, designed to accelerate your physical world objectives

The Wherobots Innovation Edition helps you deliver outcomes on top of spatial data that propel your organization forward, and make this data AI-ready. Today we are announcing the Wherobots Innovation Edition. This is an annual partnership that pairs the full Wherobots Cloud platform with our forward deployed spatial engineering expertise, developed over years of delivering […]

post

13 Mins Read 24 Jun 2026

Spatial Data in Apache Iceberg: Optimizations and Management That Matter

Spatial data in Apache Iceberg needs different optimization than tabular data. A geometry column has no natural sort order, so unsorted files carry wide, overlapping bounding boxes and query planners cannot prune them… At all… This behaviour turns a selective spatial filter into a full table scan. A second problem compounds it: one oversized geometry […]

Data Management + 3