TABLE OF CONTENTS

    Contributors

    • Kristin Cowalcijk

      Kristin Cowalcijk is a staff engineer of Wherobots Inc. He is a PMC of Apache Sedona.

    • Jia Yu

      Jia Yu is a co-founder and the Chief Architect of Wherobots Inc. He is the PMC Chair of Apache Sedona

    What is Map Matching?

    GPS data is inherently noisy and often lacks precision, which can make it challenging to extract accurate insights. This imprecision means that the GPS points logged may not accurately represent the actual locations where a device was. For example, GPS data from a drive around a lake may incorrectly include points that are over the water!

    To address these inaccuracies, teams commonly use two approaches:

    1. Identifying and Dropping Erroneous Points: This method involves manually or algorithmically filtering out GPS points that are clearly incorrect. However, this approach can reduce analytical accuracy, be costly, and is time-intensive.
    2. Map Matching Techniques: A smarter and more effective approach involves using map matching techniques. These techniques take the noisy GPS data points and compute the most likely path taken based on known transportation segments such as roadways or trails.

    WherobotsAI Map Matching offers an advanced solution for this problem. It performs map matching with high scale on millions or even billions of trips with ease and performance, ensuring that the GPS data aligns accurately with the actual paths most likely taken.

    map matching telematics

    An illustration of map matching. Blue dots: GPS samples, Green line: matched trajectory.

    Map matching is a common solution for preparing GPS data for use in a wide range of applications including:

    • Sattelite & GPS based navigation
    • GPS tracking of freight
    • Assessing risk of driving behavior for improved insurance pricing
    • Post hoc analysis of self driving car trips for telematics teams
    • Transportation engineering and urban planning

    The objective of map matching is to accurately determine which road or path in the digital map corresponds to the observed geographic coordinates, considering factors such as the accuracy of the location data, the density and layout of the road network, and the speed and direction of travel.

    Existing Solutions for Map Matching

    Most map matching implementations are variants of the Hidden Markov Model (HMM)-based algorithm described by Newson and Krumm in their seminal paper, "Hidden Markov Map Matching through Noise and Sparseness." This foundational research has influenced a variety of map matching solutions available today.

    However, traditional HMM-based approaches have notable downsides when working with large-scale GPS datasets:

    1. Significant Costs: Many commercially available map matching APIs charge substantial fees for large-scale usage.
    2. Performance Issues: Traditional map matching algorithms, while accurate, are often not optimized for large-scale computation. They can be prohibitively slow, especially when dealing with extensive GPS data, as the underlying computation struggles to handle the data scale efficiently.

    These challenges highlight the need for more efficient and cost-effective solutions capable of handling large-scale GPS datasets without compromising on performance.

    RESTful API Map Matching Options

    The Mapbox Map Matching API, HERE Maps Route Matching API, and Google Roads API are powerful RESTful APIs for performing map matching. These solutions are particularly effective for small-scale applications. However, for large-scale applications, such as population-level analysis involving millions of trajectories, the costs can become prohibitively high.

    For example, as of July 2024, the approximate costs for matching 1 million trips are:

    • Mapbox: $1,600
    • HERE Maps: $4,400
    • Google Maps Platform: $8,000

    These prices are based on public pricing pages and do not consider any potential volume-based discounts that may be available.

    While these APIs provide robust and accurate map matching capabilities, organizations seeking to perform extensive analyses often must explore more cost-effective alternatives.

    Open-Source Map Matching Solutions

    Open-source software such as such as Valhalla or GraphHopper can also be used for map matching. However, these solutions are designed for use on a single-machine. If your map matching workload exceeds the capacity that machine, your workload will suffer from extended processing times. Furthermore, you will end up running out of headroom if you are vertically scaling up the ladder of VM sizes.

    Meet WherobotsAI Map Matching

    WherobotsAI Map Matching is a high performance, low cost, and planetary scale map matching capability for your telematics pipelines.

    WherobotsAI provides a scalable map matching feature designed for small to very large scale trajectory datasets. It works seamlessly with other Wherobots capabilities, which means you can implement data cleaning, data transformations, and map matching in one single (serverless) data processing pipeline. We’ll see how it works in the following sections.

    How it works

    WherobotsAI Map Matching takes a DataFrame containing trajectories and another DataFrame containing road segments, and produces a DataFrame containing map matched results. Here is a walk-through of using WherobotsAI Map Matching to match trajectories in the VED dataset to the OpenStreetMap (OSM) road network.

    1. Preparing the Trajectory Data

    First, we load the trajectory data. We’ll use the preprocessed VED dataset stored as GeoParquet files for demonstration.

    dfPath = sedona.read.format("geoparquet").load("s3://wherobots-benchmark-prod/data/mm/ved/VED_traj/")

    The trajectory dataset should contain the following attributes:

    • A unique ID for trips. In this example the ids attribute is the unique ID of each trip.
    • A geometry attribute containing LineStrings, in this case the geometry attribute is for trip data.

    The rows in the trajectory DataFrame look like this:

    +---+-----+----+--------------------+--------------------+
    |ids|VehId|Trip|              coords|            geometry|
    +---+-----+----+--------------------+--------------------+
    |  0|    8| 706|[{0, 42.277558333...|LINESTRING (-83.6...|
    |  1|    8| 707|[{0, 42.277681388...|LINESTRING (-83.6...|
    |  2|    8| 708|[{0, 42.261997222...|LINESTRING (-83.7...|
    |  3|   10|1558|[{0, 42.277065833...|LINESTRING (-83.7...|
    |  4|   10|1561|[{0, 42.286599722...|LINESTRING (-83.7...|
    +---+-----+----+--------------------+--------------------+
    only showing top 5 rows
    2. Preparing the Road Network Data

    We’ll use the OpenStreetMap (OSM) data specific to the Ann Arbor, Michigan region to map match our trip data with. Wherobots provides an API for loading road network data from OSM XML files.

    from wherobots import matcher
    dfEdge = matcher.load_osm("s3://wherobots-examples/data/osm_AnnArbor_large.xml", "[car]")
    dfEdge.show(5)

    The loaded road network DataFrame looks like this:

    +--------------------+----------+--------+----------+-----------+----------+-----------+
    |            geometry|       src|     dst|   src_lat|    src_lon|   dst_lat|    dst_lon|
    +--------------------+----------+--------+----------+-----------+----------+-----------+
    |LINESTRING (-83.7...|  68133325|27254523| 42.238819|-83.7390142|42.2386159|-83.7390153|
    |LINESTRING (-83.7...|9405840276|27254523|42.2386058|-83.7388915|42.2386159|-83.7390153|
    |LINESTRING (-83.7...|  68133353|27254523|42.2385675|-83.7390856|42.2386159|-83.7390153|
    |LINESTRING (-83.7...|2262917109|27254523|42.2384552|-83.7390313|42.2386159|-83.7390153|
    |LINESTRING (-83.7...|9979197063|27489080|42.3200426|-83.7272283|42.3200887|-83.7273003|
    +--------------------+----------+--------+----------+-----------+----------+-----------+
    only showing top 5 rows

    Users can also prepare the road network data from any data source using any data processing procedures, as long as the schema of the road network DataFrame conforms to the requirement of the Map Matching API.

    3. Run Map Matching

    Once the trajectories and road network data is ready, we can run matcher.match to match trajectories to the road network.

    dfMmResult = matcher.match(dfEdge, dfPath, "geometry", "geometry")

    The dfMmResult contains the trajectories snapped to the roads in matched_points attribute:

    +---+--------------------+--------------------+--------------------+
    |ids|     observed_points|      matched_points|       matched_nodes|
    +---+--------------------+--------------------+--------------------+
    |275|LINESTRING (-83.6...|LINESTRING (-83.6...|[62574078, 773611...|
    |253|LINESTRING (-83.6...|LINESTRING (-83.6...|[5930199197, 6252...|
    | 88|LINESTRING (-83.7...|LINESTRING (-83.7...|[4931645364, 6249...|
    |561|LINESTRING (-83.6...|LINESTRING (-83.6...|[29314519, 773612...|
    |154|LINESTRING (-83.7...|LINESTRING (-83.7...|[5284529433, 6252...|
    +---+--------------------+--------------------+--------------------+
    only showing top 5 rows

    We can visualize the map matching result using SedonaKepler to see what the matched trajectories look like:

    mapAll = SedonaKepler.create_map()
    SedonaKepler.add_df(mapAll, dfEdge, name="Road Network")
    SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("observed_points AS geometry"), name="Observed Points")
    SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("matched_points AS geometry"), name="Matched Points")
    mapAll

    The following figure shows the map matching results. The red lines are original trajectories, and the green lines are matched trajectories. We can see that the noisy original trajectories are all snapped to the road network.

    map matching results example 2

    Performance

    We used WherobotsAI Map Matching to match 90 million trips across the entire US in just 1.5 hours on the Wherobots Tokyo runtime, which equates to approximately 1 million trips per minute. The average cost of matching 1 million trips is an order of magnitude less costly and more efficient than the options outlined above.

    The “optimization magic” behind WherobotsAI Map Matching lies in how Wherobots intelligently and automatically co-partitions trajectory and road network datasets based on the spatial proximity of their elements, ensuring a balanced distribution of work. As a result, the computational load is balanced evenly through this partitioning strategy, and makes map matching with Wherobots highly efficient, scalable, and affordable compared to alternatives.

    Try It Out!

    You can try out WherobotsAI Map Matching by starting a notebook environment in Wherobots Cloud and running our example notebook within Wherobots Cloud.

    notebook_example/python/wherobots-ai/mapmatching_example.ipynb

    You can also check out the WherobotsAI Map Matching tutorial and reference documentation for more information!

    Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:

    Contributors

    • Kristin Cowalcijk

      Kristin Cowalcijk is a staff engineer of Wherobots Inc. He is a PMC of Apache Sedona.

    • Jia Yu

      Jia Yu is a co-founder and the Chief Architect of Wherobots Inc. He is the PMC Chair of Apache Sedona