
3 Jun 2026

Building the Wherobots Mobility Solution Accelerator: A Technical Deep Dive

Authors

Matt Forrest

From Raw GPS Pings to Spatial Intelligence

In Part 1, we explored why mobility data breaks traditional spatial systems and what a modern processing architecture should look like. Now we get into the implementation.

This solution accelerator is a three-notebook pipeline built on Wherobots that processes the Microsoft Research GeoLife GPS Trajectories dataset: 182 users, 17,621 trajectories, and millions of GPS points collected in Beijing between 2007 and 2012. We chose this dataset specifically because it includes altitude data, which lets us demonstrate full XYZM (4D) geometry processing, something most spatial tutorials skip entirely because their data (or their platform) does not support it.

A Python preparation script (prepare_geolife.py) converts thousands of raw .plt files into a single CSV for cloud ingestion. From there, the three notebooks handle everything: ingestion, profiling, cleaning, enrichment, trajectory construction, map matching, spatial indexing, clustering, anomaly detection, and the creation of GeoParquet-backed analytical views that flow directly into Felt for interactive, collaborative visualization.

Notebook 1: Bronze Layer for Ingestion and Spatial Profiling

The Bronze layer loads the combined CSV from S3 into WherobotsDB (optimized Apache Sedona) and establishes the spatial foundation for everything downstream. You can access the notebook here.

Geometry Construction and Parsing

The first operation converts raw latitude and longitude columns into proper 2D point geometries. But even before that, we hit our first real-world challenge. Spark’s inferSchema option inferred the time column as a TimestampType, silently prepending today’s date to raw time values. The dates looked plausible in isolation but were completely wrong.

The fix uses DATE_FORMAT() to safely extract date and time parts regardless of the inferred type, while simultaneously constructing point geometries:

bronze_df = sedona.sql("""
    SELECT
        user_id,
        CAST(latitude AS DOUBLE) AS latitude,
        CAST(longitude AS DOUBLE) AS longitude,
        CAST(altitude_ft AS DOUBLE) AS altitude_ft,
        DATE_FORMAT(date_str, 'yyyy-MM-dd') AS date_str,
        DATE_FORMAT(time_str, 'HH:mm:ss') AS time_str,
        ST_MakePoint(
            CAST(longitude AS DOUBLE),
            CAST(latitude AS DOUBLE)
        ) AS geometry
    FROM raw_gps
""")

This is more than a formatting step: calling ST_MakePoint registers the data as a spatial type, enabling WherobotsDB’s spatial indexing and query optimization for all subsequent operations.

Lesson: Never trust Spark’s schema inference for temporal columns in mobility data. Explicitly cast and parse time values to avoid silent data corruption.
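One way to apply this lesson, sketched in plain Python rather than Spark, is to parse the date and time strings with an explicit format and combine them into epoch seconds. The helper name to_epoch_seconds is hypothetical, and treating the recorded values as UTC is an assumption to check against the dataset documentation:

```python
from datetime import datetime, timezone


def to_epoch_seconds(date_str: str, time_str: str) -> int:
    """Combine 'yyyy-MM-dd' and 'HH:mm:ss' strings into Unix epoch seconds.

    Assumes the recorded values are in UTC (an assumption, not confirmed here).
    """
    # An explicit format string means a malformed value raises ValueError
    # instead of being silently reinterpreted.
    parsed = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


epoch = to_epoch_seconds("2008-10-23", "02:53:04")
```

Because the format is explicit, a value such as a bare time with no date fails loudly at parse time rather than picking up a default date.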

Data Quality Profiling

Before any transformation, we profile the raw data comprehensively. The dataset uses -777 as a sentinel value for missing altitude, so we quantify what percentage of records are affected, check coordinate bounds for points outside the expected Beijing-area extent, and analyze per-user distribution to understand data balance:

sedona.sql("""
    SELECT
        COUNT(*) AS total,
        SUM(CASE WHEN altitude_ft = -777 THEN 1 ELSE 0 END) AS invalid_altitude,
        ROUND(100.0 * SUM(CASE WHEN altitude_ft = -777 THEN 1 ELSE 0 END)
              / COUNT(*), 2) AS pct_invalid,
        ROUND(MIN(CASE WHEN altitude_ft != -777 THEN altitude_ft END), 1)
            AS min_valid_alt_ft,
        ROUND(MAX(CASE WHEN altitude_ft != -777 THEN altitude_ft END), 1)
            AS max_valid_alt_ft,
        ROUND(AVG(CASE WHEN altitude_ft != -777 THEN altitude_ft END), 1)
            AS avg_valid_alt_ft
    FROM bronze
""")

We also use SedonaKepler to visualize a 1% sample of points, confirming that the spatial extent covers the expected Beijing area. This visual validation catches problems that statistical profiling misses: sparse coverage zones, spatial outliers, and artifacts that only become apparent on a map.
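The sentinel profiling logic from the query above can be checked on a handful of values before running it at scale. The following is a minimal plain-Python sketch; profile_altitude and the sample values are hypothetical and only mirror the column names used in the SQL:

```python
from statistics import mean

SENTINEL_ALTITUDE_FT = -777  # missing-altitude marker described in the text


def profile_altitude(altitudes_ft):
    """Return count, sentinel share, and min/max/mean of the valid values."""
    valid = [a for a in altitudes_ft if a != SENTINEL_ALTITUDE_FT]
    total = len(altitudes_ft)
    invalid = total - len(valid)
    return {
        "total": total,
        "invalid_altitude": invalid,
        "pct_invalid": round(100.0 * invalid / total, 2) if total else 0.0,
        "min_valid_alt_ft": min(valid) if valid else None,
        "max_valid_alt_ft": max(valid) if valid else None,
        "avg_valid_alt_ft": round(mean(valid), 1) if valid else None,
    }


sample = [492.0, -777, 505.5, 130.0, -777]  # made-up values for illustration
report = profile_altitude(sample)
```

Excluding the sentinel before computing min, max, and mean matters: leaving -777 in would drag every summary statistic toward a value that was never measured.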

The Bronze layer outputs raw GeoParquet to S3, giving us a spatially-typed, columnar, compressed foundation for all downstream processing.

Notebook 2: Silver Layer for The Transformation Engine

The Silver layer is where the pipeline earns its keep. This notebook handles cleaning, 4D geometry construction, trip segmentation, trajectory building, movement metric derivation, spatial indexing, and map matching. Each step has meaningful technical nuance worth examining.

4D XYZM Geometry Construction

After filtering invalid records and converting altitude from feet to meters, we construct 4D XYZM point geometries that encode position, elevation, and time into a single geometric object:

points_4d_df = sedona.sql("""
    SELECT
        user_id,
        latitude,
        longitude,
        altitude_m,
        epoch_seconds,
        date_str,
        time_str,
        ST_MakePoint(
            longitude,
            latitude,
            altitude_m,
            CAST(epoch_seconds AS DOUBLE)
        ) AS geometry_4d
    FROM cleaned
    WHERE epoch_seconds IS NOT NULL
""")

In this encoding, X = longitude, Y = latitude, Z = elevation in meters, M = Unix epoch timestamp as a measure value. We verify the construction with Sedona's dimension inspection functions:

sedona.sql("""
    SELECT
        ST_HasZ(geometry_4d) AS has_z,
        ST_HasM(geometry_4d) AS has_m,
        ST_CoordDim(geometry_4d) AS coord_dim,
        ST_Is3D(geometry_4d) AS is_3d
    FROM points_4d
    LIMIT 1
""")

These checks are not optional: they confirm that downstream operations like ST_ZMin(), ST_ZMax(), and ST_3DDistance() will have the dimensional data they need.

Trip Segmentation with PySpark Window Functions

Continuous GPS streams must be split into discrete trips. We use PySpark window functions to compute the time difference between consecutive GPS points for each user, flag gaps exceeding 20 minutes as trip boundaries, and assign composite trip IDs:

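from pyspark.sql import Window
from pyspark.sql import functions as F

TIME_GAP_THRESHOLD_SECONDS = 1200  # 20 minutes

# Order each user's points by time so lag() sees the previous ping
window_user = Window.partitionBy("user_id").orderBy("epoch_seconds")
segmented_df = (
    points_4d_df
    .withColumn("prev_epoch", F.lag("epoch_seconds").over(window_user))
    .withColumn("time_delta_s", F.col("epoch_seconds") - F.col("prev_epoch"))
    # The first point of a user, or a gap above the threshold, starts a trip
    .withColumn(
        "is_new_trip",
        F.when(
            F.col("time_delta_s").isNull()
            | (F.col("time_delta_s") > TIME_GAP_THRESHOLD_SECONDS),
            F.lit(1)
        ).otherwise(F.lit(0))
    )
    # Running sum of the boundary flags numbers the trips per user
    .withColumn(
        "trip_segment",
        F.sum("is_new_trip").over(window_user)
    )
    # Create composite trip ID: user_id + segment number
    .withColumn(
        "trip_id",
        F.concat(F.col("user_id"), F.lit("_"), F.col("trip_segment"))
    )
)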

This approach is clean, declarative, and executes efficiently in Spark’s distributed computation model. The TIME_GAP_THRESHOLD_SECONDS parameter (set to 1,200 seconds / 20 minutes) is easily adjustable for different use cases: delivery fleets might use 5 minutes, while long-haul trucking might use 60.
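The gap-based rule is easy to verify on a toy sequence outside Spark. Here is a minimal plain-Python sketch of the same logic for a single user; assign_trip_ids is a hypothetical helper, not part of the accelerator:

```python
TIME_GAP_THRESHOLD_SECONDS = 1200  # 20 minutes, as in the text


def assign_trip_ids(user_id, epochs, gap_s=TIME_GAP_THRESHOLD_SECONDS):
    """Label each timestamp with '<user_id>_<segment>' using a gap threshold."""
    trip_ids = []
    segment = 0
    prev = None
    for t in sorted(epochs):
        # The first point, or a gap larger than the threshold, starts a new trip.
        if prev is None or (t - prev) > gap_s:
            segment += 1
        trip_ids.append(f"{user_id}_{segment}")
        prev = t
    return trip_ids


# Gaps of 60 s stay in one trip; gaps of 1880 s and 2970 s open new trips.
ids = assign_trip_ids("000", [0, 60, 120, 2000, 2030, 5000])
```

Lowering gap_s to 300 (5 minutes) or raising it to 3600 (60 minutes) changes only where the boundaries fall, which makes the threshold easy to tune per fleet type.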

Trajectory Construction and the Ordering Problem

Once trips are defined, we build XYZM LineString trajectories per trip. This is where we hit one of the most important challenges in distributed trajectory processing: COLLECT_LIST in Spark does not guarantee order. The spatial geometry, the actual XY path, may be correct, but M values (timestamps) can be scrambled.

The solution requires a CTE-based approach that also handles the single-point edge case (ST_MakeLine requires at least 2 points, and Spark evaluates SELECT before HAVING):

trajectories_df = sedona.sql("""
    WITH trip_counts AS (
        SELECT trip_id
        FROM segmented
        GROUP BY trip_id
        HAVING COUNT(*) >= 2
    ),
    valid_points AS (
        SELECT s.*
        FROM segmented s
        INNER JOIN trip_counts tc ON s.trip_id = tc.trip_id
        ORDER BY s.user_id, s.trip_id, s.epoch_seconds
    )
    SELECT
        user_id,
        trip_id,
        ST_MakeLine(COLLECT_LIST(geometry_4d)) AS trajectory,
        COUNT(*) AS point_count,
        MIN(epoch_seconds) AS start_time,
        MAX(epoch_seconds) AS end_time,
        MIN(date_str) AS start_date,
        MAX(date_str) AS end_date
    FROM valid_points
    GROUP BY user_id, trip_id
""")

Sedona provides ST_IsValidTrajectory() to validate M-value ordering. When we applied it, every single trajectory was rejected due to the distributed collect ordering issue. Our pragmatic solution: use point_count filters instead of ST_IsValidTrajectory() as the quality gate, acknowledging that the spatial geometry is correct regardless of M ordering.

Lesson: In any distributed system, aggregation functions that collect values into arrays or lists may not preserve insertion order. Design your pipeline to tolerate this, or implement explicit sorting within the aggregation.
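The second option in this lesson, sorting inside the aggregation, can be illustrated in plain Python: carry the timestamp alongside each point, group, and sort each group before building the line. The function names and sample rows below are hypothetical. In Spark SQL the same idea is commonly expressed by collecting (timestamp, geometry) structs and sorting the array before building the line; that form is not shown because it has not been tested against this pipeline.

```python
from collections import defaultdict


def build_ordered_trajectories(rows):
    """Group (trip_id, epoch_seconds, point) rows and sort each group by time.

    Sorting happens inside the per-trip aggregation, so the result does not
    depend on the order in which rows arrive.
    """
    grouped = defaultdict(list)
    for trip_id, epoch_seconds, point in rows:
        grouped[trip_id].append((epoch_seconds, point))
    return {
        trip_id: [point for _, point in sorted(pairs, key=lambda p: p[0])]
        for trip_id, pairs in grouped.items()
        if len(pairs) >= 2  # a line needs at least two points
    }


def m_values_increase(points):
    """True when the M value (last coordinate) is strictly increasing."""
    return all(a[-1] < b[-1] for a, b in zip(points, points[1:]))


# Rows deliberately arrive out of order; points are (x, y, z, m) tuples.
rows = [
    ("7_1", 30, (116.32, 39.99, 50.0, 30.0)),
    ("7_1", 10, (116.30, 39.98, 48.0, 10.0)),
    ("7_1", 20, (116.31, 39.985, 49.0, 20.0)),
    ("7_2", 99, (116.40, 40.00, 51.0, 99.0)),  # single point, dropped
]
trajectories = build_ordered_trajectories(rows)
```

With the sort applied per group, a monotonic-M check such as m_values_increase passes regardless of arrival order, which is the property a trajectory validity test looks for.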

Movement Metrics

With trajectories constructed, we derive per-trip metrics using Sedona’s elevation and measurement functions:

trip_metrics_df = sedona.sql("""
    SELECT
        user_id,
        trip_id,
        trajectory,
        point_count,
        start_time,
        end_time,
        (end_time - start_time) AS duration_s,
        ROUND(ST_Length(trajectory), 6) AS distance_deg,
        ROUND(ST_ZMin(trajectory), 2) AS min_elevation_m,
        ROUND(ST_ZMax(trajectory), 2) AS max_elevation_m,
        ROUND(ST_ZMax(trajectory) - ST_ZMin(trajectory), 2) AS elevation_range_m
    FROM trajectories
    WHERE point_count >= 2
""")

Per-point metrics such as speed, elevation delta, and distance between consecutive points are computed separately using window functions over the segmented DataFrame with ST_Distance().
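To make the per-point metrics concrete, here is a plain-Python sketch that pairs consecutive points and derives distance, elevation delta, and speed. It uses the haversine formula on a spherical Earth, which is a swapped-in technique for illustration and not necessarily what the notebook's ST_Distance() call computes; the function names are hypothetical:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def per_point_metrics(points):
    """points: list of (lon, lat, altitude_m, epoch_seconds), sorted by time."""
    out = []
    for prev, cur in zip(points, points[1:]):
        dist = haversine_m(prev[0], prev[1], cur[0], cur[1])
        dt = cur[3] - prev[3]
        out.append({
            "distance_m": dist,
            "elevation_delta_m": cur[2] - prev[2],
            # Duplicate timestamps give no usable speed, so return None.
            "speed_mps": dist / dt if dt > 0 else None,
        })
    return out


metrics = per_point_metrics([
    (116.0, 40.0, 50.0, 0),
    (116.0, 40.001, 52.0, 10),
])
```

The zip over consecutive pairs plays the same role as a lag() window ordered by time within each trip.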

Spatial Indexing: H3 and GeoHash

We assign both H3 hexagon cell IDs and GeoHash values in a single pass, giving downstream Gold-layer analytics flexibility in how they aggregate spatial data:

indexed_points = sedona.sql("""
    SELECT
        *,
        EXPLODE(ST_H3CellIDs(geometry_4d, 9, false)) AS h3_cell_id,
        ST_GeoHash(geometry_4d, 7) AS geohash
    FROM enriched
""")

H3 at resolution 9 provides hexagonal cells averaging about 0.1 km² in area (a few hundred meters across), well suited to urban mobility density analysis. GeoHash at precision 7 gives cells of approximately 150 meters, useful for range queries and segment identification.
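The GeoHash cell size follows directly from the encoding: each character carries 5 bits, split alternately between longitude and latitude. A short plain-Python sketch of that arithmetic is below; the function names are hypothetical, and the meters conversion uses the same rough 111,320 meters-per-degree figure that appears in later queries:

```python
import math


def geohash_cell_size_deg(precision):
    """Return (lon_width_deg, lat_height_deg) of a GeoHash cell."""
    total_bits = 5 * precision            # each base-32 character carries 5 bits
    lon_bits = math.ceil(total_bits / 2)  # longitude takes the first bit
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


def geohash_cell_size_m(precision, latitude_deg):
    """Approximate cell size in meters at a latitude (spherical estimate)."""
    lon_deg, lat_deg = geohash_cell_size_deg(precision)
    meters_per_deg = 111_320  # rough meters per degree; an approximation
    # East-west extent shrinks with the cosine of the latitude.
    width = lon_deg * meters_per_deg * math.cos(math.radians(latitude_deg))
    height = lat_deg * meters_per_deg
    return width, height


# 39.9 is roughly Beijing's latitude (assumed here for illustration).
width_m, height_m = geohash_cell_size_m(7, 39.9)
```

At precision 7 the cell is about 153 meters tall everywhere, but its east-west width narrows away from the equator, so the "150-meter" figure is closest to the truth at low latitudes.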

Map Matching with Wherobots

The Silver layer’s final major operation is map matching: snapping noisy GPS traces to the actual road network. The matcher expects a simple DataFrame with IDs and geometry:

from pyspark.sql.functions import col
from wherobots import matcher

# Load Beijing OSM road network
roads_df = matcher.load_osm(OSM_DATA_PATH, "[car]")

# Prepare trajectories for matching
paths_df = trip_metrics_df.select(
    col("trip_id").alias("ids"),
    col("trajectory").alias("geometry")
)

# Run map matching
matched_df = matcher.match(
    roads_df,       # Road network edges
    paths_df,       # GPS trajectory LineStrings
    "geometry",     # Road geometry column name
    "geometry"      # Path geometry column name
)

The matcher produces three outputs per trajectory: observed_points (the raw GPS trace), matched_points (the road-snapped route), and matched_nodes (OSM node IDs along the matched path). Having this as a native operation within Wherobots, rather than calling an external API with rate limits and per-request pricing, eliminates an entire category of infrastructure complexity.

Notebook 3: Gold Layer for Analytics, Exploration, and Deep Dives

The Gold layer transforms Silver-layer trajectories into purpose-built analytical views. We structured these into three tiers: analytical, exploratory, and deep dive.

Analytical Views

H3 Hexbin Activity Density Heatmap. We aggregate GPS points by H3 cell and convert cell IDs back to hexagon polygons, enriching each cell with multi-dimensional metrics:

h3_density = sedona.sql("""
    SELECT
        h3_cell_id,
        ST_H3ToGeom(ARRAY(h3_cell_id))[0] AS geometry,
        COUNT(*) AS point_count,
        COUNT(DISTINCT user_id) AS unique_users,
        COUNT(DISTINCT trip_id) AS unique_trips,
        ROUND(AVG(speed_mps), 2) AS avg_speed_mps,
        ROUND(AVG(altitude_m), 2) AS avg_elevation_m,
        ROUND(MIN(altitude_m), 2) AS min_elevation_m,
        ROUND(MAX(altitude_m), 2) AS max_elevation_m
    FROM silver_points
    WHERE speed_mps IS NOT NULL
    GROUP BY h3_cell_id
    ORDER BY point_count DESC
""")

This view is the foundation for understanding spatial activity patterns: where movement concentrates, where it is sparse, and how intensity varies across the study area.

Temporal Patterns. Hourly and day-of-week aggregations reveal when mobility peaks and troughs occur:

temporal_patterns = sedona.sql("""
    SELECT
        HOUR(FROM_UNIXTIME(epoch_seconds)) AS hour_of_day,
        DAYOFWEEK(FROM_UNIXTIME(epoch_seconds)) AS day_of_week,
        COUNT(*) AS point_count,
        COUNT(DISTINCT user_id) AS active_users,
        COUNT(DISTINCT trip_id) AS active_trips,
        ROUND(AVG(speed_mps), 2) AS avg_speed_mps
    FROM silver_points
    WHERE speed_mps IS NOT NULL
    GROUP BY
        HOUR(FROM_UNIXTIME(epoch_seconds)),
        DAYOFWEEK(FROM_UNIXTIME(epoch_seconds))
    ORDER BY day_of_week, hour_of_day
""")

Trip Statistics and Map Matching Coverage. Distribution summaries characterize mobility behavior, while the ratio of successfully matched trajectories provides a quality metric for both the GPS data and the road network.
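For the temporal aggregation, it helps to be explicit about time zones: FROM_UNIXTIME renders epoch values in the Spark session time zone, and DAYOFWEEK numbers days from 1 (Sunday) to 7 (Saturday). The plain-Python sketch below reproduces that numbering for a chosen zone. The helper name is hypothetical, and wanting Beijing local time (a fixed UTC+8 offset) is an assumption about the analysis goal:

```python
from datetime import datetime, timedelta, timezone

BEIJING_TZ = timezone(timedelta(hours=8))  # fixed UTC+8 offset, no DST


def hour_and_weekday(epoch_seconds, tz=BEIJING_TZ):
    """Return (hour_of_day, day_of_week) with day_of_week 1=Sunday..7=Saturday."""
    local = datetime.fromtimestamp(epoch_seconds, tz=tz)
    # isoweekday() is Monday=1..Sunday=7; shift it to the Sunday=1 convention.
    day_of_week = (local.isoweekday() % 7) + 1
    return local.hour, day_of_week


# 1224730384 is 2008-10-23 02:53:04 UTC, a Thursday.
beijing = hour_and_weekday(1224730384)
utc = hour_and_weekday(1224730384, timezone.utc)
```

The same instant lands in hour 10 in Beijing and hour 2 in UTC, so an hourly peak chart shifts by eight buckets depending on which zone the session uses.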

Exploratory Views

Elevation Profiles. We sample points at 5% intervals along the longest trajectories using ST_LineInterpolatePoint, then extract elevation and timestamps. This required splitting operations into separate CTEs because Spark does not allow EXPLODE() nested inside other expressions:

elevation_profiles = sedona.sql("""
    WITH top_trips AS (
        SELECT trip_id, trajectory, point_count
        FROM silver_trajectories
        WHERE point_count >= 10
        ORDER BY point_count DESC
        LIMIT 10
    ),
    raw_steps AS (
        SELECT EXPLODE(SEQUENCE(0, 100, 5)) AS step
    ),
    fractions AS (
        SELECT step / 100.0 AS fraction FROM raw_steps
    )
    SELECT
        t.trip_id,
        f.fraction,
        ST_LineInterpolatePoint(t.trajectory, f.fraction) AS point_along_route,
        ROUND(ST_Z(ST_LineInterpolatePoint(t.trajectory, f.fraction)), 2)
            AS elevation_m,
        ROUND(ST_M(ST_LineInterpolatePoint(t.trajectory, f.fraction)), 0)
            AS epoch_seconds
    FROM top_trips t
    CROSS JOIN fractions f
    ORDER BY t.trip_id, f.fraction
""")

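The sampling idea behind the elevation profile, walking a fixed fraction of the way along a line and reading off the elevation there, can be sketched in plain Python. This is an illustration of fraction-based linear interpolation under simple assumptions (planar X/Y length, linear Z between vertices), not a description of how ST_LineInterpolatePoint is implemented; interpolate_along is a hypothetical helper:

```python
import math


def interpolate_along(points, fraction):
    """Linearly interpolate (x, y, z) at a 0..1 fraction of planar length.

    points: list of (x, y, z) vertices. Length is measured in X/Y only.
    """
    pairs = list(zip(points, points[1:]))
    seg_lengths = [math.dist(a[:2], b[:2]) for a, b in pairs]
    target = fraction * sum(seg_lengths)
    walked = 0.0
    for (a, b), seg in zip(pairs, seg_lengths):
        # Skip zero-length segments; stop in the segment containing the target.
        if seg > 0 and walked + seg >= target:
            t = (target - walked) / seg
            return tuple(a[i] + t * (b[i] - a[i]) for i in range(3))
        walked += seg
    return points[-1]  # degenerate input or rounding past the end


line = [(0.0, 0.0, 10.0), (3.0, 4.0, 20.0), (3.0, 10.0, 50.0)]
# Same 0, 5, ..., 100 steps as the SEQUENCE(0, 100, 5) in the query.
profile = [interpolate_along(line, step / 100.0) for step in range(0, 101, 5)]
```

Plotting the Z component of profile against the fraction gives the elevation profile for one trip.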
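2D vs 3D Distance Comparison. By comparing 2D measurements (ST_Length(), ST_Distance()) with ST_3DDistance(), we quantify the impact of terrain on distance calculations. Two caveats apply to the query as written: the 111,320 meters-per-degree factor is a rough conversion that overstates east-west distances away from the equator, and ST_3DDistance() on longitude/latitude geometries combines degree-valued X/Y with meter-valued Z, so projecting to a metric CRS first gives more trustworthy 3D figures. In hilly areas, the difference can be significant enough to affect route planning, fuel modeling, and ETAs: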

distance_comparison = sedona.sql("""
    SELECT
        trip_id,
        point_count,
        elevation_range_m,
        ROUND(ST_Length(trajectory) * 111320, 0) AS distance_2d_m,
        ROUND(
            ST_3DDistance(
                ST_StartPoint(trajectory),
                ST_EndPoint(trajectory)
            ) * 111320, 0
        ) AS straight_line_3d_m,
        ROUND(
            ST_Distance(
                ST_StartPoint(trajectory),
                ST_EndPoint(trajectory)
            ) * 111320, 0
        ) AS straight_line_2d_m
    FROM silver_trajectories
    WHERE elevation_range_m > 50
    ORDER BY elevation_range_m DESC
    LIMIT 20
""")

Map Matched vs Raw Comparison. Side-by-side SedonaKepler layers showing the original noisy GPS trace against the road-snapped route make the value of map matching immediately visible to any stakeholder.
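Returning to the 2D vs 3D comparison, the effect of elevation is easiest to see when all three axes are in meters. The plain-Python sketch below converts degrees to approximate local meters (scaling longitude by the cosine of the latitude) and then applies the Pythagorean theorem with the elevation difference. It is a simplified stand-in for a proper projection, and the function names are hypothetical:

```python
import math

METERS_PER_DEG = 111_320  # rough meters per degree; an approximation


def local_xy_m(lon, lat, ref_lat):
    """Convert degrees to approximate meters around a reference latitude."""
    x = lon * METERS_PER_DEG * math.cos(math.radians(ref_lat))
    y = lat * METERS_PER_DEG
    return x, y


def distance_2d_3d_m(p1, p2):
    """p1, p2: (lon, lat, altitude_m). Returns (2D meters, 3D meters)."""
    ref_lat = (p1[1] + p2[1]) / 2
    x1, y1 = local_xy_m(p1[0], p1[1], ref_lat)
    x2, y2 = local_xy_m(p2[0], p2[1], ref_lat)
    d2 = math.hypot(x2 - x1, y2 - y1)
    # With X, Y, and Z all in meters, the 3D distance is well defined.
    d3 = math.sqrt(d2 ** 2 + (p2[2] - p1[2]) ** 2)
    return d2, d3


# About 111 m of northward travel with a 50 m climb.
flat_m, sloped_m = distance_2d_3d_m((116.0, 40.0, 100.0), (116.0, 40.001, 150.0))
```

In this example the climb adds roughly ten percent to the distance, which is the kind of gap that matters for fuel and ETA models on steep routes.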

Deep Dive Views

DBSCAN Stop Point Clustering. We filter points with speed below 0.5 m/s, then cluster using ST_DBSCAN with geodesic distance. An important implementation detail: ST_DBSCAN requires a physical column reference, not a computed expression:

# Filter to stationary points
stop_points = points_df.filter("speed_mps IS NOT NULL AND speed_mps < 0.5")

# Register the filtered DataFrame so the SQL below can reference it by name
stop_points.createOrReplaceTempView("stop_points")

# ST_DBSCAN requires a named reference to a physical column.
# useSpheroid=true so epsilon is in meters (geodesic distance).
clusters_raw = sedona.sql("""
    SELECT
        *,
        ST_DBSCAN(geometry_4d, 100, 5, true) AS cluster_result
    FROM stop_points
""")

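The 100-meter epsilon is appropriate for identifying distinct stop locations (buildings, intersections, transit stops) in an urban environment. Once the cluster_result struct is flattened into cluster and isCore columns and registered as the clusters view, clusters with 10+ stop points are aggregated into hotspots: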

hotspots = sedona.sql("""
    SELECT
        cluster AS hotspot_id,
        COUNT(*) AS total_stops,
        COUNT(DISTINCT user_id) AS unique_visitors,
        COUNT(DISTINCT trip_id) AS unique_trips,
        ST_MakePoint(AVG(longitude), AVG(latitude)) AS geometry,
        ROUND(AVG(altitude_m), 1) AS avg_elevation_m
    FROM clusters
    WHERE cluster != -1 AND isCore = true
    GROUP BY cluster
    HAVING COUNT(*) >= 10
    ORDER BY total_stops DESC
""")

Trajectory Anomaly Detection. We flag trips with extreme values using a CTE pattern that avoids Spark’s “HAVING without GROUP BY” error:

anomalies = sedona.sql("""
    WITH trip_stats AS (
        SELECT
            AVG(duration_s) AS mean_duration,
            STDDEV(duration_s) AS std_duration,
            AVG(distance_deg * 111320) AS mean_distance,
            STDDEV(distance_deg * 111320) AS std_distance,
            AVG(elevation_range_m) AS mean_elev_range,
            STDDEV(elevation_range_m) AS std_elev_range
        FROM silver_trajectories
        WHERE duration_s > 0
    ),
    classified AS (
        SELECT
            t.trip_id,
            t.user_id,
            t.trajectory,
            t.duration_s,
            ROUND(t.distance_deg * 111320, 0) AS distance_m,
            t.elevation_range_m,
            CASE
                WHEN t.elevation_range_m
                    > (s.mean_elev_range + 3 * s.std_elev_range)
                    THEN 'extreme_elevation'
                WHEN t.duration_s
                    > (s.mean_duration + 3 * s.std_duration)
                    THEN 'extreme_duration'
                WHEN t.distance_deg * 111320
                    > (s.mean_distance + 3 * s.std_distance)
                    THEN 'extreme_distance'
                ELSE NULL
            END AS anomaly_type
        FROM silver_trajectories t
        CROSS JOIN trip_stats s
    )
    SELECT * FROM classified
    WHERE anomaly_type IS NOT NULL
""")

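The thresholding rule in this query is a mean-plus-three-standard-deviations test, applied independently to each metric. A minimal plain-Python sketch of that rule for a single metric follows; flag_extremes and the sample durations are hypothetical, and the sample standard deviation is used on the assumption that it matches what STDDEV computes:

```python
from statistics import mean, stdev


def flag_extremes(values, k=3.0):
    """Return indices whose value exceeds mean + k * sample standard deviation."""
    threshold = mean(values) + k * stdev(values)
    return [i for i, v in enumerate(values) if v > threshold]


# Twenty ordinary 10-minute trips and one trip lasting more than a day.
durations_s = [600] * 20 + [100_000]
flagged = flag_extremes(durations_s)
```

One property worth knowing: the outlier itself inflates the mean and standard deviation, so on very small samples no single value can exceed the three-sigma threshold. The rule works best when computed over many trips, as it is here.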
Road Segment Speed Analysis. By joining matched routes with trajectory metrics and decomposing routes into individual segments using GeoHash pairs, we calculate average speed per road segment and classify by congestion:

matched_with_metrics = sedona.sql("""
    SELECT
        m.ids AS trip_id,
        t.user_id,
        m.matched_points AS geometry,
        t.duration_s,
        ROUND(ST_Length(m.matched_points) * 111320, 0) AS matched_distance_m,
        CASE WHEN t.duration_s > 0
            THEN ROUND(
                ST_Length(m.matched_points) * 111320 / t.duration_s * 3.6, 1)
            ELSE 0
        END AS avg_speed_kmh,
        CASE WHEN t.duration_s > 0 THEN
            CASE
                WHEN (ST_Length(m.matched_points) * 111320
                      / t.duration_s * 3.6) < 15 THEN 'congested'
                WHEN (ST_Length(m.matched_points) * 111320
                      / t.duration_s * 3.6) < 40 THEN 'urban'
                WHEN (ST_Length(m.matched_points) * 111320
                      / t.duration_s * 3.6) < 80 THEN 'arterial'
                ELSE 'highway'
            END
            ELSE 'unknown'
        END AS road_class
    FROM silver_matched m
    INNER JOIN silver_trajectories t ON m.ids = t.trip_id
""")

This visualization uses the actual road LineString geometries from map matching, not H3 hexagons, providing road-level granularity that is directly actionable for traffic engineering and route optimization.
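The speed classification in the query repeats the same expression in every branch; pulling it into one function makes the thresholds easier to read and adjust. This plain-Python sketch mirrors the 15/40/80 km/h cut-offs from the SQL, with classify_speed as a hypothetical helper name:

```python
def classify_speed(distance_m, duration_s):
    """Return (avg_speed_kmh, speed_class) using the thresholds from the query."""
    if duration_s <= 0:
        return 0.0, "unknown"  # no usable duration, as in the ELSE branches
    speed_kmh = distance_m / duration_s * 3.6  # m/s to km/h
    if speed_kmh < 15:
        label = "congested"
    elif speed_kmh < 40:
        label = "urban"
    elif speed_kmh < 80:
        label = "arterial"
    else:
        label = "highway"
    return round(speed_kmh, 1), label


# A 5 km matched route driven in 10 minutes averages 30 km/h.
speed, label = classify_speed(5000, 600)
```

Classifying on the unrounded speed and rounding only for display keeps values near a boundary, such as 14.96 km/h, in the class their true speed implies.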

From Notebooks to Dashboards: The Wherobots + Felt Connection

Processing mobility data at scale is only half the equation. The other half is getting the results into the hands of decision-makers (operations teams, urban planners, fleet managers, logistics analysts) who need interactive, shareable maps, not Jupyter notebooks.

This is where the Wherobots and Felt integration completes the picture. Wherobots and Felt recently announced a strategic partnership that connects Wherobots’ spatial intelligence lakehouse directly with Felt’s collaborative, browser-based mapping platform. The integration lets organizations go from processing petabyte-scale geospatial datasets in the cloud to exploring insights in interactive maps, without moving large datasets between systems.

For this mobility accelerator, the workflow is straightforward. Every Gold-layer view we produce (H3 density heatmaps, hotspot clusters, road segment speed classifications, trajectory anomaly maps) is written as GeoParquet to S3. Through the native Wherobots-Felt integration, these datasets are directly accessible in Felt, where they become live, interactive, collaborative map layers. There is no export step, no format conversion, and no data movement friction.

What this means in practice:

Shareable analysis, not static screenshots. Instead of exporting a Kepler.gl map as an image or HTML file, the H3 activity density view becomes a live Felt map that operations teams can explore, filter, annotate, and share via a link, viewable from any device, no GIS software required.

Collaborative investigation. When the anomaly detection pipeline flags suspicious trajectories, an analyst does not need to walk a fleet manager through a notebook. They share a Felt map where the flagged routes are overlaid on the road speed classification layer, and the fleet manager can pan, zoom, and query the data themselves.

Operational dashboards from analytical views. The road segment speed analysis, with its congested/urban/arterial/highway classification, becomes a traffic conditions dashboard. The hotspot identification layer becomes a POI and dwell-time analysis tool. These are not one-off visualizations—they are reusable, updatable artifacts that stay connected to the processed data.

AI-assisted map creation. Felt’s AI-driven interface lets users interact with spatial data using natural language prompts, lowering the barrier for non-GIS teams to extract insights. Combined with Wherobots’ processing power, this creates what both companies describe as the “SQL-to-map” workflow, from Spatial SQL query to interactive, shareable map in seconds.

This combination is already in production. Leaf Agriculture uses Wherobots and Felt together to process millions of acres of tractor telemetry and imagery data, turning their agricultural data lake into interactive maps and dashboards distributed via links instead of in-person screen-sharing sessions. The mobility use case follows the same pattern: process at scale with Wherobots, visualize and collaborate in Felt.

Challenges and Solutions: A Practitioner’s Reference

Every mobility data pipeline encounters edge cases. Here are the ones we solved in this accelerator, documented so you can avoid them:

1. Spark inferSchema corrupting time values. Spark silently prepended today’s date to inferred TimestampType columns. Fix: Use DATE_FORMAT() to extract date and time parts explicitly.

2. ST_MakeLine failing on single-point groups. A LineString requires at least two points, but Spark evaluates SELECT before HAVING. Fix: Use a CTE to pre-filter trips with >= 2 points before the aggregation query.

3. COLLECT_LIST not preserving order. Distributed aggregation does not guarantee array ordering. Fix: Accept unordered M values in trajectories and use point_count filters instead of ST_IsValidTrajectory() as a quality gate.

4. EXPLODE nested in expressions. Spark does not allow EXPLODE() inside other SQL expressions. Fix: Split the operation into separate CTEs.

5. ST_DBSCAN requiring physical column references. The function does not accept computed expressions as geometry input. Fix: Use the persisted GeoParquet column.

6. HAVING without GROUP BY. Applying aggregate thresholds without a GROUP BY clause raised an error. Fix: Use a CTE to compute thresholds, then apply as WHERE conditions.

Apache Sedona Spatial SQL Functions Used

This accelerator demonstrates a broad cross-section of Sedona’s spatial SQL capabilities:

Category	Functions
4D Point Construction	`ST_MakePoint(x, y, z, m)`
Dimension Inspection	`ST_Z()`, `ST_M()`, `ST_HasZ()`, `ST_HasM()`, `ST_CoordDim()`, `ST_Is3D()`
Elevation Analysis	`ST_ZMin()`, `ST_ZMax()`, `ST_3DDistance()`
Trajectory Building	`ST_MakeLine()`, `ST_IsValidTrajectory()`
Trajectory Sampling	`ST_LineInterpolatePoint()`
Spatial Indexing	`ST_H3CellIDs()`, `ST_H3ToGeom()`, `ST_GeoHash()`
Spatial Measurement	`ST_Length()`, `ST_Distance()`, `ST_StartPoint()`, `ST_EndPoint()`
Clustering	`ST_DBSCAN()`
Map Matching	`matcher.load_osm()`, `matcher.match()`
Geometry Construction	`ST_MakePoint()`, `ST_Buffer()`
Visualization	`SedonaKepler.create_map()`, `SedonaKepler.add_df()`

GeoParquet Output and Tool Interoperability

All Gold-layer views are persisted as GeoParquet files. GeoParquet has emerged as the standard columnar format for geospatial data in the cloud-native ecosystem, offering efficient compression, predicate pushdown for spatial filters, and broad tool compatibility. The analytical views produced by this accelerator are immediately consumable in Felt, Kepler.gl, QGIS, Foursquare Studio, DuckDB Spatial, and any other tool that reads GeoParquet, with no export step, no format conversion, and no data loss.

With the Wherobots-Felt integration, GeoParquet outputs in S3 become live data sources for interactive maps. This closes the loop from raw GPS pings to collaborative dashboards in a single, end-to-end spatial data stack.

Get Started with GPS Trajectory Processing on Wherobots

The Wherobots Mobility Solution Accelerator is designed to be a starting point, not a black box. The three notebooks are fully documented, the Spatial SQL is readable and modifiable, and every intermediate result is inspectable as a Spark DataFrame or visualizable in SedonaKepler, and from there, publishable as an interactive Felt map.

Whether you are processing fleet telematics, rideshare trajectories, maritime AIS data, or drone flight logs, the patterns demonstrated here—medallion architecture, 4D geometry processing, distributed trip segmentation, integrated map matching, GeoParquet output, and collaborative visualization through Felt—translate directly to your use case.

To explore the accelerator, visit Wherobots, start a notebook environment, and connect your favorite IDE to the Wherobots MCP Server and Spatial AI Coding Tools. To see how the Wherobots + Felt stack works together, check out the integration documentation. If you have questions about applying these patterns to your mobility data, reach out to the Wherobots team or join the Apache Sedona community on Discord.
