Planetary-scale answers, unlocked.
A Hands-On Guide for Working with Large-Scale Spatial Data. Learn more.
Authors
I built three national-security GEOINT use cases on the Wherobots stack in days instead of weeks. A Critical Infrastructure Vulnerability (CIV) pipeline with two regional variants, plus a border-corridor analysis on real transportation segments. The Wherobots geospatial MCP server is what made that timeline possible.
Most of the work in standing up a credible use case isn’t the analysis. It’s the plumbing. Finding the right data within the organization. Understanding the schema. Anchoring synthetic operational data to real geometries so the joins represent reality. That’s where the weeks go, and anyone who’s built a multi-table spatial pipeline on real data knows the shape of it.
About a dozen joined results, multiple Overture Maps Foundation themes, notional mission overlays, and an scenario-based MCP server evaluation notebook driving the whole thing. The build took days instead of weeks thanks to the Wherobots Spatial AI Coding Tools.
What used to be the slow part of this kind of build was never the analysis. It was the getting to the analysis. Finding the right Overture tables. Learning their column shapes. Remembering whether class lived on base_infrastructure or places_place. Figuring out that categories was structured as a STRUCT vs. a flat field. Discovering the hard way that power_pole and power_tower collectively contain 36M rows and could have consumed tons of unnecessary resources. The Wherobots MCP collapsed every one of those build time consuming events into a tool call.
class
base_infrastructure
places_place
categories
power_pole
power_tower
The end state for the CIV use case is a 10-step pipeline ending in a 5-component scoring formula:
vulnerability_score = 0.25 * flood_count + 0.25 * wildfire_count + 0.15 * (facility_count + 0.01 * building_count) + 0.20 * criticality_tier * 100 + 0.15 * (negative_tweet_count * 2 + neutral_tweet_count * 0.5)
Sitting underneath that score: an AOI layer, an asset layer, flood and wildfire proximity buffers, settlement/facility proximity, building density, a criticality tier table, seasonal hazard windows, social-media sentiment proximity, and a geofence overlay. About a dozen tables in wherobots.geoint.*_civ, each with a clear FK relationship back to critical_assets_civ.asset_id. Overture thematic geometries underneath, synthetic operational signal layered on top.
wherobots.geoint.*_civ
critical_assets_civ.asset_id
That’s a lot of schema. Designing it is the interesting part. Creating the plumbing without agentic support can extend the timeline significantly.
The real paradigm shift isn’t any single tool call. It’s that the agent can reach for my Wherobots organization’s current catalog, and any of the catalogs connections it has (Databricks Unity, AWS Glue, S3) whenever it needs to, intra-session. Not a snapshot from when the analysis was developed. Not a schema description I pasted into a system prompt three turns ago. When the agent invokes the MCP, the response reflects the catalog as it exists at that moment, which means:
critical_assets_civ
negative_tweet_count
describe_table
The catalog and the agent’s understanding don’t have to drift apart. That sounds like a small thing. In practice though, it is a BIG deal, because the work of building a multi-table use case is a sequence of “I just created this, now use it to build the next thing.” Whenever the agent reaches for the catalog, it can see what I’ve built so far, what’s in the open data catalog, and what fields are queryable. I’m never re-explaining my own schema.
This is what an AI context engine for the physical world looks like in practice. The agent isn’t reasoning about geography from training data. It’s reading the live state of a interconnected and extensible Wherobots spatial catalog and writing SQL grounded in it.
The loop that emerged on top of that capability looked the same across both builds:
base_water
base_land_cover
buildings_building
-- Catalog walk: filter base_infrastructure to classes worth joining on, -- excluding power_pole (18.3M rows) and power_tower (18.0M rows) once -- list_tables_tool surfaced their volume. SELECT class, COUNT(*) AS row_count FROM wherobots_open_data.overture_maps_foundation.base_infrastructure GROUP BY class ORDER BY row_count DESC;
categories.primary
subtype
-- Step 3: Flood Hazard Proximity -- Counts water features within 1 km of each critical infrastructure asset. -- Pre-filters water to Ukraine AOI to reduce spatial join volume. SELECT a.asset_id, COUNT(w.id) AS flood_water_count FROM critical_assets a JOIN wherobots_open_data.overture_maps_foundation.base_water w ON ST_DWithin(a.geometry, w.geometry, 1000, true) CROSS JOIN ukraine_aoi ua WHERE w.class IN ('river', 'stream', 'lake', 'reservoir', 'canal') AND ST_Intersects(ua.geometry, w.geometry) GROUP BY a.asset_id ORDER BY flood_water_count DESC;
limit=10
The result: I spent my time on the parts that required judgment. Where should the flood buffer be: 500m, 1km, 2km? 500m is too tight in dense urban terrain. 2km swallows half the AOI. The MCP let me test all three in the time it would normally take to write the first query. What’s the right tier-weighting for a substation vs. a generator? Does negative sentiment matter 2x or 4x more than neutral sentiment in the composite score? How do I express a seasonal vulnerability window so the downstream MCP scenarios can filter on it cleanly?
Here’s what the scoring formula above looks like as the actual SQL that runs against the persisted views. Every *_proximity table shares the same asset_id grain, so the composite score is a single set of LEFT JOINs back to the asset layer:
*_proximity
asset_id
-- Step 10: Enhanced Vulnerability Score (5-component composite) SELECT ca.asset_id, ca.class, ac.criticality_tier, COALESCE(fp.flood_water_count, 0) AS flood_count, COALESCE(wp.wildfire_landcover_count, 0) AS wildfire_count, COALESCE(sp.facility_count, 0) AS facility_count, COALESCE(bd.building_count, 0) AS building_count, COALESCE(sm.negative_tweet_count, 0) AS negative_tweet_count, ( 0.25 * COALESCE(fp.flood_water_count, 0) + 0.25 * COALESCE(wp.wildfire_landcover_count, 0) + 0.15 * (COALESCE(sp.facility_count, 0) + 0.01 * COALESCE(bd.building_count, 0)) + 0.20 * ac.criticality_tier * 100 + 0.15 * (COALESCE(sm.negative_tweet_count, 0) * 2 + COALESCE(sm.neutral_tweet_count, 0) * 0.5) ) AS vulnerability_score, CASE WHEN COALESCE(fp.flood_water_count, 0) > 0 AND COALESCE(wp.wildfire_landcover_count, 0) > 0 THEN TRUE ELSE FALSE END AS multi_hazard_flag FROM critical_assets ca JOIN asset_criticality ac ON ca.asset_id = ac.asset_id LEFT JOIN flood_proximity fp ON ca.asset_id = fp.asset_id LEFT JOIN wildfire_proximity wp ON ca.asset_id = wp.asset_id LEFT JOIN settlement_proximity sp ON ca.asset_id = sp.asset_id LEFT JOIN building_density bd ON ca.asset_id = bd.asset_id LEFT JOIN social_media_proximity sm ON ca.asset_id = sm.asset_id ORDER BY vulnerability_score DESC;
Strip the national-security framing off the build and the underlying loop is domain-agnostic. Swap critical_assets_civ for store locations and the proximity layers for foot-traffic decay and competitor density, and the same pipeline becomes a retail expansion analysis. Swap them for policy-in-force locations and named-storm tracks, and it’s catastrophe exposure modeling for an insurance carrier. Swap them for distribution centers and port-congestion signal, and it’s a supply-chain resilience build. Swap them for parcels and zoning and hazard layers, and it’s a municipal planning workflow.
The paradigm generalizes: a live catalog the agent re-reads every turn if needed, layered data anchored to real datasets and foundational Overture data, and an iterative discover → describe → generate → test → refine loop. It works anywhere you have a catalog of geospatial or tabular data you’re actively building on top of. The schema work is the same shape regardless of whether the bottom layer is Overture infrastructure, OpenStreetMap, NAIP rasters, parcel records, or a customer’s own warehoused operational data. What changes is the vocabulary on top.
Two regional CIV setup notebooks, a third for border-corridor segments, and an MCP scenario notebook with 10+ analyst prompts that exercise the full schema. All on real Overture geometry. All synthetic where it needs to be. All persisted to our managed Iceberg catalog as wherobots.geoint.* so the next analyst can pick it up where I left off.
wherobots.geoint.*
The deliverable looks like weeks of work. The MCP server and spatial AI coding tools made it days. The schema is where the value lives, and the AI coding tools let me spend my time there.
Here is a link to the use case setup notebook:
https://github.com/wherobots/wherobots-mcp-spatial-blog/blob/main/CIV_Demo_Setup_Ukraine.ipynb
Graph RAG for the Physical World
Introduction RAG (Retrieval Augmented Generation) has addressed one of AI’s biggest challenges for enterprise users: missing or hallucinating empirical business and real world context . Instead of generating answers from nothing, RAG retrieves relevant documents and feeds them to the model as context. It works. Ask an AI about your company’s Q4 revenue, and RAG […]
Building the Wherobots Mobility Solution Accelerator: A Technical Deep Dive
Three Notebooks, One Medallion Architecture, Full 4D GPS Trajectory Processing: Part 2 of 2
How well does SAM3 detect building footprints? Let’s ask the Wherobots Spatial AI Assistant!
In a recent post, we showed how easy it is to use RasterFlow and Meta’s Segment Anything 3 Model (SAM3) to detect features in the physical world. A single end-to-end pipeline built a 133 GB NAIP mosaic of Marion County, Oregon, ran SAM3 against it with text prompts spanning eight classes, and produced approximately one […]
share this article
Awesome that you’d like to share our articles. Where would you like to share it to: