Planetary-scale answers, unlocked.
A Hands-On Guide for Working with Large-Scale Spatial Data. Learn more.
Authors
I built three national-security GEOINT use cases on the Wherobots stack in days instead of weeks. A Critical Infrastructure Vulnerability (CIV) pipeline with two regional variants, plus a border-corridor analysis on real transportation segments. The Wherobots geospatial MCP server is what made that timeline possible.
Most of the work in standing up a credible use case isn’t the analysis. It’s the plumbing. Finding the right data within the organization. Understanding the schema. Anchoring synthetic operational data to real geometries so the joins represent reality. That’s where the weeks go, and anyone who’s built a multi-table spatial pipeline on real data knows the shape of it.
About a dozen joined results, multiple Overture Maps Foundation themes, notional mission overlays, and an scenario-based MCP server evaluation notebook driving the whole thing. The build took days instead of weeks thanks to the Wherobots Spatial AI Coding Tools.
What used to be the slow part of this kind of build was never the analysis. It was the getting to the analysis. Finding the right Overture tables. Learning their column shapes. Remembering whether class lived on base_infrastructure or places_place. Figuring out that categories was structured as a STRUCT vs. a flat field. Discovering the hard way that power_pole and power_tower collectively contain 36M rows and could have consumed tons of unnecessary resources. The Wherobots MCP collapsed every one of those build time consuming events into a tool call.
class
base_infrastructure
places_place
categories
power_pole
power_tower
The end state for the CIV use case is a 10-step pipeline ending in a 5-component scoring formula:
vulnerability_score = 0.25 * flood_count + 0.25 * wildfire_count + 0.15 * (facility_count + 0.01 * building_count) + 0.20 * criticality_tier * 100 + 0.15 * (negative_tweet_count * 2 + neutral_tweet_count * 0.5)
Sitting underneath that score: an AOI layer, an asset layer, flood and wildfire proximity buffers, settlement/facility proximity, building density, a criticality tier table, seasonal hazard windows, social-media sentiment proximity, and a geofence overlay. About a dozen tables in wherobots.geoint.*_civ, each with a clear FK relationship back to critical_assets_civ.asset_id. Overture thematic geometries underneath, synthetic operational signal layered on top.
wherobots.geoint.*_civ
critical_assets_civ.asset_id
That’s a lot of schema. Designing it is the interesting part. Creating the plumbing without agentic support can extend the timeline significantly.
The real paradigm shift isn’t any single tool call. It’s that the agent can reach for my Wherobots organization’s current catalog, and any of the catalogs connections it has (Databricks Unity, AWS Glue, S3) whenever it needs to, intra-session. Not a snapshot from when the analysis was developed. Not a schema description I pasted into a system prompt three turns ago. When the agent invokes the MCP, the response reflects the catalog as it exists at that moment, which means:
critical_assets_civ
negative_tweet_count
describe_table
The catalog and the agent’s understanding don’t have to drift apart. That sounds like a small thing. In practice though, it is a BIG deal, because the work of building a multi-table use case is a sequence of “I just created this, now use it to build the next thing.” Whenever the agent reaches for the catalog, it can see what I’ve built so far, what’s in the open data catalog, and what fields are queryable. I’m never re-explaining my own schema.
This is what an AI context engine for the physical world looks like in practice. The agent isn’t reasoning about geography from training data. It’s reading the live state of a interconnected and extensible Wherobots spatial catalog and writing SQL grounded in it.
The loop that emerged on top of that capability looked the same across both builds:
base_water
base_land_cover
buildings_building
-- Catalog walk: filter base_infrastructure to classes worth joining on, -- excluding power_pole (18.3M rows) and power_tower (18.0M rows) once -- list_tables_tool surfaced their volume. SELECT class, COUNT(*) AS row_count FROM wherobots_open_data.overture_maps_foundation.base_infrastructure GROUP BY class ORDER BY row_count DESC;
categories.primary
subtype
-- Step 3: Flood Hazard Proximity -- Counts water features within 1 km of each critical infrastructure asset. -- Pre-filters water to Ukraine AOI to reduce spatial join volume. SELECT a.asset_id, COUNT(w.id) AS flood_water_count FROM critical_assets a JOIN wherobots_open_data.overture_maps_foundation.base_water w ON ST_DWithin(a.geometry, w.geometry, 1000, true) CROSS JOIN ukraine_aoi ua WHERE w.class IN ('river', 'stream', 'lake', 'reservoir', 'canal') AND ST_Intersects(ua.geometry, w.geometry) GROUP BY a.asset_id ORDER BY flood_water_count DESC;
limit=10
The result: I spent my time on the parts that required judgment. Where should the flood buffer be: 500m, 1km, 2km? 500m is too tight in dense urban terrain. 2km swallows half the AOI. The MCP let me test all three in the time it would normally take to write the first query. What’s the right tier-weighting for a substation vs. a generator? Does negative sentiment matter 2x or 4x more than neutral sentiment in the composite score? How do I express a seasonal vulnerability window so the downstream MCP scenarios can filter on it cleanly?
Here’s what the scoring formula above looks like as the actual SQL that runs against the persisted views. Every *_proximity table shares the same asset_id grain, so the composite score is a single set of LEFT JOINs back to the asset layer:
*_proximity
asset_id
-- Step 10: Enhanced Vulnerability Score (5-component composite) SELECT ca.asset_id, ca.class, ac.criticality_tier, COALESCE(fp.flood_water_count, 0) AS flood_count, COALESCE(wp.wildfire_landcover_count, 0) AS wildfire_count, COALESCE(sp.facility_count, 0) AS facility_count, COALESCE(bd.building_count, 0) AS building_count, COALESCE(sm.negative_tweet_count, 0) AS negative_tweet_count, ( 0.25 * COALESCE(fp.flood_water_count, 0) + 0.25 * COALESCE(wp.wildfire_landcover_count, 0) + 0.15 * (COALESCE(sp.facility_count, 0) + 0.01 * COALESCE(bd.building_count, 0)) + 0.20 * ac.criticality_tier * 100 + 0.15 * (COALESCE(sm.negative_tweet_count, 0) * 2 + COALESCE(sm.neutral_tweet_count, 0) * 0.5) ) AS vulnerability_score, CASE WHEN COALESCE(fp.flood_water_count, 0) > 0 AND COALESCE(wp.wildfire_landcover_count, 0) > 0 THEN TRUE ELSE FALSE END AS multi_hazard_flag FROM critical_assets ca JOIN asset_criticality ac ON ca.asset_id = ac.asset_id LEFT JOIN flood_proximity fp ON ca.asset_id = fp.asset_id LEFT JOIN wildfire_proximity wp ON ca.asset_id = wp.asset_id LEFT JOIN settlement_proximity sp ON ca.asset_id = sp.asset_id LEFT JOIN building_density bd ON ca.asset_id = bd.asset_id LEFT JOIN social_media_proximity sm ON ca.asset_id = sm.asset_id ORDER BY vulnerability_score DESC;
Strip the national-security framing off the build and the underlying loop is domain-agnostic. Swap critical_assets_civ for store locations and the proximity layers for foot-traffic decay and competitor density, and the same pipeline becomes a retail expansion analysis. Swap them for policy-in-force locations and named-storm tracks, and it’s catastrophe exposure modeling for an insurance carrier. Swap them for distribution centers and port-congestion signal, and it’s a supply-chain resilience build. Swap them for parcels and zoning and hazard layers, and it’s a municipal planning workflow.
The paradigm generalizes: a live catalog the agent re-reads every turn if needed, layered data anchored to real datasets and foundational Overture data, and an iterative discover → describe → generate → test → refine loop. It works anywhere you have a catalog of geospatial or tabular data you’re actively building on top of. The schema work is the same shape regardless of whether the bottom layer is Overture infrastructure, OpenStreetMap, NAIP rasters, parcel records, or a customer’s own warehoused operational data. What changes is the vocabulary on top.
Two regional CIV setup notebooks, a third for border-corridor segments, and an MCP scenario notebook with 10+ analyst prompts that exercise the full schema. All on real Overture geometry. All synthetic where it needs to be. All persisted to our managed Iceberg catalog as wherobots.geoint.* so the next analyst can pick it up where I left off.
wherobots.geoint.*
The deliverable looks like weeks of work. The MCP server and spatial AI coding tools made it days. The schema is where the value lives, and the AI coding tools let me spend my time there.
Here is a link to the use case setup notebook:
https://github.com/wherobots/wherobots-mcp-spatial-blog/blob/main/CIV_Demo_Setup_Ukraine.ipynb
How We Delivered “Fields of The World” with RasterFlow: A Planetary-Scale GeoAI Pipeline
See how we used RasterFlow to run a 100TB+ global GeoAI pipeline, from feature mosaics to predictions and vectors, with reproducible workflows.
Change Detection Using AlphaEarth Foundations (Part 2)
Continue exploring how Alpha Earth Embeddings reveal change over time using scores.
AlphaEarth Embeddings, Zonal Statistics, and PCA
Aggregate AlphaEarth embeddings over Iowa fields and visualize them with PCA.
Introducing the Wherobots Python SDK
What is the Wherobots Python SDK? The Wherobots Python SDK is a typed Python client for submitting, monitoring, and managing Wherobots job runs. It ships on PyPI as wherobots-python-sdk. One install, one API key, and you’re running spatial jobs from any Python environment: CI/CD pipelines, notebooks, a local shell. The SDK is built for three […]
share this article
Awesome that you’d like to share our articles. Where would you like to share it to: