In a recent post, we showed how easy it is to use RasterFlow and Meta’s Segment Anything Model 3 (SAM3) to detect features in the physical world. A single end-to-end pipeline built a 133 GB NAIP mosaic of Marion County, Oregon, ran SAM3 against it with text prompts spanning eight classes, and produced approximately one million detection polygons in a Wherobots table, including roughly 312,000 building roofs.
That is an impressive result on its own. But once the inference job finished, the obvious next question was: are these detections any good? Specifically, how well do they agree with an independent reference dataset of building footprints?
The Overture Maps Foundation publishes a global buildings dataset that is freely available in the Wherobots Hub. If I could compare the SAM3 roof detections against Overture for the same county, I would have a first-pass evaluation of whether SAM3 is finding the right things in roughly the right places.
The catch: I am a product manager, not a data scientist, and I am not well-versed in the standard techniques used to evaluate the output of remote-sensing models. Intersection-over-union, recall and precision curves, confidence calibration, the right metric coordinate reference system… I knew the terms, but I had not performed an evaluation like this from scratch before.
That is exactly where the Wherobots Spatial AI Assistant comes in.
I opened a conversation with the assistant inside Claude, using the Wherobots MCP server. I described what I had: a fresh SAM3 detection table in org_catalog.sam3_marion_db.sam3_outputs, and a goal of comparing it to Overture buildings. The full session took four prompts.
“can you see SAM3 results in org_catalog.sam3_marion_db.sam3_outputs? can you tell me more about this dataset of SAM3 detections for Marion County OR?”
Within seconds, the assistant had walked the catalog, run summary statistics, and returned a complete profile: eight detection classes (layers), around one million total rows, 312k roofs, confidence scores ranging from 0.50 to 0.96, the table’s spatial extent, and the source mosaic. One of the queries it ran was a simple breakdown of the detections by class (layer):
```sql
SELECT layer,
       COUNT(*)        AS n,
       AVG(bbox_score) AS avg_score
FROM org_catalog.sam3_marion_db.sam3_outputs
GROUP BY layer
ORDER BY n DESC
```
A sample of SAM3 roof detections (blue) over NAIP imagery in Marion County, Oregon. Each polygon is a segmentation mask, not a bounding box.
“if I wanted to compare the buildings (roofs) detecting with SAM3 against the building footprints in the overture dataset in wherobots_open_data.overture_maps_foundation, how would I do that?”
The assistant designed a four-stage approach: clip both datasets to a Marion County area of interest; spatial-join them on intersection; compute intersection-over-union per pair; and aggregate to recall, precision, and calibration metrics. It also surfaced a set of caveats:
These considerations gave me confidence that the analysis was well thought through and likely accurate.
Rather than a rough bounding box, I wanted the comparison clipped to the actual Marion County admin boundary, fetched from a trustworthy source.
“can you use wkls to get the official admin boundary from Overture for Marion County, like this: gdf = gpd.read_file(wkls['us']['or']['Marion County'].geojson())”
The assistant rebuilt the join strategy around the admin polygon generated using wkls. It pulled the boundary as a single-row Spark view that could be broadcast into the spatial joins, and combined an Iceberg bounding-box prefilter on Overture with an exact ST_Intersects predicate against the real Marion County shape. The generated code was straightforward:
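```python
import geopandas as gpd
import wkls
from sedona.spark import SedonaContext

# Create the Sedona session (on Wherobots, the runtime supplies the Spark config)
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Fetch the Marion County admin boundary from Overture via wkls
gdf = gpd.read_file(wkls['us']['or']['Marion County'].geojson())
aoi_geom = gdf.geometry.iloc[0]
aoi_wkt = aoi_geom.wkt

# Register a one-row view holding the AOI polygon so it can be broadcast into joins
sedona.sql(f"""
    CREATE OR REPLACE TEMP VIEW aoi AS
    SELECT ST_GeomFromWKT('{aoi_wkt}') AS geom
""")
```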
“can you create this analysis in a Jupyter notebook that I can run with Wherobots?”
The assistant returned a 24-cell notebook covering everything from SedonaContext setup through the final visualization. I ran it in VS Code using a Wherobots cloud runtime. The rest of this post walks through the code and the results.
Here’s the notebook:
sam3_vs_overture_marion.ipynb
The notebook is organized into the following steps:
Setup and constants. Setting up imports, initializing the SedonaContext, defining the target CRS (EPSG:32610, UTM zone 10N), IoU thresholds, and table names. It’s worth pointing out that the assistant chose UTM zone 10N automatically (the right metric CRS for western Oregon) without me having to ask.
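The choice of EPSG:32610 follows from standard UTM zone arithmetic: zones are 6° of longitude wide, numbered eastward from 180°W, and the WGS 84 / UTM codes are 326xx in the northern hemisphere and 327xx in the southern. The sketch below is illustrative only; the helper `utm_epsg` and the sample coordinate near Salem are assumptions rather than code from the notebook, and it ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """Return the WGS 84 / UTM EPSG code for a lon/lat point.

    Illustrative helper: standard 6-degree zones only, no Norway/Svalbard exceptions.
    """
    if not (-180.0 <= lon <= 180.0 and -80.0 <= lat <= 84.0):
        raise ValueError("point is outside the UTM longitude/latitude range")
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(zone, 60)  # lon == 180 would otherwise produce zone 61
    # 326xx = northern hemisphere, 327xx = southern hemisphere
    return (32600 if lat >= 0 else 32700) + zone


# Hypothetical sample point near Salem, inside Marion County, Oregon
print(utm_epsg(-122.6, 44.9))  # → 32610
```

Because all of Marion County lies between 126°W and 120°W, every point in the AOI falls in zone 10, so one projected CRS serves the whole analysis.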
Fetch the AOI from wkls. The notebook calls wkls['us']['or']['Marion County'].geojson(), reads the result into a GeoPandas frame, extracts the polygon’s WKT and bounding box, and registers a one-row aoi Spark view. It then renders the boundary on a SedonaKepler map so I could visually confirm I had the right AOI before running anything expensive.
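The bounding box used later for the Overture prefilter is simply the minimum and maximum of every coordinate in the boundary. In the notebook, shapely’s `aoi_geom.bounds` returns this tuple directly. The dependency-free sketch below (hypothetical helper `geojson_bbox`, toy coordinates) shows what the tuple contains and why the same logic also works if the county boundary comes back as a MultiPolygon.

```python
from typing import Iterator, Tuple


def _iter_positions(coords) -> Iterator[list]:
    # GeoJSON nests positions in lists; a position is a list of numbers
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from _iter_positions(part)


def geojson_bbox(geometry: dict) -> Tuple[float, float, float, float]:
    """Return (minx, miny, maxx, maxy) for a GeoJSON Polygon or MultiPolygon."""
    if geometry["type"] not in ("Polygon", "MultiPolygon"):
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    xs, ys = [], []
    for pos in _iter_positions(geometry["coordinates"]):
        xs.append(pos[0])
        ys.append(pos[1])
    return min(xs), min(ys), max(xs), max(ys)


# Toy rectangle standing in for a county boundary (not real Marion County coordinates)
toy = {
    "type": "Polygon",
    "coordinates": [[[-123.0, 44.7], [-122.0, 44.7], [-122.0, 45.2],
                     [-123.0, 45.2], [-123.0, 44.7]]],
}
print(geojson_bbox(toy))  # → (-123.0, 44.7, -122.0, 45.2)
```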
Sanity counts. Two COUNT(*) queries over the SAM3 and Overture tables, both clipped to the admin boundary:
SAM3 finds notably more candidate roof shapes than Overture has buildings.
Filtered AOI views. Two temporary views, sam3_roofs and overture_bldgs, each with the lon/lat geometry and a UTM-projected geometry. The Overture view uses a two-stage filter: a bounding-box prefilter that takes advantage of Iceberg column statistics and an exact ST_Intersects predicate against the admin polygon for Marion County.
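A pure-Python sketch of the same two-stage logic follows. A cheap bounding-box overlap test rejects most candidates, and only the survivors pay for an exact polygon intersection test. This is an illustration under simplifying assumptions (simple polygons without holes, planar coordinates, hypothetical helper names, toy shapes). It is not how Sedona or Iceberg implement the predicate; in the notebook both stages run in SQL.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]
BBox = Tuple[float, float, float, float]


def bbox_of(ring: Ring) -> BBox:
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    # Stage 1: cheap envelope test (touching boxes count as overlapping)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _orient(p: Point, q: Point, r: Point) -> int:
    # Sign of the cross product (q - p) x (r - p): 1 = left turn, -1 = right turn, 0 = collinear
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _on_segment(p: Point, q: Point, r: Point) -> bool:
    # Assumes r is collinear with p-q; true if r lies within the segment's extent
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def _segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True  # proper crossing
    return ((o1 == 0 and _on_segment(p1, p2, q1)) or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1)) or (o4 == 0 and _on_segment(q1, q2, p2)))


def _edges(ring: Ring) -> List[Tuple[Point, Point]]:
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]


def _point_in_ring(pt: Point, ring: Ring) -> bool:
    # Even-odd ray casting toward +x
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in _edges(ring):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def rings_intersect(a: Ring, b: Ring) -> bool:
    # Stage 2: exact test. Edges cross or touch, or one polygon lies inside the other.
    if any(_segments_intersect(p1, p2, q1, q2) for p1, p2 in _edges(a) for q1, q2 in _edges(b)):
        return True
    return _point_in_ring(a[0], b) or _point_in_ring(b[0], a)


def two_stage_filter(aoi: Ring, buildings: Dict[str, Ring]) -> List[str]:
    aoi_box = bbox_of(aoi)
    return [bid for bid, ring in buildings.items()
            if bboxes_overlap(bbox_of(ring), aoi_box) and rings_intersect(ring, aoi)]


def square(x0: float, y0: float, size: float) -> List[Point]:
    return [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size), (x0, y0)]


# Toy L-shaped "county": its bounding box also covers the empty notch at the top right
aoi = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10), (0, 0)]
buildings = {
    "a": square(1, 1, 1),    # inside the L
    "b": square(6, 6, 1),    # in the notch: passes stage 1, fails stage 2
    "c": square(20, 20, 1),  # far away: rejected by stage 1
    "d": square(9, 3, 2),    # straddles the boundary
}
print(two_stage_filter(aoi, buildings))  # → ['a', 'd']
```

Building “b” is the case the exact predicate exists for: an irregular county boundary leaves large areas inside its bounding box that are not actually in the county.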
Spatial join. This step matches candidate buildings. For every pair of SAM3 roof and Overture building whose geometries intersect, the query computes the intersection area, the two source areas, and the IoU:
```sql
SELECT
    s.sam3_id,
    s.bbox_score,
    o.overture_id,
    ST_Area(ST_Intersection(s.geom_m, o.geom_m)) AS inter_m2,
    ST_Area(s.geom_m)                            AS sam3_m2,
    ST_Area(o.geom_m)                            AS overture_m2,
    ST_Area(ST_Intersection(s.geom_m, o.geom_m))
      / NULLIF(ST_Area(s.geom_m) + ST_Area(o.geom_m)
               - ST_Area(ST_Intersection(s.geom_m, o.geom_m)), 0) AS iou
FROM sam3_roofs s
JOIN overture_bldgs o
  ON ST_Intersects(s.geom_4326, o.geom_4326)
```
Intersection-over-union (IoU) is the area shared by two polygons divided by the area of their union (the total area covered by either one). A value of 1.0 is a perfect match; 0.0 means no overlap.
The ST_Intersects predicate runs on the lon/lat geometries, where Sedona’s spatial join planner has the column statistics it needs to be efficient; the ST_Area and ST_Intersection calls run on the UTM-projected geometries, where the resulting numbers are in square meters and meaningful. The query returns 207,109 candidate matched pairs and is cached for the rest of the analysis.
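The `iou` expression in the spatial-join query relies on inclusion-exclusion: the union area is |A| + |B| - |A ∩ B|, so IoU needs only the three areas the query already computes. The same arithmetic in plain Python is shown below with a worked example. The helper name is hypothetical, and it returns `None` where the SQL’s `NULLIF(..., 0)` would produce NULL.

```python
from typing import Optional


def iou_from_areas(inter_m2: float, sam3_m2: float, overture_m2: float) -> Optional[float]:
    """IoU = |A ∩ B| / |A ∪ B|, where |A ∪ B| = |A| + |B| - |A ∩ B|.

    Returns None when the union is zero, mirroring NULLIF(..., 0) in the SQL.
    """
    union_m2 = sam3_m2 + overture_m2 - inter_m2
    if union_m2 == 0:
        return None
    return inter_m2 / union_m2


# Two 10 m x 10 m squares offset by 5 m along one axis:
# intersection = 5 x 10 = 50 m2, union = 100 + 100 - 50 = 150 m2
print(iou_from_areas(50.0, 100.0, 100.0))  # → 0.3333333333333333
```

The example also shows why the areas must come from the UTM-projected geometries: the ratio is unitless, but each input area has to be measured in consistent, metric units for it to mean anything.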
Resolution and metrics. Sometimes a single SAM3 polygon covers several Overture buildings, for example when SAM3 segments a multi-roof complex as one shape. Other times, several SAM3 polygons land on the same Overture building. To keep the comparison clean, the notebook keeps only the best-matching pair on each side. The remaining cells then compute the numbers the rest of the post relies on:
A final cell renders five high-IoU and five low-IoU matched pairs on aerial imagery for visual spot-checking, so I could see with my own eyes where SAM3 and Overture agree and where they don’t.
Example results comparing Overture building footprints (yellow) and SAM3 roof detections (blue). Top: a high quality result. Center: SAM3 failed to segment the roof in this building. Bottom: SAM3 segments part of the roof, but not the whole footprint.
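The post does not spell out the exact rule the notebook uses to keep “the best-matching pair on each side.” One common reading is a mutual-best match: a pair survives only if it is the highest-IoU partner for both its SAM3 polygon and its Overture building, which makes the matching one-to-one. The sketch below assumes that rule, plus precision and recall at a single IoU threshold. The helper names, toy pairs, and totals are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str, float]  # (sam3_id, overture_id, iou)


def mutual_best_pairs(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep a pair only if it is the highest-IoU match for both of its members."""
    best_for_sam3: Dict[str, Tuple[str, float]] = {}
    best_for_ovt: Dict[str, Tuple[str, float]] = {}
    for s, o, iou in pairs:
        if s not in best_for_sam3 or iou > best_for_sam3[s][1]:
            best_for_sam3[s] = (o, iou)
        if o not in best_for_ovt or iou > best_for_ovt[o][1]:
            best_for_ovt[o] = (s, iou)
    # Ties keep whichever pair was seen first (strict >), so results depend on input order
    return [(s, o, iou) for s, (o, iou) in best_for_sam3.items()
            if best_for_ovt[o][0] == s]


def precision_recall(matched: List[Pair], n_sam3: int, n_overture: int,
                     iou_threshold: float) -> Tuple[float, float]:
    """Precision over all SAM3 roofs and recall over all Overture buildings in the AOI."""
    tp = sum(1 for _, _, iou in matched if iou >= iou_threshold)
    return tp / n_sam3, tp / n_overture


candidate_pairs = [
    ("s1", "o1", 0.82),
    ("s1", "o2", 0.10),  # s1 also clips o2, but o1 is its better match
    ("s2", "o2", 0.40),
    ("s3", "o2", 0.55),  # s2 and s3 both land on o2; s3 wins
    ("s4", "o3", 0.30),
]
matched = mutual_best_pairs(candidate_pairs)
print(matched)  # → [('s1', 'o1', 0.82), ('s3', 'o2', 0.55), ('s4', 'o3', 0.3)]
# Totals come from the AOI counts, so roofs or buildings with no overlap at all still count
print(precision_recall(matched, n_sam3=5, n_overture=4, iou_threshold=0.5))  # → (0.4, 0.5)
```

Dividing by the full AOI counts rather than by the number of joined pairs matters: a SAM3 roof that overlaps nothing never appears in the join output, but it is still a false positive.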
I wasn’t sure how best to assess the results. So I asked Claude to help with the analysis and here is a summary:
These results are impressive! With a single text prompt (“roofs”), I was able to produce detections that matched a curated, multi-source reference dataset to within 7% on total roof area, found the right shape on three-quarters of known buildings, and carried a confidence score an application can actually trust.
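One way to test whether a confidence score is one an application can trust is a calibration table: bin detections by `bbox_score` and check that the fraction matching a reference building rises with the score. The sketch below rests on several assumptions: a hypothetical helper, toy data, half-open score bins, and a match defined as best IoU ≥ 0.5. It is not the notebook’s calibration code.

```python
from typing import List, Optional, Sequence, Tuple


def calibration_table(
    detections: Sequence[Tuple[float, Optional[float]]],
    bin_edges: Sequence[float],
    iou_threshold: float = 0.5,
) -> List[Tuple[float, float, int, Optional[float]]]:
    """Return (lo, hi, count, match_rate) for each score bin [lo, hi).

    detections holds (bbox_score, best_iou) pairs; best_iou is None when a
    SAM3 roof matched no Overture building.
    """
    rows = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [iou for score, iou in detections if lo <= score < hi]
        matched = sum(1 for iou in in_bin if iou is not None and iou >= iou_threshold)
        rate = matched / len(in_bin) if in_bin else None
        rows.append((lo, hi, len(in_bin), rate))
    return rows


# Toy detections, not results from the Marion County run
toy = [(0.55, None), (0.58, 0.2), (0.62, 0.6), (0.75, 0.7),
       (0.78, None), (0.91, 0.8), (0.95, 0.85)]
for lo, hi, n, rate in calibration_table(toy, [0.5, 0.7, 0.9, 1.0]):
    rate_str = "n/a" if rate is None else f"{rate:.2f}"
    print(f"[{lo:.1f}, {hi:.1f}) n={n} match_rate={rate_str}")
```

A match rate that climbs steadily across bins is what lets an application pick a score cutoff and reason about the precision it will get.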
Even though SAM3 was not fine-tuned for roof detection, the resulting output would be operationally useful for many use cases.
A few hours after I started, I had an initial assessment of the quality of the SAM3 detections, and a notebook with reproducible results that anyone can rerun in minutes. The assistant does not replace a data scientist, but it does accelerate the initial spatial and statistical analysis, and it did so while surfacing important caveats from the start.