RasterFlow now makes it simple to run promptable geospatial vision models across large aerial and satellite imagery collections, removing the need to build bespoke inference pipelines. With our new SAM 3 support, you can prompt for concepts like “roofs”, “roads”, or “shipping containers” and turn those detections into vector outputs ready for analysis in Wherobots, from city scale to country scale.
Here we see SAM 3 localize and capture roofs really well. It also segments roads pretty cleanly, though there is still room for improvement. Results are shown on the inference input, a NAIP 30 centimeter basemap, and include all instance pixels above 50% confidence.
In this post, we put SAM 3 to the test detecting roofs in suburban neighborhoods, shipping containers in crowded loading docks, and tractors across agricultural landscapes. Throughout, we comment on where it succeeds, where it falls short, and how results compare to previous SAM models. Finally, we discuss what’s next for computer vision in Earth observation (EO) and how the community can build on models like SAM to create more promptable, flexible applications that handle the scale and diversity of remote sensing imagery.
In 2023, the Segment Anything Model (SAM) set a new paradigm for computer vision. While most models were trained to address one task at a time, SAM demonstrated that a single model could reliably classify, localize, and segment many kinds of objects in complex natural scenes.
SAM also introduced a new standard task for computer vision models that went beyond classification or semantic segmentation: Promptable Visual Segmentation (PVS), which takes spatial prompts such as boxes, points, or masks as input and predicts segmentation masks.
However, SAM 1 struggled on out-of-distribution imagery domains. Back in 2023, while working on detection projects with 10 meter resolution imagery, I found it often took multiple rounds of prompting to get useful segments, and many results still needed manual cleanup.
For example, in a rural town, SAM 1 can segment many features across different spatial scales, but it still misses the roads entirely.
SAM 2 improved accuracy and added video support, but both SAM 1 and SAM 2 were limited to making predictions without associated labels. The masks they produced were not grounded in categories useful for deriving insights.
Fast forward to today: SAM 3 addresses Promptable Concept Segmentation (PCS), a task where a model can accept either spatial prompts (masks, boxes, points) or text prompts like “cat”, “dog”, or even “roofs”!
This opens up new possibilities for detecting objects in imagery. Because SAM 3’s training data spans many imaging domains, including overhead aerial imagery, it can succeed in many Earth observation contexts where previous model generations struggled. For example, one can use SAM 3 to predict “roofs” in NAIP imagery simply by asking, with no model training or ad hoc labeling needed, and get pretty stellar results.
To put SAM 3 to the test on Earth observation imagery, we generated many more SAM 3 predictions on top of National Agriculture Imagery Program (NAIP) 30 centimeter imagery using RasterFlow, our scalable mosaic building and inference engine. The sections below detail RasterFlow’s performance on this high resolution detection task.
What used to take a complex mix of imagery ETL, bespoke inference pipelines, and self-provisioned infrastructure for large Earth observation processing now takes under an hour and two simple Python functions. Let’s check it out below.
First, we will generate a mosaic: a seamless, stitched-together image from many independent remote sensing scenes. RasterFlow handles all the data sourcing, loading, cleaning, and partitioning into a data asset optimized for inference for a particular model.
from rasterflow_remote import RasterflowClient

rf_client = RasterflowClient()

mosaic_output = rf_client.build_gti_mosaic(
    gti="s3://wherobots-examples/rasterflow/indexes/naip_index.parquet",
    aoi="s3://wherobots-examples/rasterflow/aois/marion_county.parquet",
    bands=["red", "green", "blue", "nir"],
    location_field="url",
    crs_epsg=3857,
    time_column="year",
    skip_xy_coords=False,
    xy_chunksize=1024,
    query="res == 0.3 and time >= '2022-01-01' and time <= '2023-01-01'",
    requester_pays=True,
    sort_field="time",
)
print(mosaic_output.uri)
With our mosaic built, we can now run inference with SAM 3. Unlike more rigid models which only predict one category, SAM 3 can accept one or more text prompts and detect all matching objects in a single pass. The runtime for a given batch of imagery scales linearly with the number of detections, and more prompts tend to mean more detections, so expect inference runs to take longer as you add prompts.
from rasterflow_remote.data_models import GeometryActorEnum, MergeModeEnum

model_output = rf_client.predict_mosaic_geometries(
    store="s3://wherobots-examples/rasterflow/mosaics/marion_county.zarr",
    model_path="https://huggingface.co/wherobots/sam3-text-geometry-pt2/resolve/main/full_sam3_pipeline.pt2",
    patch_size=1008,
    clip_size=0,
    device="cuda",
    features=["red", "green", "blue"],
    labels=["roads", "airplanes", "airports", "roofs", "solar panels",
            "swimming pools", "shipping containers", "tractors"],
    actor=GeometryActorEnum.TEXT_TO_VECTOR_GEOMETRIES,
    max_batch_size=1,
    confidence_threshold=0.5,
    merge_mode=MergeModeEnum.NONE,
    xy_block_multiplier=1,
)
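The runtime note above can be sketched as a simple linear model. The overhead and per-detection cost below are made-up illustrative numbers, not measured RasterFlow benchmarks:

```python
# Illustrative only: a toy linear model of inference runtime.
# These constants are invented for the sketch, not benchmarks.

def estimated_runtime_s(n_detections, overhead_s=60.0, per_detection_s=0.002):
    """Runtime grows linearly with the number of detections."""
    return overhead_s + per_detection_s * n_detections

# If adding a prompt roughly doubles the detection count, the
# detection-dependent part of the runtime doubles with it.
print(estimated_runtime_s(100_000))
print(estimated_runtime_s(200_000))
```

The fixed overhead (data loading, model startup) does not grow with prompts, which is why batching several concepts into one run is cheaper than running them one at a time.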
We can visualize both the mosaic and detections directly in the notebook with an embedded RasterFlow map. The RGB imagery mosaic and the SAM 3 detections have been web-optimized with RasterFlow for fast and fluid browsing.
If the embed does not load in your environment, open it in a new tab here.
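Beyond the map embed, detections are ordinary vector records you can inspect programmatically. Here is a minimal pure-Python sketch of the kind of label-and-confidence filtering involved; the field names (`label`, `confidence`) are assumptions that mirror the model output schema used later in this post, and the sample records are invented:

```python
# Hypothetical sample records mimicking SAM 3 vector detections;
# geometries are simplified to pixel bounding boxes for the sketch.
sample_detections = [
    {"label": "roofs", "confidence": 0.91, "bbox": (100, 200, 140, 240)},
    {"label": "roads", "confidence": 0.62, "bbox": (0, 0, 1024, 8)},
    {"label": "tractors", "confidence": 0.41, "bbox": (530, 610, 545, 630)},
]

def filter_detections(detections, label, min_confidence=0.5):
    """Keep only detections of one concept above a confidence cutoff."""
    return [
        d for d in detections
        if d["label"] == label and d["confidence"] >= min_confidence
    ]

roofs = filter_detections(sample_detections, "roofs")
print(len(roofs))  # 1
```

In practice this filtering happens at scale in WherobotsDB, as shown next, rather than in local Python lists.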
After visually exploring detections, we can load the results into WherobotsDB for quantitative analysis. WherobotsDB lets us post-process geometries, calculate zonal statistics on other rasters, count objects, measure clustering, and much more.
import os

from sedona.spark import *
from pyspark.sql.functions import *
from wherobots import vtiles

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

parquet_path = "s3://wherobots-examples/rasterflow/model-outputs/marion_county_sam3/"
df = sedona.read.format("geoparquet").load(parquet_path)
df.printSchema()
df.show(10)
Here we’ll use ST_AreaSpheroid to estimate the total area of all roofs in Marion County, in square kilometers.
roofs_df = df.filter(col("label") == "roofs")

source_srid = roofs_df.selectExpr("ST_SRID(geometry) AS srid").first()["srid"]
source_crs = f"EPSG:{source_srid}"
print(source_crs)
# Run the SRID inspection cell above first so source_crs reflects the data's actual CRS.
roof_areas = roofs_df.withColumn(
    "area_sq_m",
    expr(f"ST_AreaSpheroid(ST_Transform(geometry, '{source_crs}', 'EPSG:4326'))"),
).cache()

roof_areas_agg = roof_areas.agg(
    sum("area_sq_m").alias("total_area_sq_m"),
    (sum("area_sq_m") / lit(1_000_000.0)).alias("total_area_sq_km"),
)
roof_areas_agg.show(truncate=False)
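For intuition on what an area function computes: ST_AreaSpheroid measures geodesic area on the ellipsoid, but for small footprints in a meter-based projected CRS, the planar shoelace formula is a reasonable mental model. A minimal sketch (not the Sedona implementation):

```python
def shoelace_area(coords):
    """Planar polygon area via the shoelace formula.

    `coords` is a ring of (x, y) vertices in a projected CRS with
    meter units, so the result is in square meters. This is a planar
    approximation, unlike the geodesic ST_AreaSpheroid.
    """
    area = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# A 12 m x 15 m rectangular roof footprint -> 180 square meters
roof = [(0, 0), (12, 0), (12, 15), (0, 15)]
print(shoelace_area(roof))  # 180.0
```

Summing such per-polygon areas and dividing by 1,000,000 gives square kilometers, which is exactly what the aggregation above does at scale.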
roof_size_stats = roof_areas.agg(
    count("*").alias("roof_count"),
    avg("area_sq_m").alias("avg_roof_sq_m"),
    expr("percentile_approx(area_sq_m, 0.5)").alias("median_roof_sq_m"),
    (avg("area_sq_m") * lit(10.7639)).alias("avg_roof_sq_ft"),
)

stats = roof_size_stats.first()
print(f"roof_count: {stats['roof_count']:,}")
print(f"avg_roof_sq_m: {stats['avg_roof_sq_m']:.2f}")
print(f"median_roof_sq_m: {stats['median_roof_sq_m']:.2f}")
label_counts = df.groupBy("label").count().orderBy(col("count").desc(), col("label"))
label_counts.show(truncate=False)
We can also generate web-optimized vectors in PMTiles format. PMTiles are easily shareable and plug directly into geospatial visualization applications, making them a great choice for distributing detection results.
df_tiles = df.withColumn("layer", col("label"))

output_path = "s3://wherobots-examples/rasterflow/model-outputs/marion_county_sam3.pmtiles"
vtiles.generate_pmtiles(df_tiles, output_path)
After experimenting with SAM 3 on NAIP, I’m impressed with the range of categories that SAM 3 can positively identify. At the same time, the model still has clear precision and recall gaps for more niche semantic categories, like “tractors” or “shipping containers”.
These examples illustrate the gap between SAM 3’s strengths on common categories and its current limitations on more niche ones. I’m bullish that if SAM 3 were fine-tuned on a larger corpus of mixed high-resolution imagery and high-quality labels, it would perform even better outside of its primary domain of natural imagery.
Even so, I think the potential of SAM 3 for simpler categories like “roofs” is underutilized. SAM 3 could improve many existing datasets we rely on to make decisions, like Overture Buildings and detections of other kinds of structures.
There’s more we didn’t showcase here that I recommend trying in your own experiments.
If you’re excited to try SAM 3 on Wherobots, sign up for RasterFlow Private Preview. You can also get in touch at ryan@wherobots.com or talk to us here. I’d love to hear about what you’re looking to build and how SAM 3 could fit into your detection workflows.