In the world of data architecture, there is a dangerous myth that you have to choose “one tool to rule them all.” We often see organizations paralyzed by the debate: “Should we use a Database or a Data Lake?”
A spatial data pipeline architecture built for both large-scale analytics and operational queries is one of the harder infrastructure decisions a geospatial team makes. Most organizations try to solve it with a single tool. That is where the problems start.
The teams that get this right do not choose between a data lake and a spatial database. They use both, in a defined sequence known as the Geospatial Medallion Architecture. The medallion architecture, a data pipeline pattern established in the data lakehouse ecosystem, organizes data into three progressive quality layers: Bronze, Silver, and Gold. Wherobots and PostGIS each own a distinct role in that pipeline.
Wherobots is a cloud-native spatial analytics platform, built by the original creators of Apache Sedona, that processes large-scale geospatial datasets using distributed compute. PostGIS is an open-source spatial extension for PostgreSQL that adds support for geographic objects and enables location-based queries in SQL.
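The spatial medallion architecture organizes geospatial data into three layers: Bronze, the raw landing zone where incoming data is stored without transformation; Silver, where data is cleaned, standardized, and enriched; and Gold, where data is aggregated and ready for business.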
Think of your data not as static files, but as raw material (like crude oil or iron ore) that must be refined through a series of stages before it is valuable to your business.
Before we look at the solution, recall the problem: a single tool is asked to handle both large-scale analytics and operational queries.
The Medallion Architecture solves this by organizing your data into three distinct layers of quality: Bronze, Silver, and Gold.
This approach allows you to use the right tool for the right job: Wherobots for heavy industrial refining, and PostGIS for precision delivery.
Bronze: The “Landing Zone”
In the spatial medallion architecture, the Bronze layer is the raw ingestion zone where all incoming data lands without transformation. Whether it's real-time telemetry from 10,000 delivery trucks, daily dumps of satellite imagery, or messy spreadsheets from a partner, it all lands here first.
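As a rough illustration of "lands without transformation," the sketch below appends each payload to a Bronze store exactly as received and only attaches ingestion metadata. The function name `land_raw`, the record fields, and the in-memory list standing in for object storage are all hypothetical, chosen for the example rather than taken from Wherobots or PostGIS.

```python
from datetime import datetime, timezone


def land_raw(payload: str, source: str, bronze: list) -> dict:
    """Append one raw payload to the Bronze store without parsing or fixing it."""
    record = {
        "source": source,  # where the payload came from
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "raw": payload,  # stored exactly as received
    }
    bronze.append(record)
    return record


# Hypothetical in-memory stand-in for a Bronze storage location.
bronze_store: list = []

land_raw('{"truck_id": 17, "lat": 47.61, "lon": -122.33}', "telemetry", bronze_store)
land_raw("truck_id,lat,lon\n17,47.61,-122.33", "partner_csv", bronze_store)
# A truncated message is still kept; validation is deferred to the next layer.
land_raw('{"truck_id": 17, "lat": ', "telemetry", bronze_store)
```

The design choice worth noting is that nothing is rejected at this stage: even the malformed message is preserved, so later layers can be rebuilt from the original input.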
Silver: Clean, Standardize, & Enrich
The Silver layer is where raw data is cleaned, standardized, and enriched, and it is where the heavy lifting is required. Raw data is rarely ready for business: it has duplicates, missing fields, or invalid geometries (like a building polygon that twists into itself).
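The three defects named above can be sketched in plain Python. This is a minimal illustration, not how Wherobots or PostGIS implement validation: in practice an engine function such as ST_IsValid would do this check at scale. The record fields (`building_id`, `ring`) and the function names are hypothetical, and the crossing test only catches edges that properly cross, not rings that merely touch or overlap along an edge.

```python
Point = tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product sign: >0 counter-clockwise, <0 clockwise, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1: Point, p2: Point, p3: Point, p4: Point) -> bool:
    """True when segments p1-p2 and p3-p4 cross at an interior point of both."""
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def ring_self_intersects(ring: list[Point]) -> bool:
    """Check a closed ring (first point repeated as last) for crossing edges."""
    edges = list(zip(ring[:-1], ring[1:]))
    n = len(edges)
    for i in range(n):
        for j in range(i + 1, n):
            # Adjacent edges share a vertex by construction; skip them.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


def to_silver(records: list[dict]) -> list[dict]:
    """Drop records with missing fields, duplicate ids, or twisted rings."""
    seen: set = set()
    silver = []
    for rec in records:
        building_id, ring = rec.get("building_id"), rec.get("ring")
        if building_id is None or not ring:
            continue  # missing field
        if building_id in seen:
            continue  # duplicate
        if ring_self_intersects(ring):
            continue  # invalid geometry; a real pipeline might repair it instead
        seen.add(building_id)
        silver.append(rec)
    return silver


square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]  # twists into itself

raw = [
    {"building_id": "a", "ring": square},
    {"building_id": "a", "ring": square},  # duplicate
    {"building_id": "b", "ring": bowtie},  # invalid geometry
    {"building_id": None, "ring": square},  # missing field
    {"building_id": "c", "ring": square},
]
silver = to_silver(raw)
```

Here the invalid polygon is simply dropped to keep the sketch short; a production pipeline might repair such geometries rather than discard them.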
AddressCloud runs property-level perils models for insurers, processing flood, fire, and climate risk data across millions of addresses. John Powell, Senior Geospatial Data Engineer at AddressCloud, describes what changed when they moved that workload into the Silver layer: “From a developer perspective, having data, algorithms and compute (and to be presented with a Spark/Sedona context in a Jupyter notebook on startup) combined in one platform is extremely powerful, comparable in many respects to Google Earth Engine, but with much greater guarantees of, and control over, job completion.”
The result: operations that previously took hours or days now complete in minutes, with no preprocessing step required to combine raster and vector data.
Gold: Aggregated & Ready for Business
The Gold layer is the “Showroom.” Its data is highly polished, aggregated, and formatted to answer specific business questions, for example “Total Sales by Zip Code” or “Active Drivers by City.”
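A "Total Sales by Zip Code" table is a simple group-and-sum over cleaned records. The sketch below assumes, purely for illustration, that each Silver row already carries a `zip_code` (for example from an earlier spatial join) and an `amount`; both field names and the sample values are hypothetical.

```python
from collections import defaultdict


def total_sales_by_zip(silver_rows: list[dict]) -> dict:
    """Aggregate cleaned sales rows into one total per zip code."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["zip_code"]] += row["amount"]
    return dict(totals)


# Hypothetical Silver rows, already validated and enriched with a zip code.
silver_sales = [
    {"zip_code": "98101", "amount": 120.0},
    {"zip_code": "98052", "amount": 50.0},
    {"zip_code": "98101", "amount": 80.0},
]

gold = total_sales_by_zip(silver_sales)
```

The resulting table is small and shaped around one question, which is what makes it cheap to serve from an operational database.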
By adopting this strategy, you create a data supply chain that maximizes the strengths of every tool: Wherobots for heavy industrial refining, and PostGIS for precision delivery.
The spatial medallion architecture is not a tool choice. It is a pipeline pattern that assigns the right tool to the right job. If you are a leader looking to modernize your geospatial stack, don’t look for a “PostGIS replacement.” Look for a partner.
This hybrid approach, the Spatial Medallion Architecture, is how modern organizations turn location data into competitive advantage. This is part three of a series. The prior posts cover PostGIS vs Wherobots for spatial data lakehouses and spatial database cost comparisons.