
Spatial Data Pipeline Architecture: PostGIS and Wherobots Together



In the world of data architecture, there is a dangerous myth that you have to choose “one tool to rule them all.” We often see organizations paralyzed by the debate: “Should we use a Database or a Data Lake?”

A spatial data pipeline architecture built for both large-scale analytics and operational queries is one of the harder infrastructure decisions a geospatial team makes. Most organizations try to solve it with a single tool. That is where the problems start.

The teams that get this right do not choose between a data lake and a spatial database. They use both, in a defined sequence known as the spatial medallion architecture. The medallion architecture, a data pipeline pattern established in the data lakehouse ecosystem, organizes data into three progressive quality layers: Bronze, Silver, and Gold. Wherobots and PostGIS each own a distinct role in that pipeline.

Wherobots is a cloud-native spatial analytics platform, built by the original creators of Apache Sedona, that processes large-scale geospatial datasets using distributed compute. PostGIS is an open-source spatial extension for PostgreSQL that adds support for geographic objects and enables location-based queries in SQL. 

The spatial medallion architecture organizes geospatial data into three layers:

  1. Bronze: Raw ingestion. All incoming data (GPS feeds, satellite imagery, shapefiles) lands in cloud object storage (S3, Azure Blob, GCS) without transformation. The goal is preservation and speed of capture.
  2. Silver: Refinement. A distributed compute platform like Wherobots validates geometries, enriches records with spatial attributes, and converts raw formats to optimized columnar formats like GeoParquet. This layer is what data scientists use to train spatial AI models.
  3. Gold: Delivery. Aggregated, query-ready data served by PostGIS to BI tools and applications. Volume is low because the data is pre-aggregated. Response times are sub-second.

Think of your data not as static files, but as raw material (like crude oil or iron ore) that must be refined through a series of stages before it is valuable to your business.

Why Spatial Data Teams End Up with Swamps or Silos

Before we look at the solution, let’s look at the problem.

  • The “Data Swamp”: You dump all your raw files (CSV, Shapefiles, GeoJSON) into cloud storage. It’s cheap, but nobody can find anything. It’s a mess.
  • The “Database Silo”: You try to force everything into your operational database (PostGIS). The data is clean, but the database becomes slow and expensive. Heavy analytical queries from your analysts overload the system, and your app users complain about speed.

The Spatial Data Pipeline Medallion Architecture

The Medallion Architecture solves this by organizing your data into three distinct layers of quality: Bronze, Silver, and Gold.

This approach allows you to use the right tool for the right job: Wherobots for heavy industrial refining, and PostGIS for precision delivery.

1. Bronze Layer: Raw Ingestion into Cloud Object Storage

The “Landing Zone”

In the spatial medallion architecture, the Bronze layer is the raw ingestion zone where all incoming data lands without transformation. This is the entry point for all your data. Whether it's real-time telemetry from 10,000 delivery trucks, daily dumps of satellite imagery, or messy spreadsheets from a partner, it all lands here first.

  • The Goal: Speed and Preservation. We don’t change the data here; we just capture it so we never lose the original source.
  • The Technology: Cloud Object Storage (S3, Azure Blob, GCS).
  • Why: Object storage is inexpensive per gigabyte and effectively unlimited in capacity, so you can keep petabytes of raw data here at low cost.
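
The "capture without changing" rule can be sketched in a few lines of Python. This is a minimal illustration, not a Wherobots or cloud SDK API: a plain dict stands in for the object store, and the `bronze/<source>/<date>/<file>` key layout, the function names, and the manifest fields are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone


def bronze_key(source: str, filename: str, received_at: datetime) -> str:
    """Build a date-partitioned object key for a raw file (hypothetical layout)."""
    day = received_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return f"bronze/{source}/{day}/{filename}"


def land_raw(store: dict, source: str, filename: str, payload: bytes,
             received_at: datetime) -> dict:
    """Write the payload unchanged and return a small manifest entry."""
    key = bronze_key(source, filename, received_at)
    if key in store:
        # Bronze objects are treated as immutable so the original is never lost.
        raise FileExistsError(f"refusing to overwrite raw object: {key}")
    store[key] = payload  # bytes are stored as received, with no parsing or cleanup
    return {
        "key": key,
        "bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


# A dict stands in for S3, Azure Blob, or GCS in this sketch.
store = {}
entry = land_raw(
    store,
    source="truck-telemetry",
    filename="batch-0001.csv",
    payload=b"truck_id,lon,lat\n17,-122.33,47.61\n",
    received_at=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(entry["key"])  # bronze/truck-telemetry/2024/05/01/batch-0001.csv
```

Recording a checksum at landing time is one way to prove later that a Silver dataset was derived from an unmodified original. Refusing overwrites keeps the Bronze layer append-only.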

2. The Silver Layer: The Refinery

Clean, Standardize, & Enrich

This is where the magic happens and where the heavy lifting is required. Raw data is rarely ready for business. It has duplicates, missing fields, or invalid geometries (like a building polygon that twists into itself).

  • The Task:
    • Validation: Checking if GPS points are actually on land.
    • Enrichment: Taking a raw coordinate and adding “City,” “State,” and “Flood Risk Score” columns.
    • Optimization: Converting raw formats like CSV and Shapefile into columnar formats like GeoParquet or Apache Iceberg tables, which Wherobots stores natively through its Havasu spatial lakehouse catalog.
  • The Technology: Wherobots.
  • Why: This “refining” process requires massive computing power. You might be processing billions of records. Wherobots can spin up 1,000 nodes to crunch this data in minutes, clean it, and write it back to the lake as a trusted, high-quality “Silver” dataset. This is the layer your Data Scientists love because it’s clean enough to train AI models but detailed enough to find deep patterns.
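
To make the Silver tasks concrete, here is a small pure-Python sketch of the kind of checks involved. It is a single-machine stand-in, not the Wherobots or Apache Sedona API, and it makes several simplifying assumptions: validation is reduced to a coordinate range check (rather than a true on-land test), enrichment is a hypothetical 1-degree grid cell (rather than a join against city or flood-risk polygons), and the self-intersection test only detects edges that properly cross, not edges that merely touch or overlap.

```python
import math


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): +1 left turn, -1 right turn."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _properly_cross(p1, p2, p3, p4):
    """True when segments p1p2 and p3p4 cross at a single interior point."""
    return (_orient(p1, p2, p3) * _orient(p1, p2, p4) < 0
            and _orient(p3, p4, p1) * _orient(p3, p4, p2) < 0)


def ring_self_intersects(ring):
    """Check a closed ring (first point == last point) for crossing edges."""
    edges = list(zip(ring[:-1], ring[1:]))
    n = len(edges)
    for i in range(n):
        for j in range(i + 2, n):  # skip the neighbouring edge
            if i == 0 and j == n - 1:
                continue  # first and last edge share the closing vertex
            if _properly_cross(*edges[i], *edges[j]):
                return True
    return False


def refine(records):
    """Drop duplicates and out-of-range points; tag the rest with a grid cell."""
    seen, silver = set(), []
    for rec in records:
        key = (rec["id"], rec["lon"], rec["lat"])
        in_range = -180 <= rec["lon"] <= 180 and -90 <= rec["lat"] <= 90
        if key in seen or not in_range:
            continue
        seen.add(key)
        # Hypothetical enrichment: a 1-degree grid cell stands in for a real
        # spatial join against city, state, or flood-risk polygons.
        cell = (math.floor(rec["lon"]), math.floor(rec["lat"]))
        silver.append({**rec, "cell": cell})
    return silver


bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]  # polygon that twists into itself
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
raw = [
    {"id": 1, "lon": -122.33, "lat": 47.61},
    {"id": 1, "lon": -122.33, "lat": 47.61},  # exact duplicate
    {"id": 2, "lon": 200.0, "lat": 10.0},     # longitude out of range
]
silver = refine(raw)
print(ring_self_intersects(bowtie), ring_self_intersects(square), len(silver))
```

The pairwise edge check is quadratic in the number of vertices, which is fine for a demonstration. At billions of records, this is the work the article assigns to a distributed engine.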

AddressCloud runs property-level perils models for insurers, processing flood, fire, and climate risk data across millions of addresses. John Powell, Senior Geospatial Data Engineer at AddressCloud, describes what changed when they moved that workload into the Silver layer: “From a developer perspective, having data, algorithms and compute (and to be presented with a Spark/Sedona context in a Jupyter notebook on startup) combined in one platform is extremely powerful, comparable in many respects to Google Earth Engine, but with much greater guarantees of, and control over, job completion.”

The result: operations that previously took hours or days now complete in minutes, with no preprocessing step required to combine raster and vector data.

3. Gold Layer: Optimized Data for BI Tools and Applications

Aggregated & Ready for Business

This is the “Showroom” layer. This data is highly polished, aggregated, and formatted for specific business questions. For example: “Total Sales by Zip Code” or “Active Drivers by City.”

  • The Task: Serving answers to users instantly. When a CEO opens a dashboard or a customer opens an app, they can’t wait 10 seconds for a query to run. They need sub-second speed.
  • The Technology: PostGIS.
  • Why: By the time data reaches the “Gold” layer, the volume is much smaller because we have aggregated it. PostGIS is the perfect tool here. It is optimized for high-speed retrieval. It serves this “Gold” data to your BI tools (Tableau, Looker) and your web applications instantly.
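
The reason Gold volume is small can be shown with the “Active Drivers by City” example. The sketch below is plain Python under assumed field names (`city`, `driver_id`, `active` are hypothetical). In the architecture described here, the aggregation would run on the Silver platform and the resulting rows would be loaded into a PostGIS table. That loading step is omitted.

```python
from collections import Counter


def active_drivers_by_city(silver_rows):
    """Count distinct active drivers per city, returned as sorted (city, count) rows."""
    seen = set()
    counts = Counter()
    for row in silver_rows:
        pair = (row["city"], row["driver_id"])
        if row["active"] and pair not in seen:
            seen.add(pair)
            counts[row["city"]] += 1
    return sorted(counts.items())


# Five detailed Silver rows (one per telemetry ping) ...
silver_rows = [
    {"city": "Seattle", "driver_id": "d1", "active": True},
    {"city": "Seattle", "driver_id": "d1", "active": True},  # second ping, same driver
    {"city": "Seattle", "driver_id": "d2", "active": True},
    {"city": "Austin", "driver_id": "d3", "active": True},
    {"city": "Austin", "driver_id": "d4", "active": False},
]

# ... collapse to two Gold rows, one per city.
gold_rows = active_drivers_by_city(silver_rows)
print(gold_rows)  # [('Austin', 1), ('Seattle', 2)]
```

A dashboard reading the Gold table scans one row per city instead of one row per ping, which is what makes sub-second responses realistic on a single database.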

How the Three Layers Work Together: The “Better Together” Workflow

By adopting this strategy, you create a data supply chain that maximizes the strengths of every tool:

  1. Lower Costs: You stop using your expensive database to store terabytes of raw, messy junk. That stays in the cheap Bronze layer.
  2. Higher Stability: Your heavy analytical jobs run in Wherobots (Silver Layer), completely separate from your operational database. Your analysts can crunch numbers all day without ever slowing down the app for your customers.
  3. Faster Innovation: Your data scientists don’t have to beg for database access. They can work directly with the Silver data in Wherobots to build advanced AI models, while the business teams continue to use the Gold data in PostGIS.

Summary: A Blueprint for Success

The spatial medallion architecture is not a tool choice. It is a pipeline pattern that assigns the right tool to the right job. If you are a leader looking to modernize your geospatial stack, don’t look for a “PostGIS replacement.” Look for a partner.

  • Use Wherobots to act as your heavy-lifting factory: ingesting, cleaning, and crunching massive-scale data.
  • Use PostGIS to act as your high-speed storefront: delivering those insights to users the moment they need them.

This hybrid approach, the Spatial Medallion Architecture, is how modern organizations turn location data into competitive advantage.

This is part three of a series. The prior posts cover PostGIS vs Wherobots for spatial data lakehouses and spatial database cost comparisons.
