10 Mar 2026

Raster Processing at Scale: The Out-of-Database Architecture Behind WherobotsDB

Authors

Pranav Toggi

Introduction

Raster data (satellite imagery, elevation models, sensor grids) is critical to understanding the physical world and increasingly to powering AI. The challenge most data teams face is processing it at scale.

Processing raster data at scale requires an architecture that avoids loading entire files into memory. WherobotsDB solves this with an out-of-database approach that fetches pixel data on demand, enabling terabyte-scale processing without the memory overhead of traditional raster engines.

WherobotsDB extends open-source Apache Sedona with capabilities and performance optimizations purpose-built for preparing physical-world data for AI at scale, while maintaining full API compatibility. Existing Sedona workloads run without code changes.

This post covers how WherobotsDB handles the full raster lifecycle: scalable processing architecture, raster math, coordinate reference systems, vector-raster hybrid workflows, and planetary-scale inference.

“With Wherobots, we were able to merge 15+ complex vector datasets in minutes and run high-resolution ML inference on raster imagery at a fraction of the cost of our legacy stack. The combination of speed, scalability, and ease of integration has boosted our engineering productivity and will accelerate how quickly we can deliver new geospatial data products to market.”

— Rashmit Singh, CTO, SatSure

What Is Out-of-Database Raster Architecture?

At the foundation of Wherobots’ raster capabilities is an out-of-database raster architecture, which makes it far easier to process raster imagery in an embarrassingly parallel fashion. Instead of loading entire raster files into memory, WherobotsDB stores only metadata and fetches pixel data on demand. This means teams can process terabyte-scale imagery collections (statewide mosaics, multi-year satellite archives, continental elevation models) with the same interface they use for vector data. Operations like zonal statistics, clipping, masking, filtering, and raster algebra scale to datasets that would overwhelm in-memory approaches.
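
The idea of a metadata-only reference can be pictured with a minimal Python sketch. This is not WherobotsDB's implementation: `RasterRef`, `read_window`, and the `fake_fetch` callback that stands in for remote storage are hypothetical names chosen for illustration. The point is only that the object being passed around holds a path and a few numbers, and pixels are requested for a specific window when an operation asks for them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical signature for a storage reader: (path, col, row, w, h) -> pixel rows.
Fetch = Callable[[str, int, int, int, int], List[List[int]]]


@dataclass(frozen=True)
class RasterRef:
    """Illustrative lightweight reference: metadata only, no pixel data."""
    path: str
    width: int
    height: int
    bounds: Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)

    def read_window(self, fetch: Fetch, col: int, row: int, w: int, h: int) -> List[List[int]]:
        """Fetch only the requested window, clamped to the raster extent."""
        col, row = max(col, 0), max(row, 0)
        w = max(min(w, self.width - col), 0)
        h = max(min(h, self.height - row), 0)
        return fetch(self.path, col, row, w, h)


fetch_log = []  # records every simulated remote read


def fake_fetch(path: str, col: int, row: int, w: int, h: int) -> List[List[int]]:
    """Stand-in for remote storage: synthesizes pixel values from coordinates."""
    fetch_log.append((path, col, row, w, h))
    return [[r * 1000 + c for c in range(col, col + w)] for r in range(row, row + h)]


# Creating the reference triggers no pixel reads at all.
ref = RasterRef("s3://example-bucket/scene.tif", 10_000, 10_000, (0.0, 0.0, 100.0, 100.0))

# Only this call touches pixel data, and only for a 2x2 window after clamping.
window = ref.read_window(fake_fetch, col=9_998, row=0, w=4, h=2)
```

In this sketch, shuffling `ref` between workers would move a short string and a handful of numbers rather than a 10,000 by 10,000 pixel array, which is the intuition behind the metadata-only data movement described in this post.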

Capability	Apache Sedona	WherobotsDB	Notes
Out-DB Raster Support	◔	●	Creates lightweight raster references; pixel data loaded only when needed
Intelligent Caching Layer	○	●	Minimizes repeated remote reads for frequently-accessed rasters
Optimized Shuffle Operations	○	●	Data movement handles only metadata — orders of magnitude faster than full rasters
On-Demand Materialization	○	●	Selectively convert external rasters to in-database format when needed
Automatic Metadata Optimization	○	●	Pre-loads metadata and intelligently repartitions for optimal parallelism
Cloud-Optimized GeoTIFF Support	◐	●	Native COG support with tile-based partial reads from cloud storage

How Out-DB Architecture Transforms Raster Operations

The out-of-database architecture fundamentally changes how raster operations execute:

Operation	Apache Sedona (In-DB Only)	WherobotsDB
Data Movement (Shuffle)	Serializes all pixel data across executors	Serializes only metadata (~KB vs GB)
Tiling Operations	Copies pixel data into each tile	Memory-efficient tiling without data duplication
Clipping	Full pixel-by-pixel processing	Optimized processing paths for common operations
Zonal Statistics	Processes entire raster regardless of region size	Materializes only zonal pixels, optimizing I/O based on region of interest
Raster Loading	Loads entire raster into memory at read time	Lazy loading: metadata on-demand, pixels only when accessed
Resource Management	Standard memory lifecycle	Intelligent caching layer with disk caching for remote files

Key benefits:

On-Demand Data Access: Instead of loading entire raster files into memory, WherobotsDB fetches pixel data only when an operation requires it, reducing memory overhead and enabling processing at terabyte scale.
Memory-Efficient Tiling: Tiles share references to the underlying file with different spatial bounds — enabling massive parallelism without memory overhead.
Smart I/O Reduction: Operations optimize I/O based on the region of interest, and spatial filter push-down skips irrelevant raster files entirely.
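
The memory-efficient tiling benefit can be sketched as follows. The names `TileRef` and `make_tiles` are hypothetical and the code is a simplified illustration under that assumption, not the WherobotsDB API. Every tile carries the same file path plus its own pixel window, so creating thousands of tiles copies no pixel data.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class TileRef:
    """Illustrative tile descriptor: a shared path plus a pixel window."""
    path: str    # the same underlying file for every tile
    col: int     # left edge of the window, in pixels
    row: int     # top edge of the window, in pixels
    width: int
    height: int


def make_tiles(path: str, raster_w: int, raster_h: int, tile: int) -> Iterator[TileRef]:
    """Yield tile descriptors that cover the raster without copying pixels."""
    for row in range(0, raster_h, tile):
        for col in range(0, raster_w, tile):
            # Edge tiles are narrower or shorter than the nominal tile size.
            yield TileRef(path, col, row, min(tile, raster_w - col), min(tile, raster_h - row))


tiles = list(make_tiles("s3://example-bucket/mosaic.tif", 1000, 600, 256))
covered_pixels = sum(t.width * t.height for t in tiles)
```

Each descriptor can be handed to a different worker, which then reads only its own window from the shared file.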

Cloud-Optimized GeoTIFFs (COGs) are GeoTIFF files structured so that only the specific byte ranges needed for a given operation are fetched from remote storage, rather than downloading the entire file. Combined with on-demand loading, this architecture minimizes both memory footprint and network I/O.
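
To make the partial-read idea concrete, here is a small sketch of the arithmetic a COG reader performs. A tiled TIFF records a byte offset and byte count for each internal tile, stored left to right and top to bottom, so a reader can map a pixel window to tile indices and then to byte ranges. The helper names and the toy offsets below are assumptions made for illustration; a real reader parses the offsets from the file header.

```python
from typing import List, Tuple


def tiles_for_window(col: int, row: int, w: int, h: int,
                     tile_size: int, tiles_across: int) -> List[int]:
    """Return row-major indices of the internal tiles a pixel window touches."""
    first_tx, last_tx = col // tile_size, (col + w - 1) // tile_size
    first_ty, last_ty = row // tile_size, (row + h - 1) // tile_size
    return [ty * tiles_across + tx
            for ty in range(first_ty, last_ty + 1)
            for tx in range(first_tx, last_tx + 1)]


def byte_ranges(tile_ids: List[int], offsets: List[int],
                byte_counts: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) byte ranges, one per requested tile."""
    return [(offsets[i], offsets[i] + byte_counts[i] - 1) for i in tile_ids]


# Toy layout (assumed): a 2048x2048 raster split into 4x4 tiles of 512 px,
# each compressed tile occupying 1000 bytes after a 4096-byte header.
offsets = [4096 + i * 1000 for i in range(16)]
byte_counts = [1000] * 16

tile_ids = tiles_for_window(col=500, row=500, w=100, h=100, tile_size=512, tiles_across=4)
ranges = byte_ranges(tile_ids, offsets, byte_counts)
bytes_requested = sum(end - start + 1 for start, end in ranges)
```

In this toy layout the 100 by 100 pixel window straddles a tile corner, so four of the sixteen tiles are requested and the other twelve are never downloaded.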

What Raster Capabilities Does WherobotsDB Include?

Building on this foundation, WherobotsDB includes enhanced raster functions that enable satellite imagery, elevation models, and sensor data analysis directly in SQL alongside traditional vector operations.

Capability	Apache Sedona	WherobotsDB	Notes
Raster to Vector Conversion	○	●	Convert raster regions to vector polygons for hybrid vector-raster analysis workflows
Multi-Band Tile Processing	○	●	Align, stack, and tile rasters from different sources, CRS, and resolutions for distributed multi-source analysis
Zonal Statistics	◕	●	Both support the full statistics suite; WherobotsDB’s Out-DB architecture materializes only zonal pixels, enabling scalability across millions of zones
Custom Raster Algebra	◕	●	Flexible map algebra expressions with near-native execution performance
Spatial Filter Push-down for Rasters	○	●	Uses bounding boxes to skip irrelevant raster files, dramatically reducing I/O for selective queries

These capabilities fall into two categories:

Transforming raster data for hybrid workflows – Raster to Vector Conversion, Multi-Band Tile Processing.
Analyzing raster data in place – Zonal Statistics, Custom Raster Algebra, Spatial Filter Push-down.

Raster to Vector Conversion converts contiguous raster regions with the same pixel value into vector polygons. This is essential for workflows that need to analyze raster-derived features (flood extents, land cover classifications, building footprints from a DSM) using vector spatial operations such as overlay, buffering, or spatial joins.

Multi-Band Tile Processing solves one of the most common friction points in raster analysis: combining data from different sources. Satellite imagery from different sensors, time periods, or providers typically arrives in different coordinate reference systems, resolutions, and data types. WherobotsDB aligns and stacks these into a unified multi-band raster, then tiles it into spatial chunks for distributed processing — all in a single operation. This enables workflows like change detection across multi-temporal composites, fusing Sentinel-2 optical bands with elevation data, or building analysis-ready multi-spectral stacks, without manual reprojection or resampling steps.

Zonal Statistics computes aggregate statistics (count, sum, mean, median, mode, stddev, variance, min, max) for raster pixels falling within vector zones. Both Apache Sedona and WherobotsDB support zonal statistics — the differentiator is scale. WherobotsDB’s out-of-database architecture materializes only the pixels within each zone rather than processing the entire raster, making it practical to run zonal statistics across millions of zones on terabyte-scale imagery.
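
The "only the pixels within each zone" idea can be sketched in plain Python. This is a simplified illustration, not the WherobotsDB implementation: `zonal_stats` and `point_in_polygon` are hypothetical helpers, the zone is given in pixel coordinates, and a pixel counts as inside when its center falls within the polygon. The function visits only the window covered by the zone's bounding box rather than the whole raster.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: Sequence[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_stats(raster: List[List[float]], zone: Sequence[Point]) -> Dict[str, float]:
    """Aggregate pixels whose centers fall inside the zone polygon."""
    xs, ys = [p[0] for p in zone], [p[1] for p in zone]
    # Restrict the scan to the zone's bounding window, clamped to the raster.
    col0, col1 = max(math.floor(min(xs)), 0), min(math.ceil(max(xs)), len(raster[0]))
    row0, row1 = max(math.floor(min(ys)), 0), min(math.ceil(max(ys)), len(raster))
    values = [raster[r][c]
              for r in range(row0, row1)
              for c in range(col0, col1)
              if point_in_polygon(c + 0.5, r + 0.5, zone)]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


raster = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
zone = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]      # square over the middle
stats = zonal_stats(raster, zone)
```

For the 4 by 4 example, only the four central pixels (5, 6, 9, and 10) are read and aggregated.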

Custom Raster Algebra executes user-defined raster algebra expressions with near-native execution performance. It supports complex multi-band calculations, conditional logic, and neighborhood operations — enabling workflows like computing NDVI (Normalized Difference Vegetation Index, a measure of vegetation density derived from red and near-infrared bands) from satellite imagery, or applying threshold-based classification across large imagery collections.
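
As a worked example of the kind of expression raster algebra evaluates, NDVI is defined per pixel as (NIR - Red) / (NIR + Red), giving values between -1 and 1. The sketch below computes it on small in-memory bands and then applies a threshold classification. It is plain Python for illustration only; the function names are hypothetical and it does not use the WherobotsDB raster algebra syntax.

```python
import math
from typing import List

Band = List[List[float]]


def ndvi(red: Band, nir: Band) -> Band:
    """Per-pixel (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    return [[(n - r) / (n + r) if (n + r) != 0 else math.nan
             for r, n in zip(red_row, nir_row)]
            for red_row, nir_row in zip(red, nir)]


def classify(index: Band, threshold: float) -> List[List[int]]:
    """Conditional logic: 1 where the index exceeds the threshold, else 0."""
    # Comparisons with NaN are False, so undefined pixels fall into class 0.
    return [[1 if value > threshold else 0 for value in row] for row in index]


red = [[50.0, 100.0], [0.0, 80.0]]
nir = [[150.0, 100.0], [0.0, 20.0]]

index = ndvi(red, nir)
vegetation_mask = classify(index, threshold=0.3)
```

Here the top-left pixel has NDVI 0.5 and is classified as vegetation, while the pixel with zero reflectance in both bands is left undefined instead of raising a division error.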

Spatial Filter Push-down for Rasters uses bounding boxes to skip irrelevant raster files, dramatically reducing I/O for selective queries. When a catalog contains thousands of scenes but only a few intersect your area of interest, irrelevant files are eliminated before any processing begins.
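
The pruning step can be illustrated with a bounding-box intersection test over a small catalog. The catalog contents, file names, and the `prune_catalog` helper are made up for this sketch, and a real engine would apply the filter inside its query planner rather than in user code.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def intersects(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune_catalog(catalog: Dict[str, BBox], aoi: BBox) -> List[str]:
    """Keep only files whose bounding box intersects the area of interest."""
    return [path for path, bbox in catalog.items() if intersects(bbox, aoi)]


# Hypothetical catalog: file name -> bounding box taken from metadata only.
catalog = {
    "scene_a.tif": (0.0, 0.0, 10.0, 10.0),
    "scene_b.tif": (10.0, 0.0, 20.0, 10.0),
    "scene_c.tif": (0.0, 10.0, 10.0, 20.0),
    "scene_d.tif": (30.0, 30.0, 40.0, 40.0),
}
aoi = (8.0, 2.0, 12.0, 6.0)

candidates = prune_catalog(catalog, aoi)
```

Only `scene_a.tif` and `scene_b.tif` survive the filter, so the other two files are never opened.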

Because all of these functions are built on the out-of-database architecture, they inherit the same scalability characteristics described above — lazy loading, selective pixel materialization, and intelligent caching, without additional configuration.

What Is RasterFlow and How Does It Work?

Recently, Wherobots added an entirely new inference and perception engine for planetary-scale image processing, extending the raster lifecycle beyond analysis into AI. RasterFlow enables teams to run computer vision models against large-scale raster datasets. It covers the whole pipeline, from preparing imagery, mosaicking, and removing edge effects across tiles to executing distributed model inference and converting predictions into vector geometries, all within Wherobots Cloud.

RasterFlow’s outputs are stored as vectorized results in Apache Iceberg tables (an open table format for large-scale analytic datasets), or as predictions in Zarr (a cloud-native format for chunked, compressed multi-dimensional arrays) or COGs. In either form, they can be analyzed using the full suite of spatial operations in WherobotsDB. This creates end-to-end raster workflows, from raw imagery through model inference to spatial analytics, without moving data between systems or building custom infrastructure.

RasterFlow supports both popular open-source geospatial AI models and custom PyTorch models, and can generate embeddings from geospatial foundation models. It is currently available to select customers in private preview. If you’re interested in RasterFlow, join our upcoming session to see it in action.

What Comes Next: Query Performance and Spatial Analytics

Raster processing is a first-class capability in WherobotsDB, and it is also one part of a broader set of spatial data processing advances we’ve built beyond open-source Sedona. Vector and raster workloads both benefit from the same query performance optimizations under the hood: spatial relationship acceleration, automatic join optimization, dynamic data redistribution, and a vectorized GeoParquet reader. Queries that require careful tuning with self-managed Sedona run optimally out of the box with WherobotsDB.

In the next post in this series, we’ll go deep on query performance and spatial analytics: how WherobotsDB accelerates spatial joins, range queries, and analytical functions across both vector and raster data types.

Frequently Asked Questions

What is out-of-database raster architecture?

Out-of-database raster architecture stores only metadata in the database while fetching pixel data on demand from remote storage. Instead of loading entire raster files into memory, only the pixels required for a specific operation are materialized. This enables teams to process terabyte-scale imagery collections, including statewide mosaics, multi-year satellite archives, and continental elevation models, without the memory overhead of traditional in-memory approaches.

How does WherobotsDB differ from open-source Apache Sedona for raster processing?

WherobotsDB extends open-source Apache Sedona with additional capabilities and performance optimizations purpose-built for large-scale physical-world data processing. While both support raster operations like zonal statistics, WherobotsDB’s out-of-database architecture materializes only the pixels within each zone rather than processing the entire raster, making it practical to run zonal statistics across millions of zones on terabyte-scale imagery. Existing Apache Sedona workloads run on WherobotsDB without code changes.

What raster operations does WherobotsDB support?

WherobotsDB supports raster-to-vector conversion, multi-band tile processing, zonal statistics, custom raster algebra, and spatial filter push-down. These capabilities cover both transforming raster data for hybrid workflows and analyzing raster data in place. All functions are built on the out-of-database architecture, inheriting lazy loading, selective pixel materialization, and intelligent caching without additional configuration.

What is RasterFlow?

RasterFlow is an inference and perception engine for large-scale image processing built into Wherobots Cloud. It enables teams to run computer vision models against large-scale raster datasets, handling imagery preparation, mosaicking, edge effect removal, distributed model inference, and conversion of predictions into vector geometries. RasterFlow supports open-source geospatial AI models and custom PyTorch models, and can generate embeddings from geospatial foundation models. It is currently available to select customers in private preview.

Do existing Apache Sedona workloads run on WherobotsDB without code changes?

Yes. WherobotsDB maintains full API compatibility with open-source Apache Sedona. Existing Sedona workloads run without code changes.
