Raster data (satellite imagery, elevation models, sensor grids) is critical to understanding the physical world and increasingly to powering AI. The challenge most data teams face is processing it at scale.
Processing raster data at scale requires an architecture that avoids loading entire files into memory. WherobotsDB solves this with an out-of-database approach that fetches pixel data on demand, enabling terabyte-scale processing without the memory overhead of traditional raster engines.
WherobotsDB extends open-source Apache Sedona with capabilities and performance optimizations purpose-built for preparing physical-world data for AI at scale, while maintaining full API compatibility. Existing Sedona workloads run without code changes.
This post covers how WherobotsDB handles the full raster lifecycle: scalable processing architecture, raster math, coordinate reference systems, vector-raster hybrid workflows, and planetary-scale inference.
“With Wherobots, we were able to merge 15+ complex vector datasets in minutes and run high-resolution ML inference on raster imagery at a fraction of the cost of our legacy stack. The combination of speed, scalability, and ease of integration has boosted our engineering productivity and will accelerate how quickly we can deliver new geospatial data products to market.” — Rashmit Singh, CTO, SatSure
At the foundation of Wherobots’ raster capabilities is an out-of-database raster architecture, which makes it far easier to process raster imagery in an embarrassingly parallel fashion. Instead of loading entire raster files into memory, WherobotsDB stores only metadata and fetches pixel data on demand. Teams can therefore process terabyte-scale imagery collections (statewide mosaics, multi-year satellite archives, continental elevation models) with the same interface they use for vector data. Operations like zonal statistics, clipping, masking, filtering, and raster algebra scale to datasets that would overwhelm in-memory approaches.
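As an illustration only, the sketch below models the idea in plain Python: a raster is represented by its metadata plus a reader, and pixels are fetched for a requested window on demand. `OutDbRaster`, `simulated_reader`, and the `s3://example-bucket/...` path are hypothetical stand-ins, not WherobotsDB APIs, and the reader simulates remote storage rather than calling it.

```python
from dataclasses import dataclass
from typing import Callable, List

# A reader takes (col, row, ncols, nrows) and returns only those pixels.
# In a real system this would be a ranged read against object storage.
PixelReader = Callable[[int, int, int, int], List[List[int]]]


@dataclass
class OutDbRaster:
    """Hypothetical metadata-only raster: no pixels are held in memory."""
    uri: str                 # where the pixels live (placeholder path)
    width: int               # raster size in pixels
    height: int
    reader: PixelReader
    fetched_pixels: int = 0  # how many pixels have been materialized so far

    def read_window(self, col: int, row: int, ncols: int, nrows: int) -> List[List[int]]:
        """Fetch only the requested window, clamped to the raster extent."""
        col_end = min(col + ncols, self.width)
        row_end = min(row + nrows, self.height)
        if col >= col_end or row >= row_end:
            return []
        window = self.reader(col, row, col_end - col, row_end - row)
        self.fetched_pixels += (col_end - col) * (row_end - row)
        return window


def simulated_reader(col: int, row: int, ncols: int, nrows: int) -> List[List[int]]:
    """Stand-in for remote storage: pixel value is row * 10_000 + col."""
    return [[r * 10_000 + c for c in range(col, col + ncols)]
            for r in range(row, row + nrows)]


# A 10,000 x 10,000 raster (100 million pixels) described only by metadata.
scene = OutDbRaster("s3://example-bucket/scene.tif", 10_000, 10_000, simulated_reader)
window = scene.read_window(col=5_000, row=5_000, ncols=4, nrows=2)
print(window[0][0])          # 50005000
print(scene.fetched_pixels)  # 8
```

Reading a 4 x 2 window materializes 8 pixels; the other 99,999,992 never leave storage.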
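The out-of-database architecture fundamentally changes how raster operations execute. Key benefits include lazy loading (pixel data is read only when an operation needs it), selective pixel materialization (only the pixels a specific operation requires are materialized, not the entire file), and intelligent caching.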
Cloud-Optimized GeoTIFFs (COGs) are GeoTIFF files structured so that only the specific byte ranges needed for a given operation are fetched from remote storage, rather than downloading the entire file. Combined with on-demand loading, this architecture minimizes both memory footprint and network I/O.
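To show why COGs pair well with on-demand loading, here is a minimal sketch of how a reader might translate a pixel window into HTTP Range requests. It assumes the file’s tile offset and byte-count tables (the TIFF `TileOffsets` and `TileByteCounts` tags) have already been read from the header; the image size, tile size, and byte values are made up for illustration, and overviews and compression are ignored.

```python
import math
from typing import List


def tiles_for_window(col: int, row: int, ncols: int, nrows: int,
                     tile_w: int, tile_h: int, tiles_across: int) -> List[int]:
    """Row-major indices of the internal tiles that a pixel window touches."""
    first_tc, last_tc = col // tile_w, (col + ncols - 1) // tile_w
    first_tr, last_tr = row // tile_h, (row + nrows - 1) // tile_h
    return [tr * tiles_across + tc
            for tr in range(first_tr, last_tr + 1)
            for tc in range(first_tc, last_tc + 1)]


def range_headers(tile_ids: List[int], offsets: List[int], byte_counts: List[int]) -> List[str]:
    """HTTP Range header values; the end byte is inclusive."""
    return [f"bytes={offsets[t]}-{offsets[t] + byte_counts[t] - 1}" for t in tile_ids]


# Hypothetical 2048 x 2048 COG with 512 x 512 internal tiles (4 x 4 = 16 tiles).
width = height = 2048
tile_w = tile_h = 512
tiles_across = math.ceil(width / tile_w)
n_tiles = tiles_across * math.ceil(height / tile_h)

# Made-up tile index, as if read from the header: 1 KiB of header data,
# then 16 tiles of 50,000 bytes each.
offsets = [1024 + i * 50_000 for i in range(n_tiles)]
byte_counts = [50_000] * n_tiles

needed = tiles_for_window(400, 400, 300, 300, tile_w, tile_h, tiles_across)
print(needed)  # [0, 1, 4, 5]
for header in range_headers(needed, offsets, byte_counts):
    print(header)
```

Only 4 of the 16 tiles are requested. Production readers such as GDAL’s `/vsicurl/` virtual file system handle the header parsing and range requests for you.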
Building on this foundation, WherobotsDB includes enhanced raster functions that enable satellite imagery, elevation models, and sensor data analysis directly in SQL alongside traditional vector operations.
These capabilities fall into two categories: transforming raster data for hybrid vector-raster workflows, and analyzing raster data in place.
Raster-to-Vector Conversion turns contiguous regions of pixels that share the same value into vector polygons. It is essential for workflows that analyze raster-derived features, such as flood extents, land cover classifications, or building footprints extracted from a digital surface model (DSM), using vector spatial operations like overlay, buffering, or spatial joins.
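The core of raster-to-vector conversion is grouping connected pixels that share a value. The sketch below shows only that grouping step, using 4-connectivity and a breadth-first flood fill in plain Python. It reports each region’s value, pixel count, and pixel-space bounding box, but it does not trace polygon boundaries or apply a geotransform, which tools such as GDAL’s `gdal.Polygonize` do. `label_regions` is a hypothetical helper, not a WherobotsDB function.

```python
from collections import deque
from typing import Dict, List, Tuple


def label_regions(grid: List[List[int]]) -> List[Dict]:
    """Group 4-connected pixels that share a value into regions.

    Each region reports its value, pixel count, and pixel-space bounding box
    as (min_col, min_row, max_col_exclusive, max_row_exclusive).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            value = grid[r0][c0]
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            members: List[Tuple[int, int]] = []
            # Breadth-first flood fill over same-valued neighbors.
            while queue:
                r, c = queue.popleft()
                members.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == value):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rs = [m[0] for m in members]
            cs = [m[1] for m in members]
            regions.append({"value": value, "pixels": len(members),
                            "bbox": (min(cs), min(rs), max(cs) + 1, max(rs) + 1)})
    return regions


# Toy flood mask: 1 = water, 0 = dry.
flood = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
water = [reg for reg in label_regions(flood) if reg["value"] == 1]
print([reg["pixels"] for reg in water])  # [4, 1, 1]
```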
Multi-Band Tile Processing solves one of the most common friction points in raster analysis: combining data from different sources. Satellite imagery from different sensors, time periods, or providers typically arrives in different coordinate reference systems, resolutions, and data types. WherobotsDB aligns and stacks these into a unified multi-band raster, then tiles it into spatial chunks for distributed processing — all in a single operation. This enables workflows like change detection across multi-temporal composites, fusing Sentinel-2 optical bands with elevation data, or building analysis-ready multi-spectral stacks, without manual reprojection or resampling steps.
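A simplified sketch of the align, stack, and tile sequence follows. It assumes the bands already share a coordinate reference system and grid origin and differ only by an integer resolution factor; real alignment also involves reprojection and more careful resampling. All function names are hypothetical and the pixel values are toy data.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def upsample_nearest(band: Grid, factor: int) -> Grid:
    """Nearest-neighbor resampling of a coarse band onto a grid `factor` times finer."""
    return [[v for v in row for _ in range(factor)]
            for row in band for _ in range(factor)]


def stack_bands(*bands: Grid) -> List[List[Tuple]]:
    """Combine aligned bands into one grid of per-pixel band tuples."""
    rows, cols = len(bands[0]), len(bands[0][0])
    if any(len(b) != rows or len(b[0]) != cols for b in bands):
        raise ValueError("all bands must share one grid before stacking")
    return [[tuple(b[r][c] for b in bands) for c in range(cols)] for r in range(rows)]


def tile_stack(stack: List[List[Tuple]], size: int) -> Dict[Tuple[int, int], List[List[Tuple]]]:
    """Cut the stacked raster into size x size chunks keyed by (tile_row, tile_col)."""
    tiles = {}
    for r in range(0, len(stack), size):
        for c in range(0, len(stack[0]), size):
            tiles[(r // size, c // size)] = [row[c:c + size] for row in stack[r:r + size]]
    return tiles


# Two 4 x 4 optical bands at a fine resolution...
red = [[r * 4 + c for c in range(4)] for r in range(4)]
nir = [[100 + r * 4 + c for c in range(4)] for r in range(4)]
# ...and a 2 x 2 elevation band at half that resolution.
elevation = [[10, 20], [30, 40]]

stack = stack_bands(red, nir, upsample_nearest(elevation, 2))
tiles = tile_stack(stack, size=2)
print(len(tiles))           # 4
print(tiles[(1, 1)][0][0])  # (10, 110, 40)
```

Each tile is an independent unit of work, which is what lets a distributed engine spread the stacked raster across workers.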
Zonal Statistics computes aggregate statistics (count, sum, mean, median, mode, stddev, variance, min, max) for raster pixels falling within vector zones. Both Apache Sedona and WherobotsDB support zonal statistics — the differentiator is scale. WherobotsDB’s out-of-database architecture materializes only the pixels within each zone rather than processing the entire raster, making it practical to run zonal statistics across millions of zones on terabyte-scale imagery.
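The sketch below illustrates zonal statistics for a single polygon zone on a north-up grid. It first restricts work to the pixel window under the zone’s bounding box, mirroring the idea of materializing only the pixels within each zone, then keeps pixels whose centers fall inside the polygon using an even-odd ray-casting test. The pixel-center inclusion rule and the population form of standard deviation and variance are assumptions for this example, and WherobotsDB’s exact semantics may differ. Function names and data are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd ray-casting test against a simple polygon's vertices."""
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_stats(raster: List[List[float]], origin_x: float, origin_y: float,
                pixel_size: float, zone: List[Point]) -> Dict[str, float]:
    """Statistics of pixels whose centers fall inside `zone` (north-up grid)."""
    xs = [p[0] for p in zone]
    ys = [p[1] for p in zone]
    # Only visit the pixel window under the zone's bounding box.
    c0 = max(0, int((min(xs) - origin_x) // pixel_size))
    c1 = min(len(raster[0]), int((max(xs) - origin_x) // pixel_size) + 1)
    r0 = max(0, int((origin_y - max(ys)) // pixel_size))
    r1 = min(len(raster), int((origin_y - min(ys)) // pixel_size) + 1)
    values = []
    for r in range(r0, r1):
        for c in range(c0, c1):
            cx = origin_x + (c + 0.5) * pixel_size
            cy = origin_y - (r + 0.5) * pixel_size
            if point_in_polygon(cx, cy, zone):
                values.append(raster[r][c])
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stddev": statistics.pstdev(values),      # population form (an assumption)
        "variance": statistics.pvariance(values),
        "min": min(values),
        "max": max(values),
    }


# 4 x 4 raster with 1-unit pixels; the upper-left corner sits at (0, 4).
raster = [[r * 4 + c for c in range(4)] for r in range(4)]
parcel = [(1, 1), (3, 1), (3, 3), (1, 3)]
stats = zonal_stats(raster, origin_x=0, origin_y=4, pixel_size=1, zone=parcel)
print(stats["count"], stats["sum"], stats["mean"])  # 4 30 7.5
```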
Custom Raster Algebra executes user-defined raster algebra expressions with near-native performance. It supports complex multi-band calculations, conditional logic, and neighborhood operations, enabling workflows such as computing NDVI (Normalized Difference Vegetation Index, a measure of vegetation density derived from the red and near-infrared bands) from satellite imagery, or applying threshold-based classification across large imagery collections.
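NDVI is defined as (NIR - Red) / (NIR + Red). A minimal plain-Python sketch of the per-pixel calculation plus a threshold classification follows. The 0.3 threshold, the toy band values, and the choice to mark zero-denominator pixels as no data (`None`) are illustrative assumptions, not WherobotsDB defaults, and the function names are hypothetical.

```python
from typing import List, Optional

Grid = List[List[float]]


def ndvi(red: Grid, nir: Grid) -> List[List[Optional[float]]]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); None where the sum is zero."""
    return [[(n - r) / (n + r) if (n + r) != 0 else None
             for r, n in zip(red_row, nir_row)]
            for red_row, nir_row in zip(red, nir)]


def classify(index: List[List[Optional[float]]], threshold: float) -> List[List[Optional[int]]]:
    """Threshold classification: 1 at or above `threshold`, 0 below, None for no data."""
    return [[None if v is None else int(v >= threshold) for v in row] for row in index]


red = [[10, 30], [0, 20]]
nir = [[50, 30], [0, 60]]
index = ndvi(red, nir)
print(index[1][1])                     # 0.5
print(classify(index, threshold=0.3))  # [[1, 0], [None, 1]]
```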
Spatial Filter Push-down for Rasters uses bounding boxes to skip irrelevant raster files, dramatically reducing I/O for selective queries. When a catalog contains thousands of scenes but only a few intersect your area of interest, irrelevant files are eliminated before any processing begins.
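Conceptually, push-down compares each scene’s footprint bounding box with the query’s area of interest using only catalog metadata, before any pixels are read. A minimal sketch, with a hypothetical catalog, file names, and helper names:

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "BBox") -> bool:
        """True when the two axis-aligned boxes overlap or touch."""
        return not (self.max_x < other.min_x or other.max_x < self.min_x
                    or self.max_y < other.min_y or other.max_y < self.min_y)


def prune_scenes(catalog: Dict[str, BBox], aoi: BBox) -> List[str]:
    """Keep only scenes whose footprint bbox intersects the area of interest.

    Only catalog metadata is consulted here; no pixel data is opened.
    """
    return [uri for uri, bbox in catalog.items() if bbox.intersects(aoi)]


catalog = {
    "scene_a.tif": BBox(0, 0, 10, 10),
    "scene_b.tif": BBox(10, 0, 20, 10),
    "scene_c.tif": BBox(20, 0, 30, 10),
    "scene_d.tif": BBox(0, 10, 10, 20),
    "scene_e.tif": BBox(100, 100, 110, 110),
}
aoi = BBox(12, 2, 18, 8)
print(prune_scenes(catalog, aoi))  # ['scene_b.tif']
```

Four of the five scenes are eliminated without touching their files; the same check scales to catalogs with thousands of entries.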
Because all of these functions are built on the out-of-database architecture, they inherit the same scalability characteristics described above — lazy loading, selective pixel materialization, and intelligent caching, without additional configuration.
Wherobots has also recently added RasterFlow, an entirely new inference and perception engine for planetary-scale image processing that extends the raster lifecycle beyond analysis into AI. RasterFlow enables teams to run computer vision models against large-scale raster datasets, covering everything from imagery preparation and mosaicking to removing edge effects across tiles, executing distributed model inference, and converting predictions into vector geometries, all within Wherobots Cloud.
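Edge effects arise because a model sees less context near a tile boundary. A common mitigation, shown here as a general technique and not as RasterFlow’s documented implementation, is to run the model on overlapping windows and keep only each window’s core. In this toy sketch a 3 x 3 neighborhood sum stands in for a model, and all names are hypothetical. With a one-pixel margin the stitched output matches a single pass over the full image; with no margin it does not.

```python
from typing import Callable, List

Grid = List[List[float]]


def box_sum(grid: Grid) -> Grid:
    """Toy stand-in for a model: 3 x 3 neighborhood sum, zero-padded at edges."""
    h, w = len(grid), len(grid[0])
    return [[sum(grid[rr][cc]
                 for rr in range(max(0, r - 1), min(h, r + 2))
                 for cc in range(max(0, c - 1), min(w, c + 2)))
             for c in range(w)]
            for r in range(h)]


def predict_tiled(image: Grid, tile: int, margin: int,
                  model: Callable[[Grid], Grid]) -> Grid:
    """Run `model` on overlapping windows and keep only each window's core tile."""
    h, w = len(image), len(image[0])
    out: Grid = [[0] * w for _ in range(h)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            # Pad the core tile with `margin` pixels of context, clamped to the image.
            wr0, wc0 = max(0, r0 - margin), max(0, c0 - margin)
            wr1, wc1 = min(h, r0 + tile + margin), min(w, c0 + tile + margin)
            pred = model([row[wc0:wc1] for row in image[wr0:wr1]])
            # Copy back only the core; edge-affected output in the margin is dropped.
            for r in range(r0, min(h, r0 + tile)):
                for c in range(c0, min(w, c0 + tile)):
                    out[r][c] = pred[r - wr0][c - wc0]
    return out


image = [[1] * 8 for _ in range(8)]
full = box_sum(image)
print(predict_tiled(image, tile=4, margin=1, model=box_sum) == full)  # True
print(predict_tiled(image, tile=4, margin=0, model=box_sum) == full)  # False
```

The margin must be at least as wide as the context the model draws on (one pixel for a 3 x 3 kernel); a narrower margin reintroduces seams at tile boundaries.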
RasterFlow stores its outputs either as vectorized results in Apache Iceberg tables (an open table format for large-scale analytic datasets) or as predictions in Zarr (a cloud-native format for chunked, compressed multi-dimensional arrays) or COGs. In every case, the outputs can be analyzed with the full suite of spatial operations in WherobotsDB. This creates end-to-end raster workflows, from raw imagery through model inference to spatial analytics, without moving data between systems or building custom infrastructure.
RasterFlow supports both popular open-source geospatial AI models and custom PyTorch models, and can generate embeddings from geospatial foundation models. It is currently available to select customers in private preview. If you’re interested in RasterFlow, join our upcoming session to see it in action.
Raster processing is a first-class capability in WherobotsDB, and it is also one part of a broader set of spatial data processing advances we’ve built beyond open-source Sedona. Vector and raster workloads benefit from the same query performance optimizations under the hood: spatial relationship acceleration, automatic join optimization, dynamic data redistribution, and a vectorized GeoParquet reader. Queries that require careful tuning with self-managed Sedona run optimally out of the box with WherobotsDB.
In the next post in this series, we’ll go deep on query performance and spatial analytics: how WherobotsDB accelerates spatial joins, range queries, and analytical functions across both vector and raster data types.
Frequently asked questions
What is an out-of-database raster architecture?
Out-of-database raster architecture stores only metadata in the database while fetching pixel data on demand from remote storage. Instead of loading entire raster files into memory, only the pixels required for a specific operation are materialized. This enables teams to process terabyte-scale imagery collections, including statewide mosaics, multi-year satellite archives, and continental elevation models, without the memory overhead of traditional in-memory approaches.
How does WherobotsDB differ from Apache Sedona for raster processing?
WherobotsDB extends open-source Apache Sedona with additional capabilities and performance optimizations purpose-built for large-scale physical-world data processing. While both support raster operations like zonal statistics, WherobotsDB’s out-of-database architecture materializes only the pixels within each zone rather than processing the entire raster, making it practical to run zonal statistics across millions of zones on terabyte-scale imagery. Existing Apache Sedona workloads run on WherobotsDB without code changes.
What raster functions does WherobotsDB support?
WherobotsDB supports raster-to-vector conversion, multi-band tile processing, zonal statistics, custom raster algebra, and spatial filter push-down. These capabilities cover both transforming raster data for hybrid workflows and analyzing raster data in place. All functions are built on the out-of-database architecture, inheriting lazy loading, selective pixel materialization, and intelligent caching without additional configuration.
What is RasterFlow?
RasterFlow is an inference and perception engine for large-scale image processing built into Wherobots Cloud. It enables teams to run computer vision models against large-scale raster datasets, handling imagery preparation, mosaicking, edge effect removal, distributed model inference, and conversion of predictions into vector geometries. RasterFlow supports open-source geospatial AI models and custom PyTorch models, and can generate embeddings from geospatial foundation models. It is currently available to select customers in private preview.
Do existing Apache Sedona workloads run on WherobotsDB?
Yes. WherobotsDB maintains full API compatibility with open-source Apache Sedona, so existing Sedona workloads run without code changes.