Today we are announcing that the next generation of WherobotsDB, the Apache Sedona and Spark 4 compatible engine, is now generally available. Compared to the previous generation of WherobotsDB, the new architecture accelerates queries by up to 3x, with up to 45% better price-performance. Previously in preview, it is now offered through the latest version of WherobotsDB.
How Customers Use WherobotsDB
Our customers are using WherobotsDB to create insights from spatial data at scale that result in improved products, services, and decision making in the physical world. They are realizing breakthroughs in fleet operations, improving their risk projections, increasing the accuracy of vegetative forecasts, analyzing change, and becoming more capable overall of innovating on physical-world problems.
With Wherobots on AWS, not only can we easily scale to millions of acres and continuous tractor telemetry normalization within LeafLake, but also we can rest assured that our costs won’t spiral out of control. — G. Bailey Stockdale CEO, Leaf Agriculture
The workloads customers run, or want to run, are becoming more ambitious too. Customers and AI alike demand better, faster, and cheaper solutions for working with a wide variety of raster and vector spatial datasets, and they need to fuse this data with valuable business context as well. And of course, they want to do it without scale and function limitations.
WherobotsDB was designed from the ground up to meet these needs. But solutions are now even easier to realize, because the latest version of WherobotsDB allows you to do more, faster, at a lower cost.
The next generation of WherobotsDB delivers a substantial performance increase for both spatial and non-spatial queries.
Compared to the previous engine, our benchmark runs of TPC-H and SpatialBench both show significant performance improvements across scale factors of 100 and 1000.
The cost comparison covers all SpatialBench queries at SF 1000 that the next best engine could finish within a 10-hour timeout, which limited the comparison to Q1-Q5 and Q7. The remainder of the queries (Q6, Q8-Q12) could not be completed by that engine and were excluded from this analysis.
The current generation of WherobotsDB is more capable and costs 46% less than the next best engine, which is a popular Spark-based serverless engine with Spatial SQL support.
WherobotsDB is the only engine capable of meeting the following spatial data requirements that customers have. You get:
✅ high performance, cost efficient, and scalable vector, raster, tabular data operations in a unified query environment
✅ compatibility with Spark 4 and Sedona
✅ interoperability with zero-copy on lakehouses and data lakes to keep data in your control
✅ unification with RasterFlow, to easily orchestrate planetary scale inference and analytics workflows starting with raw imagery datasets
The following matrix isolates the vector data processing capabilities of the next best alternatives to WherobotsDB using SpatialBench runs at a scale factor of 1000. Raster data capabilities were not compared, because WherobotsDB was the only engine in the set that supports raster data.
Contact us and we can share additional details, or rerun these benchmarks on an engine of your choice.
The original architecture for WherobotsDB was built on a JVM-based execution model. It is an extraordinary platform for distributed computing, but its row-oriented execution model and JVM memory management introduce overhead that compounds at scale, especially for spatial-heavy workloads where every row carries complex spatial objects that must be serialized, deserialized, and processed one at a time.
The new WherobotsDB architecture addresses these bottlenecks by replacing the JVM-based execution layer with a Rust-native, Arrow-columnar engine optimized for spatial data.
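To make the row-versus-columnar distinction concrete, here is a minimal sketch in plain Python (standard library only). It illustrates the general idea and is not WherobotsDB's implementation; the sample points and the function names `bbox_row_oriented` and `bbox_columnar` are hypothetical.

```python
from array import array

# Hypothetical sample data: three points as (x, y) pairs.
points = [(1.0, 5.0), (4.0, 2.0), (-3.0, 7.5)]

# Row-oriented: one object per record, processed one row at a time.
rows = [{"x": x, "y": y} for x, y in points]

def bbox_row_oriented(rows):
    """Compute a bounding box by visiting each row object in turn."""
    min_x = min_y = float("inf")
    max_x = max_y = float("-inf")
    for row in rows:
        min_x = min(min_x, row["x"])
        max_x = max(max_x, row["x"])
        min_y = min(min_y, row["y"])
        max_y = max(max_y, row["y"])
    return (min_x, min_y, max_x, max_y)

# Columnar: each attribute lives in one contiguous, typed buffer.
xs = array("d", (x for x, _ in points))
ys = array("d", (y for _, y in points))

def bbox_columnar(xs, ys):
    """Compute the same bounding box with whole-column operations."""
    return (min(xs), min(ys), max(xs), max(ys))

print(bbox_row_oriented(rows))
print(bbox_columnar(xs, ys))
```

Both functions return the same bounding box. The columnar version works on contiguous typed buffers, which is the property that native columnar engines exploit for batch processing.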
The latest version of WherobotsDB takes advantage of SedonaDB, an open-source analytical database engine where geospatial data is a first-class citizen. The use of Rust and SedonaDB allowed us to move spatial logic out of the JVM and directly onto the native execution layer. It provides a unified execution model that supports everything from scalar and window functions to complex spatial joins, aggregations, and geometry operations.
Apache DataFusion is a Rust-native query engine built from the ground up on the Apache Arrow in-memory columnar format. By integrating DataFusion’s native execution with WherobotsDB’s distributed engine, you get the best of both worlds: WherobotsDB’s battle-tested scheduling and DataFusion’s high-performance native processing. This means your existing WherobotsDB-based workflows, SQL queries, and Python notebooks continue to work exactly as before, but the actual computation now runs in optimized native code rather than in the JVM.
To remove the significant overhead of data conversion, we implemented a high-performance native geometry type based on the GeoArrow specification. This allows for “Zero-Copy” data handling, utilizing Arrow’s nested memory layout to represent geometries and geographies without the costly serialization and deserialization steps typically found in spatial databases.
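The idea behind a nested, zero-copy geometry layout can be sketched with the Python standard library alone. The example below is an illustration under stated assumptions, not WherobotsDB or GeoArrow library code: it stores two hypothetical linestrings in a GeoArrow-like layout (one flat coordinate buffer plus an offsets buffer) and contrasts reading a geometry through a view with serializing it to WKB. The names `linestring_coords` and `linestring_to_wkb` are made up for this sketch.

```python
import struct
from array import array

# Two hypothetical linestrings in a GeoArrow-like nested layout:
# one flat buffer of interleaved x, y values plus an offsets buffer
# marking where each geometry's points begin and end.
coords = array("d", [0.0, 0.0, 1.0, 1.0, 2.0, 0.0,   # linestring 0: 3 points
                     5.0, 5.0, 6.0, 7.0])            # linestring 1: 2 points
offsets = array("i", [0, 3, 5])  # point ranges [0, 3) and [3, 5)

coord_view = memoryview(coords)  # a view over the buffer, not a copy

def linestring_coords(i):
    """Return a zero-copy view of the i-th linestring's coordinates."""
    start, end = offsets[i], offsets[i + 1]
    return coord_view[2 * start : 2 * end]

def linestring_to_wkb(i):
    """For contrast: copy the i-th linestring into little-endian WKB bytes."""
    view = linestring_coords(i)
    n_points = len(view) // 2
    # Byte order (1 = little endian), geometry type (2 = LineString), point count.
    header = struct.pack("<BII", 1, 2, n_points)
    return header + struct.pack(f"<{len(view)}d", *view)

second = linestring_coords(1)
print(second.tolist())             # coordinates read directly from the shared buffer
print(len(linestring_to_wkb(1)))   # size of the serialized copy in bytes
```

Slicing the `memoryview` only adjusts a start and length over the existing buffer, whereas the WKB path allocates and fills a new byte string for every geometry. That per-geometry copy is the kind of overhead a native Arrow-based geometry type is meant to avoid.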
WherobotsDB is now compatible with the latest features of Spark 4 to provide a modern, robust environment that enforces ANSI SQL by default. This upgrade integrates the Wherobots engine with the newest advancements in distributed computing, including improved query planning and execution protocols.
Get started now with a 30-day, $300 free trial available for the Professional Edition of Wherobots.
The latest version of WherobotsDB is generally available today and is the default version for all new runtimes on Wherobots Cloud.
Because the latest version enforces ANSI SQL by default, we recommend that existing customers test their workloads on the latest version of WherobotsDB in a notebook, SQL session, or job run. For most workloads, the upgrade should be seamless.
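As an illustration of the kind of behavior change worth testing for, the sketch below emulates in plain Python one well-known difference under ANSI SQL mode in Spark: an invalid `CAST` raises an error instead of silently returning `NULL`, while `try_cast` keeps the tolerant behavior. This is not Spark or WherobotsDB code; the function names and sample values are hypothetical stand-ins for the SQL semantics.

```python
def cast_int_legacy(value):
    """Emulates non-ANSI CAST(value AS INT): invalid input becomes NULL (None)."""
    try:
        return int(value.strip())
    except ValueError:
        return None

def cast_int_ansi(value):
    """Emulates ANSI CAST(value AS INT): invalid input raises an error."""
    return int(value.strip())  # ValueError propagates to the caller

def try_cast_int(value):
    """Emulates try_cast(value AS INT): tolerant even when ANSI mode is on."""
    try:
        return cast_int_ansi(value)
    except ValueError:
        return None

# Hypothetical column of string readings with one malformed value.
readings = ["42", " 7 ", "n/a"]

print([cast_int_legacy(v) for v in readings])
print([try_cast_int(v) for v in readings])
try:
    [cast_int_ansi(v) for v in readings]
except ValueError as err:
    print("ANSI-style cast failed:", err)
```

A query that relied on malformed values quietly becoming `NULL` would now fail, so it may need an explicit `try_cast` or upstream data cleaning.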
If you have questions about the upgrade, experience unexpected behavior, want a custom benchmark, or would like to discuss how Wherobots can benefit your business, reach out to the Wherobots team at support@wherobots.com, sales@wherobots.com, or by filling out our contact us form.
WherobotsDB is built on Apache Sedona (we’re the same creators!) and extends it rather than competing with it directly. Under the hood, WherobotsDB uses SedonaDB, an open-source analytical database engine where geospatial data is a first-class citizen, as part of its native execution layer.
On SpatialBench at a scale factor of 1000, WherobotsDB completed all queries, including the heavy spatial join, heavy distance join, heavy polygon self spatial join, heavy spatial left join, multi-way spatial join, and KNN join queries, which the next best Spark-based engine with Spatial SQL support could not finish within a 10-hour timeout. WherobotsDB also comes in at 46% lower cost than that next best alternative.
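For readers less familiar with these query types, here is a minimal brute-force sketch of what a distance join and a KNN join compute. It uses plain Python with hypothetical sample points and hypothetical function names, and it is not how SpatialBench or WherobotsDB implements these joins.

```python
import math

# Hypothetical sample data: named points as (x, y) in arbitrary planar units.
stops = {"s1": (0.0, 0.0), "s2": (5.0, 5.0)}
buildings = {"b1": (1.0, 0.0), "b2": (4.0, 4.0), "b3": (9.0, 9.0)}

def distance_join(left, right, max_dist):
    """Pair every left point with every right point within max_dist."""
    return sorted(
        (lk, rk)
        for lk, lp in left.items()
        for rk, rp in right.items()
        if math.dist(lp, rp) <= max_dist
    )

def knn_join(left, right, k):
    """Pair every left point with its k nearest right points."""
    return {
        lk: sorted(right, key=lambda rk: math.dist(lp, right[rk]))[:k]
        for lk, lp in left.items()
    }

print(distance_join(stops, buildings, 2.0))
print(knn_join(stops, buildings, 2))
```

Both functions compare every left point with every right point, so the work grows with the product of the two input sizes. Engines typically rely on spatial partitioning and indexing to avoid that at large scale factors.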
The current generation of WherobotsDB replaces the previous JVM-based execution layer with a Rust-native, Arrow-columnar engine optimized for spatial data. It integrates Apache DataFusion, a Rust-native query engine built on the Apache Arrow in-memory columnar format, for high-performance native processing. Spatial logic runs through SedonaDB directly on the native execution layer rather than inside the JVM. To eliminate serialization overhead, WherobotsDB implements a native geometry type based on the GeoArrow specification, enabling zero-copy data handling. It is also fully compatible with Apache Sedona and Spark 4.
The latest version of WherobotsDB is generally available today and is the default version for all new runtimes on Wherobots Cloud. New users can get started with a 30-day, $300 free trial for the Professional edition. If you want to discuss a custom benchmark, have questions about upgrading, or want to explore how WherobotsDB fits your use case, you can reach out to us through the contact us form.