24 Sep 2025

Introducing SedonaDB and SpatialBench for Apache Sedona

Our role at Wherobots and as leaders in the Apache Sedona community is to help more developers, organizations, and AI systems positively transform the physical world using spatial data. To make transformation possible at the scale we envision, we’ve had to address significant bottlenecks in how data is stored and queried.

We’re excited to celebrate the availability of SedonaDB and SpatialBench for Apache Sedona. Together they represent the next phase in our plan to accelerate innovation with spatial data and bridge the intelligence gap between AI and the physical world.

Intro to SedonaDB: A modern query engine that gets spatial right

SedonaDB is the first open-source, single-node analytical database engine that treats spatial data as a first-class citizen.

Most analytical query engines already support general-purpose operations: filtering, joins, aggregations, and APIs for SQL or Python. But when it comes to spatial data, those same engines fall short: they lack support for geometry and geography types, coordinate reference systems (CRS), spatial joins, and raster or vector operations. The workaround is to bolt on an extension like PostGIS (PostgreSQL), DuckDB Spatial (DuckDB), or SedonaSpark (Spark). While powerful, extensions inherit the limits, costs, and complexities of their host systems, require extra setup and tuning, and can force builders to work around performance and usability gaps instead of developing their ideas.

SedonaDB is different. It’s for builders solving problems with physical world data.

Written in Rust, it’s lightweight, blazing fast, and spatial-native. Out of the box, it provides:

Full support for spatial types, joins, CRS, and functions on top of industry standard query operations.
Query optimizations, indexing, and data pruning features under the hood that make spatial operations just work with high performance.
Pythonic and SQL interfaces familiar to developers, plus APIs for R and Rust.
Flexibility to run in single-machine environments on local files or data lakes.
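To make the idea of data pruning in a spatial join concrete, here is a minimal pure-Python sketch. It is not SedonaDB's implementation; it only illustrates the general filter-then-refine pattern that spatial engines commonly use, where a cheap bounding-box check discards most candidate pairs before an exact point-in-polygon test runs. The function names and sample geometries are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the last edge closes back to the first


def bbox(poly: Polygon) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Exact containment test using ray casting (even-odd rule)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: List[Point], polygons: List[Polygon]):
    """Return (point_idx, polygon_idx) matches and the number of exact tests run."""
    boxes = [bbox(p) for p in polygons]
    matches = []
    exact_tests = 0
    for i, (x, y) in enumerate(points):
        for j, (min_x, min_y, max_x, max_y) in enumerate(boxes):
            # Filter step: skip polygons whose bounding box excludes the point.
            if not (min_x <= x <= max_x and min_y <= y <= max_y):
                continue
            # Refine step: run the exact (more expensive) test on survivors only.
            exact_tests += 1
            if point_in_polygon((x, y), polygons[j]):
                matches.append((i, j))
    return matches, exact_tests


# Hypothetical sample data: two polygons and four points.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
triangle = [(10.0, 0.0), (14.0, 0.0), (10.0, 4.0)]
points = [(1.0, 1.0), (11.0, 1.0), (13.5, 3.5), (20.0, 20.0)]

matches, exact_tests = spatial_join(points, [square, triangle])
print(matches)      # [(0, 0), (1, 1)]
print(exact_tests)  # 3
```

Of the eight possible point-polygon pairs, only three reach the exact test. The point (13.5, 3.5) shows why the refine step is still needed: it falls inside the triangle's bounding box but outside the triangle itself. A real engine would typically pair this with a spatial index rather than a nested loop.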

SedonaDB is built on Apache Arrow and Apache DataFusion, and provides everything you need from a modern vectorized query engine. What sets it apart is that it also runs high-performance spatial workloads easily, without requiring extensions.

Read the announcement on the Apache Sedona blog to dive in and roll up your sleeves.

What led to SedonaDB?

In 2020, Apache Sedona was incubated to address a significant support gap in distributed geospatial data processing. Since then, Sedona has enabled companies like Uber, Amazon Last Mile Delivery, JB Hunt, and thousands of others with geographically distributed operations or interests to build and run more efficient and effective physical operations at scale. It is widely used today to bring geospatial processing support to Apache Spark, Apache Flink, and also Snowflake. But distributed systems aren’t for everyone or the right fit for every use case, and we could do more to drive innovation in lower-scale scenarios.

Accelerating innovation

Many ideas are bootstrapped in no-to-low cost environments where iteration cycles are fast and low risk. There’s a lot that a developer can do today using a laptop or a single virtual machine with modern software and LLMs, without adding a dependency that brings unwanted cost and complexity to the innovation cycle. Once their ideas are viable, they may not even need a distributed compute environment like Spark in production, or one that is “fully managed” by a vendor.

So the next step was pretty clear. We had to make it easier for builders to use spatial data in no-to-low cost environments so they can iterate and positively transform the physical world, faster. We also decided to address these challenges through open-source software to maximize accessibility.

Making development easier

If you look around the ecosystem, you’ll notice a pattern: to get the analytical support you need for geospatial data, you deploy an analytics engine without the spatial analytics support you want, and then you bolt on what you need via an extension.

Extensions are great and they serve their purpose well. After all, SedonaSpark is an extension! But that doesn’t mean the combination of engine + extension is ideal. It requires additional setup and management, can require tuning to achieve reasonable performance, and the underlying engine may end up becoming a bottleneck. Additionally, the development experience around the engine may be overly complex or lack support for the language you prefer, and the engine itself might introduce compute, cost, and other overhead.

Working from the root causes of these challenges, along with the desire to drive more innovation, our next step became obvious. We needed to create a query engine that supports building spatial data solutions out of the box, offers popular Pythonic and SQL interfaces, and is optimized for single-machine environments.

But would a spatial-first query engine create enough value compared to general-purpose query engines with spatial extensions?

Optimizing for spatial data = a better future

Spatial data is no longer a minor class of data. It’s everywhere, the rate at which it’s being generated is growing every day, and its use cases span numerous industries. It streams from devices, vehicles, satellites, and drones, and derivatives from this data inform automation and decision-making across business, government, and research. The solutions being developed with it are transforming how organizations operate in the physical world.

Innovation is happening today with this data despite the friction above, but the pace of this innovation can be accelerated by a query engine with internals intentionally designed to help developers realize the full potential of this data.

This engine is SedonaDB, and it’s backed by an open-source community (Apache Sedona) that is committed to solving physical-world challenges through data and technology.

Intro to SpatialBench: The first standard for spatial query performance

“Without standards, there can be no improvement” – Taiichi Ohno. This statement from the founder of the Toyota Production System captures why we built SpatialBench. There was no standard way to measure spatial query performance, so progress couldn’t be easily quantified, nor could query engines be objectively compared on this dimension.

We built SpatialBench to establish a first standard. The initial release supports 12 representative queries, ranging from simple to complex workloads, and includes a data generator for scale factors 1, 10, 100, and 1000. We hope this framework and its future versions will guide innovation that leads to a greater understanding of the physical world.

We also used SpatialBench to benchmark SedonaDB, DuckDB (with its spatial extension), and GeoPandas at scale factors 1 and 10. Those results are published here.
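For readers who want a feel for what comparing engines on a shared query set involves, here is a rough, standard-library-only sketch of a timing harness. It is not SpatialBench's code or API; the function names, the engine labels, and the placeholder "queries" are all hypothetical stand-ins for real engine calls.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock seconds over `repeats` runs of `run`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def run_suite(
    engines: Dict[str, Dict[str, Callable[[], object]]], repeats: int = 3
) -> Dict[str, Dict[str, float]]:
    """Time every query for every engine; returns {engine: {query: seconds}}."""
    return {
        engine: {name: time_query(fn, repeats) for name, fn in queries.items()}
        for engine, queries in engines.items()
    }


# Hypothetical placeholders: in practice each callable would submit the same
# benchmark query to a different engine over the same generated dataset.
engines = {
    "engine_a": {
        "q1": lambda: sum(range(10_000)),
        "q2": lambda: sorted(range(10_000, 0, -1)),
    },
    "engine_b": {
        "q1": lambda: sum(x for x in range(10_000)),
        "q2": lambda: list(reversed(range(10_000))),
    },
}

results = run_suite(engines)
for engine, timings in results.items():
    for query, seconds in timings.items():
        print(f"{engine} {query}: {seconds:.6f}s")
```

Using the median of several runs is one simple way to dampen noise from caches and background processes. The key property a standard benchmark adds on top of a harness like this is that every engine runs the same queries against the same data at the same scale factor.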

Next Steps

Get Started with SedonaDB: Try it out and contribute to the roadmap.
Use SpatialBench: Measure spatial query performance using a consistent standard.
Watch the webinar (hosted by CNG): We walk through SedonaDB and SpatialBench, and introduce Wherobots’ Startup Accelerator Program.