Apache Sedona is a cluster computing system for processing large-scale spatial data. It treats spatial data as a first-class citizen by extending the functionality of distributed compute frameworks like Apache Spark, Apache Flink, and Snowflake. Apache Sedona was created at Arizona State University under the name GeoSpark and introduced in the paper “Spatial Data Management in Apache Spark: The GeoSpark Perspective and Beyond.”
Apache Sedona introduces data types, operations, and indexing techniques optimized for spatial workloads on top of Apache Spark and other distributed compute frameworks. Let’s take a look at the workflow for analyzing spatial data with Apache Sedona.
The first step in spatial query processing is to ingest geospatial data into Apache Sedona. Data can be loaded from various sources such as files (Shapefiles, GeoJSON, Parquet, GeoTIFF, CSV, etc.) or databases into Apache Sedona’s in-memory distributed spatial data structures (typically the Spatial DataFrame).
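As a sketch of what ingestion looks like in PySpark (the bucket path and column names below are placeholder assumptions, not real data):

```python
from sedona.spark import SedonaContext

# Start a Spark session with Sedona's spatial types and functions registered
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Read a CSV with longitude/latitude columns and build a geometry column
# (the path and column names are hypothetical placeholders)
raw = sedona.read.option("header", "true").csv("s3://your-bucket/points.csv")
points = raw.selectExpr(
    "id",
    "ST_Point(CAST(lon AS double), CAST(lat AS double)) AS geometry",
)
points.show(5)
```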
Next, Sedona makes use of spatial indexing techniques, such as R-trees and quadtrees, to accelerate query processing. Sedona partitions the data spatially into smaller, manageable units and can build a local index within each partition, enabling efficient data retrieval during query processing.
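With the DataFrame API, Sedona handles partitioning and indexing for you; the lower-level RDD API exposes them explicitly. A sketch, continuing from the `points` DataFrame above:

```python
from sedona.core.enums import GridType, IndexType
from sedona.utils.adapter import Adapter

# Convert the DataFrame to a SpatialRDD for explicit control over layout
spatial_rdd = Adapter.toSpatialRdd(points, "geometry")
spatial_rdd.analyze()  # compute the dataset's boundary and statistics

# Spatially partition the data, then build a local R-tree in each partition
spatial_rdd.spatialPartitioning(GridType.KDBTREE)
spatial_rdd.buildIndex(IndexType.RTREE, True)  # True = index the partitioned RDD
```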
Once the data is loaded and indexed, spatial queries can be executed using Sedona’s query execution engine. Sedona supports a wide range of spatial operations, such as spatial joins, distance calculations, and spatial aggregations.
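For example, a distance predicate expressed in Sedona SQL (the point of interest and threshold are illustrative values; the distance is in degrees since the data is in lon/lat coordinates):

```python
# Expose the DataFrame to Sedona SQL and run a spatial predicate against it
points.createOrReplaceTempView("points")
nearby = sedona.sql("""
    SELECT id
    FROM points
    WHERE ST_Distance(geometry, ST_Point(-73.98, 40.75)) < 0.01
""")
nearby.show()
```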
Sedona optimizes spatial queries to improve performance. The query optimizer determines an efficient query plan by considering the spatial predicates, available indexes, and the distribution of data across the cluster.
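You can see the optimizer at work by inspecting a query plan: a join on a predicate like ST_Contains is planned as a dedicated spatial join rather than a Cartesian product. A sketch, assuming a second temp view named `zones` with polygon geometries:

```python
# explain() prints the physical plan; with Sedona's optimizer active, the
# join below is planned as a spatial join instead of a Cartesian product
joined = sedona.sql("""
    SELECT z.zone_id, p.id
    FROM zones z JOIN points p
    ON ST_Contains(z.geometry, p.geometry)
""")
joined.explain()
```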
Spatial queries are executed in a distributed manner using Sedona’s computational capabilities. The query execution engine distributes the query workload across the cluster, with each node processing a portion of the data, and intermediate results are combined to produce the final result set. Since spatial objects can be very complex, with many coordinates and rich topology, Sedona implements a custom serializer for efficiently moving spatial data throughout the cluster.
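The Sedona documentation describes enabling this serializer through Spark’s Kryo settings; a minimal sketch:

```python
from sedona.spark import SedonaContext

# Register Sedona's Kryo serializer so geometries move between nodes
# in a compact binary form rather than as generic Java objects
config = (
    SedonaContext.builder()
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator")
    .getOrCreate()
)
sedona = SedonaContext.create(config)
```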
So what exactly are users doing with Apache Sedona?
Many common use cases can be described as geospatial ETL operations. ETL (extract, transform, load) is a data integration process that retrieves data from various sources, transforms and combines those datasets, and loads the result into a target system or format for reporting or further analysis. Geospatial ETL shares many of the challenges and requirements of traditional ETL, with the added complexities of managing the geospatial component of the data: geospatial data sources and formats, spatial data types and transformations, and the scalability and performance considerations required for spatial operations such as joins based on spatial relationships. For a more complete overview of Apache Sedona use cases, you can read our ebook here.
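To make that concrete, here is a minimal geospatial ETL sketch in PySpark; every path, view name, and column below is a hypothetical placeholder:

```python
# Extract: raw event records with lon/lat columns
raw = sedona.read.option("header", "true").csv("s3://your-bucket/raw_events.csv")

# Transform: build geometries and expose both layers to Sedona SQL
raw.selectExpr(
    "event_id",
    "ST_Point(CAST(lon AS double), CAST(lat AS double)) AS geometry",
).createOrReplaceTempView("events")

sedona.read.format("geoparquet") \
    .load("s3://your-bucket/zones.parquet") \
    .createOrReplaceTempView("zones")

# Combine: enrich each event with the zone that contains it
enriched = sedona.sql("""
    SELECT e.event_id, z.zone_name
    FROM events e JOIN zones z
    ON ST_Within(e.geometry, z.geometry)
""")

# Load: write the result back out as GeoParquet
enriched.write.format("geoparquet").mode("overwrite").save("s3://your-bucket/enriched/")
```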
Apache Sedona has gained significant community adoption and has become a popular geospatial analytics library within the distributed computing and big data ecosystem. As an Apache Software Foundation (ASF) top-level project, Sedona’s governance, licensing, and community participation align with ASF principles.
Sedona has an active and growing developer community, with contributors from many types of organizations and more than 100 individuals working to advance the state of geospatial analytics and distributed computing. Sedona has reached over 38 million downloads, at a rate of 2 million downloads per month, with usage growing at roughly 200% per year (as of the date this was published).
Organizations in industries including transportation, urban planning, environmental monitoring, logistics, and insurance and risk analysis have adopted Apache Sedona. These organizations leverage Sedona’s capabilities to perform large-scale geospatial analysis, extract insights from geospatial data, and build geospatial analytical applications at scale. This industry adoption showcases Sedona’s practical relevance and real-world utility.
Apache Sedona has been featured in conferences, workshops, and research publications related to geospatial analytics, distributed computing, and big data processing. These presentations and publications contribute to its awareness, visibility, and adoption both within the enterprise and within the research and academic communities.
As you get started with Apache Sedona, the following resources will be useful throughout your journey in the world of large-scale geospatial data analytics.
The best place to start learning about Apache Sedona is the authoritative book on the topic, “Cloud Native Geospatial Analytics with Apache Sedona”, recently published in early-release format. The team behind the project will continue releasing chapters over the coming months until the book is complete.