
What is Apache Sedona?


Apache Sedona - Simplified diagram

Last Updated: February 2026

Apache Sedona is an open-source cluster computing system built for processing large-scale spatial data across distributed environments. Originally developed at Arizona State University under the name GeoSpark, and described in the paper “Spatial Data Management in Apache Spark: The GeoSpark Perspective and Beyond” by Jia Yu and Mohamed Sarwat, it is now a top-level Apache Software Foundation project used by organizations across transportation, logistics, environmental monitoring, insurance, and urban planning. This page covers what Apache Sedona is, how it processes spatial queries, common use cases, and how to get started.

Key Takeaways

  • Apache Sedona is an open-source cluster computing system for processing large-scale spatial data
  • It extends distributed compute frameworks including Apache Spark, Apache Flink, and Snowflake
  • Originally created at Arizona State University under the name GeoSpark
  • Supports spatial operations including spatial joins, distance calculations, and spatial aggregations
  • Ingests data from multiple formats including Shapefiles, GeoJSON, GeoTiff, Parquet, and CSV
  • Used across industries including transportation, urban planning, environmental monitoring, logistics, and insurance
  • Has surpassed 38 million downloads as of October 2024

What is Apache Sedona and How Does It Work

Apache Sedona treats spatial data as a first-class citizen by extending distributed compute frameworks including Apache Spark, Apache Flink, and Snowflake with specialized data types, spatial operations, and indexing techniques optimized for spatial workloads. Unlike general-purpose compute frameworks, Sedona is purpose-built for the unique challenges of spatial data, including complex geometries, coordinate systems, and spatial relationships that standard data types cannot handle efficiently. The following section outlines how Apache Sedona processes spatial queries from data ingestion through to distributed execution.

Apache Sedona Architecture

What Programming Languages Does Apache Sedona Support

Apache Sedona supports multiple programming languages, including Python, Scala, Java, R, and SQL, making it accessible to a wide range of data engineering and analytics workflows. Developers can interact with Apache Sedona through whichever language fits their existing stack.

On the integrations side, Apache Sedona runs on Apache Spark, Apache Flink, and Snowflake. Each runtime serves a different need: Apache Spark for large-scale distributed batch processing, Apache Flink for real-time streaming spatial analytics, and Snowflake for teams running spatial workloads inside a cloud data warehouse environment.

How Does Apache Sedona Process Spatial Queries

The first step in spatial query processing is to ingest geospatial data into Apache Sedona. Data can be loaded from various sources such as files (Shapefiles, GeoJSON, Parquet, GeoTiff, CSV, etc.) or databases into Apache Sedona’s in-memory distributed spatial data structures (typically the Spatial DataFrame).
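Conceptually, ingestion turns serialized formats into in-memory geometry records. As a plain-Python illustration of that mapping (not Sedona’s reader API), here is how a single GeoJSON feature becomes a properties-plus-geometry record:

```python
import json

# A minimal GeoJSON Feature, inlined for illustration; a real pipeline
# would read files from object storage or a database.
feature_json = """
{
  "type": "Feature",
  "properties": {"name": "depot"},
  "geometry": {"type": "Point", "coordinates": [-111.93, 33.42]}
}
"""

def parse_feature(text):
    """Parse one GeoJSON Feature into (properties, geometry type, coordinates)."""
    f = json.loads(text)
    return f["properties"], f["geometry"]["type"], f["geometry"]["coordinates"]

props, geom_type, coords = parse_feature(feature_json)
print(props["name"], geom_type, coords)  # depot Point [-111.93, 33.42]
```

In Sedona the equivalent result is a distributed Spatial DataFrame whose geometry column holds parsed geometry objects rather than raw text.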

Next, Sedona makes use of spatial indexing techniques to accelerate query processing, such as R-trees or Quad trees. The spatial index is used to partition the data into smaller, manageable units, enabling efficient data retrieval during query processing.
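To make the pruning idea concrete, here is a toy quadtree in plain Python. Sedona’s actual R-tree and quadtree implementations are far more sophisticated, but the core mechanics are the same: recursively split a region into quadrants, then answer range queries by skipping quadrants that do not overlap the query window.

```python
class QuadTree:
    """Toy quadtree: split a rectangular region into four quadrants
    once a node holds more than `capacity` points."""

    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.points = []
        self.children = None  # four child QuadTrees after a split

    def _contains(self, px, py):
        x0, y0, x1, y1 = self.bounds
        return x0 <= px < x1 and y0 <= py < y1

    def insert(self, px, py):
        if not self._contains(px, py):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((px, py))
                return True
            self._split()
        return any(c.insert(px, py) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my, self.capacity),
                         QuadTree(mx, y0, x1, my, self.capacity),
                         QuadTree(x0, my, mx, y1, self.capacity),
                         QuadTree(mx, my, x1, y1, self.capacity)]
        for p in self.points:  # push existing points down to children
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return points in the query rectangle, pruning quadrants that
        do not overlap it -- the same pruning a spatial index gives a
        distributed engine at the partition level."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []
        hits = [(px, py) for px, py in self.points
                if qx0 <= px <= qx1 and qy0 <= py <= qy1]
        if self.children:
            for c in self.children:
                hits.extend(c.query(qx0, qy0, qx1, qy1))
        return hits

tree = QuadTree(0, 0, 100, 100)
for p in [(10, 10), (12, 11), (80, 80), (81, 82), (50, 50), (95, 5)]:
    tree.insert(*p)
print(sorted(tree.query(75, 75, 90, 90)))  # [(80, 80), (81, 82)]
```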

Once the data is loaded and indexed, spatial queries can be executed using Sedona’s query execution engine. Sedona supports a wide range of spatial operations, such as spatial joins, distance calculations, and spatial aggregations.
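A spatial join can be sketched in a few lines of plain Python. The nested loop below stands in for what Sedona evaluates as an indexed, partitioned point-in-polygon join (the zone names and rectangles are made up for illustration):

```python
# Zones as axis-aligned rectangles (minx, miny, maxx, maxy); real data
# would use arbitrary polygons, but rectangles keep the sketch short.
zones = {
    "downtown": (0, 0, 10, 10),
    "airport":  (20, 20, 30, 30),
}
points = [("p1", 5, 5), ("p2", 25, 21), ("p3", 50, 50)]

def spatial_join(points, zones):
    """Pair each point with every zone whose rectangle contains it --
    the contains predicate of a point-in-polygon join."""
    out = []
    for pid, x, y in points:
        for zone, (x0, y0, x1, y1) in zones.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                out.append((pid, zone))
    return out

print(spatial_join(points, zones))  # [('p1', 'downtown'), ('p2', 'airport')]
```

The naive version is quadratic in the input sizes; this is exactly the cost that the spatial indexing described above lets Sedona avoid.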

Sedona optimizes spatial queries to improve performance. The query optimizer determines an efficient query plan by considering the spatial predicates, available indexes, and the distribution of data across the cluster.
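The trade-off the optimizer weighs can be illustrated with a toy cost model. The numbers and threshold here are invented for illustration and are not Sedona’s actual costing, but they capture the intuition: an index pays off for selective queries, while a near-full-extent query is better served by a plain scan.

```python
def choose_plan(query_area, total_area, index_available,
                index_probe_cost=0.1):
    """Toy cost model: pick an index probe only when the query window
    covers a small fraction of the data's extent.  A real optimizer
    also weighs how data is distributed across the cluster."""
    selectivity = query_area / total_area
    if index_available and selectivity + index_probe_cost < 1.0:
        return "index probe"
    return "full scan"

print(choose_plan(query_area=4, total_area=10_000, index_available=True))
print(choose_plan(query_area=9_990, total_area=10_000, index_available=True))
```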

Spatial queries are executed in a distributed manner using Sedona’s computational capabilities. The query execution engine distributes the query workload across the cluster, with each node processing a portion of the data. Intermediate results are combined to produce the final result set. Since spatial objects can be very complex with many coordinates and topology, Sedona implements a custom serializer for efficiently moving spatial data throughout the cluster.
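The idea behind a custom spatial serializer can be sketched with Python’s struct module: encode a geometry as a length-prefixed run of raw doubles instead of a generic object graph. This is only an illustration of the approach, not Sedona’s actual wire format:

```python
import struct

def serialize(coords):
    """Pack a coordinate sequence as a 4-byte count followed by raw
    little-endian doubles -- a compact binary layout in the spirit of
    a custom spatial serializer."""
    buf = struct.pack("<I", len(coords))
    for x, y in coords:
        buf += struct.pack("<dd", x, y)
    return buf

def deserialize(buf):
    (n,) = struct.unpack_from("<I", buf, 0)
    return [struct.unpack_from("<dd", buf, 4 + 16 * i) for i in range(n)]

ring = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)]
assert deserialize(serialize(ring)) == ring
print(len(serialize(ring)))  # 4 bytes of count + 4 points * 16 bytes = 68
```

Keeping the encoding this dense matters because spatial objects are shuffled between nodes constantly during distributed joins and aggregations.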

What Are Common Apache Sedona Use Cases

Organizations use Apache Sedona for a range of large-scale geospatial data processing tasks, including:

  • Creating custom weather, climate, and environmental quality assessment reports at national scale by combining vector parcel data with environmental raster data products.
  • Generating planetary scale GeoParquet files for public dissemination via cloud storage by combining, cleaning, and indexing multiple datasets.
  • Converting billions of daily point telemetry observations into routes traveled by vehicles.
  • Enriching parcel level data with demographic and environmental data at the national level to feed into a real estate investment suitability analysis.

Many of these use cases can be described as geospatial ETL operations. ETL (extract, transform, load) is a data integration process that involves retrieving data from various sources, transforming and combining these datasets, then loading the transformed data into a target system or format for reporting or further analysis. Geospatial ETL shares many of the same challenges and requirements of traditional ETL processes with the additional complexities of managing the geospatial component of the data, working with geospatial data sources and formats, spatial data types and transformations, as well as the scalability and performance considerations required for spatial operations such as joins based on spatial relationships.
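A geospatial ETL run can be miniaturized to a few lines: extract raw telemetry rows, transform them into per-vehicle tracks, and load a summary. This is a stdlib-only sketch of the pattern; a real Sedona pipeline would read from object storage, run distributed, and write to a warehouse table or GeoParquet files.

```python
import csv
import io

# Extract: raw telemetry rows (an inline CSV stands in for files in
# object storage).
raw = io.StringIO(
    "vehicle,lon,lat\n"
    "v1,-111.9,33.4\n"
    "v1,-111.8,33.5\n"
    "v2,-73.9,40.7\n")

# Transform: parse coordinates and group points into per-vehicle tracks.
tracks = {}
for row in csv.DictReader(raw):
    tracks.setdefault(row["vehicle"], []).append(
        (float(row["lon"]), float(row["lat"])))

# Load: emit one summary record per vehicle for the target table.
summary = [(v, len(pts)) for v, pts in sorted(tracks.items())]
print(summary)  # [('v1', 2), ('v2', 1)]
```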

For a real-world example of Apache Sedona in production, watch how Comcast data engineer David Buchanan used Apache Sedona to optimize geospatial ETL pipelines at scale, reducing processing time from 5 hours to 30 minutes:

“Apache Sedona has been a great asset to our team at Comcast. We adopted Apache Sedona to add geospatial capabilities to our existing Spark extract, transform, and load (ETL) pipelines. Its open-source nature enables us to be flexible in creating pipelines that are usable both in the cloud and on-premises, reducing vendor lock-in. Additionally, we can quickly scale our workloads depending on the team's needs, which has increased our productivity compared to using other tools like GeoPandas or PostGIS.”
David Buchanan

Comcast

How Widely is Apache Sedona Used

Apache Sedona is one of the most widely adopted geospatial analytics libraries in the distributed computing ecosystem, with over 38 million downloads and active use across industries including transportation, logistics, environmental monitoring, and insurance. As a top-level Apache Software Foundation (ASF) project since February 2023, Sedona’s governance, licensing, and community participation align with ASF principles.

Sedona has an active and growing developer community, with more than 100 contributors from a wide range of organizations working to advance the state of geospatial analytics and distributed computing. As of October 2024, Apache Sedona had surpassed 38 million downloads, with approximately 2 million downloads per month and year-over-year usage growth of 200%.

Organizations in industries including transportation, urban planning, environmental monitoring, logistics, insurance and risk analysis and more have adopted Apache Sedona. These organizations leverage Sedona’s capabilities to perform large-scale geospatial analysis, extract insights from geospatial data and build geospatial analytical applications at scale.

Apache Sedona has been featured in conferences, workshops, and research publications related to geospatial analytics, distributed computing, and big data processing. For a deeper look at Apache Sedona’s adoption and real-world impact, watch Iceberg Geo Type: Transforming Geospatial Data Management at Scale, presented by Jia Yu and Szehon Ho at the Data + AI Summit.

How to Get Started With Apache Sedona


For a comprehensive introduction, ‘Cloud Native Geospatial Analytics with Apache Sedona,’ published with O’Reilly, covers how to work with large-scale spatial data using Apache Sedona, Apache Spark, and modern cloud technologies. It is aimed at developers, data scientists, and data engineers.


Frequently Asked Questions About Apache Sedona

What is Apache Sedona?

Apache Sedona is a cluster computing system for processing large-scale spatial data. It extends the functionality of distributed compute frameworks including Apache Spark, Apache Flink, and Snowflake, treating spatial data as a first-class citizen with specialized data types, operations, and indexing techniques optimized for spatial workloads.

Is Apache Sedona open source?

Yes. Apache Sedona is an Apache Software Foundation (ASF) project. Its governance, licensing, and community participation align with ASF principles.

What programming languages does Apache Sedona support?

Apache Sedona supports Java, Python, R, Scala, and SQL.

What is Apache Sedona used for?

Apache Sedona is used for large-scale geospatial ETL operations and spatial data analysis. Specific use cases mentioned on the page include: creating weather, climate, and environmental quality reports at national scale by combining vector parcel data with raster data; generating planetary-scale GeoParquet files for public distribution via cloud storage; converting billions of daily point telemetry observations into vehicle routes; and enriching parcel-level data with demographic and environmental information for real estate investment analysis.

Who created Apache Sedona?

Apache Sedona was initiated as GeoSpark by Jia Yu and Mohamed “Mo” Sarwat at Arizona State University in 2015. In 2020, the project was submitted to the Apache Software Foundation, and in February 2023 it graduated as a top-level ASF project.

What file formats does Apache Sedona support?

Apache Sedona can ingest data from Shapefiles, GeoJSON, Parquet, GeoTiff, and CSV files.