Analyzing The Overture Maps Places Dataset Using SedonaDB, Wherobots Cloud, & GeoParquet


Overture Maps, supported by the Overture Maps Foundation (OMF), offers a comprehensive geospatial dataset, now in GeoParquet format, categorized into themes like places of interest, buildings, transportation networks, and administrative boundaries. GeoParquet, a geospatially optimized variant of the standard Parquet format, enhances the management of spatial data, making it particularly well-suited for geospatial analytics. Unlike traditional Parquet, GeoParquet is specifically designed to efficiently store and handle spatial information, adding per-column bounding-box metadata (which enables spatial pruning) and optimized storage of geometry data.
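For a concrete sense of what makes GeoParquet "spatially aware": the format embeds a `geo` metadata document in the Parquet file footer describing each geometry column. The sketch below is illustrative only (the field names follow the GeoParquet 1.0 spec, but the values and the pruning helper are hypothetical):

```python
import json

# Sketch of the "geo" footer metadata a GeoParquet writer embeds
# (per the GeoParquet 1.0.0 spec; the values here are illustrative).
geo_metadata = {
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",            # geometries stored as Well-Known Binary
            "geometry_types": ["Point"],  # all features in this file are points
            "crs": None,                  # None means OGC:CRS84 (lon/lat)
            "bbox": [-74.25909, 40.477399, -73.700181, 40.917577],
        }
    },
}

# Readers can use the per-column bbox to skip whole files that cannot
# intersect a query window, before decoding any geometry at all.
def file_may_intersect(meta, minx, miny, maxx, maxy):
    fminx, fminy, fmaxx, fmaxy = meta["columns"][meta["primary_column"]]["bbox"]
    return not (fmaxx < minx or fminx > maxx or fmaxy < miny or fminy > maxy)

serialized = json.dumps(geo_metadata)  # this JSON string is what lands in the footer
```

This footer-level pruning is one reason GeoParquet works well for cloud-scale spatial analytics.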

This article aims to showcase the practical applications and benefits of Overture Maps data available in the Wherobots Open Data Catalog. By delving into real-world use cases, we demonstrate how the Overture Maps dataset enables deeper and faster insights into urban dynamics and broadens the scope for advanced geospatial analysis.

To follow along, first create a free account in Wherobots Cloud.

Data Schema for Places Theme in Overture Maps

The Places theme in Overture Maps represents point locations of various facilities, services, or amenities. Key schema design choices include:

  • Extensible Attributes: Basic common attributes such as phone, email, website, and brand are included. Additional attributes not currently in the official release are allowed with an "ext" prefix. Attributes specific to certain types of places are planned for future inclusion.
  • Controlled Categories: A hierarchical categorization system (taxonomy) allows for the transformation of various categorization systems to the Overture framework. This taxonomy is intended to be comprehensive and will be fine-tuned over time.
Schema Representation
|-- id: string (nullable = true)
|-- updatetime: string (nullable = true)
|-- version: integer (nullable = true)
|-- names: map (nullable = true)
|    |-- key: string
|    |-- value: array (valueContainsNull = true)
|    |    |-- element: map (containsNull = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)
|-- categories: struct (nullable = true)
|    |-- main: string (nullable = true)
|    |-- alternate: array (nullable = true)
|    |    |-- element: string (containsNull = true)
|-- confidence: double (nullable = true)
|-- websites: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- socials: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- emails: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- phones: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- brand: struct (nullable = true)
|    |-- names: map (nullable = true)
|    |    |-- key: string
|    |    |-- value: array (valueContainsNull = true)
|    |    |    |-- element: map (containsNull = true)
|    |    |    |    |-- key: string
|    |    |    |    |-- value: string (valueContainsNull = true)
|    |-- wikidata: string (nullable = true)
|-- addresses: array (nullable = true)
|    |-- element: map (containsNull = true)
|    |    |-- key: string
|    |    |-- value: string (valueContainsNull = true)
|-- sources: array (nullable = true)
|    |-- element: map (containsNull = true)
|    |    |-- key: string
|    |    |-- value: string (valueContainsNull = true)
|-- bbox: struct (nullable = true)
|    |-- minx: double (nullable = true)
|    |-- maxx: double (nullable = true)
|    |-- miny: double (nullable = true)
|    |-- maxy: double (nullable = true)
|-- geometry: geometry (nullable = true)
|-- geohash: string (nullable = true)

Accessing Overture Maps Places Dataset

To analyze the data from Overture Maps, we first create a SedonaContext connected to the Wherobots Open Data Catalog like so:

from sedona.spark import *

config = SedonaContext.builder(). \
    config("spark.sql.catalog.wherobots_examples.type", "hadoop"). \
    config("spark.sql.catalog.wherobots_examples", "org.apache.iceberg.spark.SparkCatalog"). \
    config("spark.sql.catalog.wherobots_examples.warehouse", "s3://wherobots-examples-prod/havasu/warehouse"). \
    getOrCreate()

sedona = SedonaContext.create(config)

Next, we access the Places theme dataset of Overture Maps via:

places_df = sedona.table("wherobots_examples.overture.places_place")

Spatial Filtering for NYC Metropolitan Area

For all use cases in this article, we focus on the New York City (NYC) metropolitan area. We apply spatial filtering to limit our dataset to this specific area, using the bounding box coordinates of New York City.

spatial_filter = "ST_Within(geometry, ST_PolygonFromEnvelope(-74.25909, 40.477399, -73.700181, 40.917577))"
places_df = places_df.where(spatial_filter)
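For point geometries, this envelope predicate reduces to four coordinate comparisons; a plain-Python sketch (the helper name is ours, not a Sedona API):

```python
# NYC bounding box from the filter above: (minx, miny, maxx, maxy)
NYC = (-74.25909, 40.477399, -73.700181, 40.917577)

def point_within_envelope(lon, lat, minx, miny, maxx, maxy):
    # For point geometries, ST_Within(point, ST_PolygonFromEnvelope(...))
    # reduces to four comparisons. (Strictly, ST_Within excludes points
    # exactly on the envelope boundary; we ignore that edge case here.)
    return minx <= lon <= maxx and miny <= lat <= maxy

inside = point_within_envelope(-73.9857, 40.7484, *NYC)  # midtown Manhattan -> True
```

Sedona evaluates the same test in a distributed fashion, with spatial partitioning so most rows are never compared at all.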

To illustrate the comprehensive coverage of the dataset, the following map showcases 278,998 points of interest just in the New York City area.

Visualizing places from the Overture Maps dataset

Elevate key nested fields to top-level columns

To facilitate easier aggregation and analysis, it’s important to transform certain nested fields into top-level columns. In our dataset, we focus on the ‘main’ and ‘alternate’ subcategories within the ‘categories’ column of the places dataset.

First, we create a new column ‘category’ that directly holds the values from ‘categories.main’:

from pyspark.sql.functions import col, explode, count, rank, array_contains, countDistinct
from pyspark.sql.window import Window

places_df = places_df.withColumn("category", col("categories.main"))

Next, we use the explode function to transform the ‘alternate’ subcategories. The explode function is used to expand an array or map column into multiple rows. When applied to the ‘categories.alternate’ array, each element in the array is turned into a separate row, effectively creating a new row for each alternate category associated with the same place.

places_df_exploded = places_df.withColumn("alternate_category", explode("categories.alternate")) 

Here’s what the explode transformation looks like:

Before applying transformation:

|id                  |categories                                  |
|tmp_F36B3571B3E58...|{hvac_services, [industrial_equipment]}     |
|tmp_240555DC4354D...|{elementary_school, [school, public_school]}|

After applying transformation:

|id                                  |category         |alternate_category  |
|tmp_F36B3571B3E583C482BD02CAC65657B6|hvac_services    |industrial_equipment|
|tmp_240555DC4354D0975F72960E276D481C|elementary_school|school              |
|tmp_240555DC4354D0975F72960E276D481C|elementary_school|public_school       |
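The same row multiplication can be sketched in plain Python to make the semantics explicit (a toy stand-in using the example rows above, not Spark's implementation; the ids are truncated as in the tables):

```python
rows = [
    {"id": "tmp_F36B...", "category": "hvac_services",
     "alternate": ["industrial_equipment"]},
    {"id": "tmp_2405...", "category": "elementary_school",
     "alternate": ["school", "public_school"]},
]

# One output row per element of the 'alternate' array, as explode() does;
# the other columns are duplicated onto each new row.
exploded = [
    {"id": r["id"], "category": r["category"], "alternate_category": alt}
    for r in rows
    for alt in r["alternate"]
]
```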

Explore categories

Group the data by ‘category’ and count the occurrences to understand the distribution of categories. After the GroupBy, the categories are ranked by the number of occurrences. This tells us the most common business categories in NYC.

categories_df = places_df.groupBy("category").agg(count("*").alias("count"))
categories_df = categories_df.orderBy("count", ascending=False)
windowSpec = Window.orderBy(col("count").desc())
categories_df = categories_df.withColumn("overall_rank", rank().over(windowSpec)), truncate=False)

This gives us the following output:

|category                        |count|overall_rank|
|beauty_salon                    |10919|1           |
|community_services_non_profits  |5963 |2           |
|church_cathedral                |5507 |3           |
|professional_services           |4675 |4           |
|landmark_and_historical_building|4436 |5           |
|hospital                        |4035 |6           |
|dentist                         |3538 |7           |
|real_estate                     |3330 |8           |
|park                            |3171 |9           |
|school                          |3016 |10          |
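The overall_rank values above come from rank() over a window ordered by descending count: tied counts share a rank and the next distinct count skips ahead. A small plain-Python stand-in (our own helper, not the Spark function) shows the semantics:

```python
def rank_desc(counts):
    """Mimic rank().over(Window.orderBy(col('count').desc())): ties share a
    rank, and the next distinct value skips ahead (1, 2, 2, 4, ...)."""
    ordered = sorted(counts, key=lambda kv: kv[1], reverse=True)
    ranks, prev_count, prev_rank = [], None, 0
    for i, (cat, n) in enumerate(ordered, start=1):
        r = prev_rank if n == prev_count else i
        ranks.append((cat, n, r))
        prev_count, prev_rank = n, r
    return ranks

# Illustrative counts, including a tie
ranked = rank_desc([("beauty_salon", 10919), ("park", 3171),
                    ("school", 3171), ("dentist", 3538)])
```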

Explore Coffee Shops

Let's explore the coffee shop category a bit more.

coffee_df = places_df.filter(places_df.category == "coffee_shop")
coffee_alt_cats = places_df_exploded.filter(places_df_exploded.category == "coffee_shop").groupBy("alternate_category").agg(count("*").alias("count"))
coffee_alt_cats = coffee_alt_cats.orderBy("count", ascending=False), truncate=False)

We group the coffee shop data by ‘alternate_category’, count the occurrences, and order the result to show the most common alternate categories within coffee shops.

The bar chart below shows the relative frequency of each alternate category as a percentage of total coffee shops.

Visualizing alternate categories
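The percentage behind each bar is simply a share of the grouped counts; with made-up counts (illustrative only, not the actual dataset values) the computation looks like:

```python
# Hypothetical alternate-category counts for coffee shops (illustrative only)
alt_counts = {"cafe": 820, "bakery": 310, "bagel_shop": 120, "tea_room": 90}
total = sum(alt_counts.values())

# Share of each alternate category as a percentage of all coffee shops
share = {k: round(100 * v / total, 1) for k, v in alt_counts.items()}
```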

Now, let's filter the coffee_df with ‘bagel_shop’ as alternate_category because you may want to grab coffee and bagels on your way to work without having to stand in line at both a coffee shop and a bagel shop.

coffee_bagel_df = coffee_df.filter(array_contains(coffee_df.categories.alternate, "bagel_shop"))
coffee_bagel_df =, coffee_bagel_df.names, coffee_bagel_df.geometry)
coffee_bagel_df = coffee_bagel_df.withColumn("name", col("names.common")[0]["value"]).drop("names")

Visualizing coffee shops that make bagels in NYC using SedonaKepler:

Visualizing points of interest from Overture Maps

Exploring Stadiums Arena category

Let’s imagine we want to analyze places where we might see a show or sports event, such as stadiums and arenas, and understand the types of businesses located within walking distance. This analysis can provide insights into the commercial ecosystem surrounding entertainment venues and help us understand the urban dynamics in these areas.

We begin by filtering on the category and then creating temporary views for places and arenas. In PySpark, in order to execute SQL commands on a DataFrame, you need to register it as a temporary view or table first.

arena_df = places_df.filter(places_df.category == "stadium_arena")
places_df.createOrReplaceTempView("Places")
arena_df.createOrReplaceTempView("Arenas")

To identify businesses proximal to Stadium Arenas, we perform a spatial intersection.

The following SQL query performs a spatial intersection to find businesses within a 0.002 degree distance (about one block) from Stadium Arenas. It uses ST_Intersects for spatial relation checks, combined with ST_Buffer to expand the Arena geometries by 0.002 degrees, creating a search area. The value of 0.002 degrees is assumed to approximate the walkable distance of one block.

arena_places = sedona.sql('''
    SELECT AS places_id,
        Places.geometry AS places_geometry,
        Places.category AS places_category, AS arena_id,
        Arenas.geometry AS arena_geometry,
        Arenas.names.common[0].value AS arena_name
    FROM Places, Arenas
    WHERE ST_Intersects(Places.geometry, ST_Buffer(Arenas.geometry, 0.002))
''')
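Since the data is in lon/lat degrees (EPSG:4326), the 0.002 buffer is in degrees too. A quick back-of-the-envelope check of what that spans at NYC's latitude (using the usual spherical-Earth figure of ~111,320 m per degree of latitude):

```python
import math

lat = 40.7   # approximate NYC latitude
deg = 0.002  # the buffer distance used in the query above

# One degree of latitude is ~111,320 m everywhere; a degree of
# longitude shrinks by cos(latitude) as you move away from the equator.
meters_ns = deg * 111_320                                # north-south extent, ~223 m
meters_ew = deg * 111_320 * math.cos(math.radians(lat))  # east-west extent, ~169 m
```

So the buffer is on the order of one to two Manhattan blocks, consistent with the "walkable distance" assumption.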

To get a better picture, here’s a rendered map of the arena_places DataFrame using SedonaKepler. The red dots are the Stadium Arenas while the blue dots are the businesses in the arena’s vicinity.

Visualizing Overture Maps points of interest

Next, we group the proximal businesses by ‘category’ and count the occurrences to understand their distribution. After the GroupBy operation, the categories are ranked by the number of occurrences. This tells us the most common business categories in proximity to Stadium Arenas.

arena_places_count = arena_places.groupBy("places_category").agg(countDistinct("places_id").alias("count"))
arena_places_count = arena_places_count.orderBy("count", ascending=False)
windowSpec = Window.orderBy(col("count").desc())
arena_places_count = arena_places_count.withColumn("arena_rank", rank().over(windowSpec)), truncate=False)

To highlight which types of businesses are more commonly found in the vicinity of Stadium Arenas, we compare the frequency of various business categories overall versus those near Stadium Arenas. This is achieved by:

  • Joining the categories_count and arena_places_count DataFrames
  • Calculating the rank differences
  • Ordering the result by rank difference

categories_df.createOrReplaceTempView("categories_count")
arena_places_count.createOrReplaceTempView("arena_places_count")

    SELECT cc.category,
        cc.count AS overall_count,
        apc.count AS arena_count,
        cc.overall_rank,
        apc.arena_rank,
        cc.overall_rank - apc.arena_rank AS rank_difference
    FROM categories_count AS cc
    LEFT JOIN arena_places_count AS apc
        ON cc.category = apc.places_category
    WHERE apc.arena_rank <= 50
    ORDER BY rank_difference DESC NULLS LAST
''').show(12, truncate=False)

We get the table below:

|category                         |overall_count|arena_count|overall_rank|arena_rank|rank_difference|
|advertising_agency               |628          |73         |95          |47        |48             |
|broadcasting_media_production    |873          |102        |66          |27        |39             |
|theatre                          |1046         |129        |56          |18        |38             |
|travel_services                  |710          |76         |80          |42        |38             |
|college_university               |1503         |188        |38          |10        |28             |
|jewelry_store                    |1661         |227        |33          |7         |26             |
|arts_and_entertainment           |1154         |107        |48          |23        |25             |
|counseling_and_mental_health     |1023         |82         |61          |39        |22             |
|topic_concert_venue              |1056         |89         |54          |33        |21             |
|hotel                            |1626         |151        |35          |16        |19             |
|event_planning                   |1082         |84         |52          |37        |15             |
|financial_service                |1892         |168        |26          |12        |14             |
A few patterns stand out:

  • High Concentration of Relevant Services: Categories like ‘advertising_agency’, ‘broadcasting_media_production’, and ‘theatre’ rank much higher near stadium arenas than they do citywide, suggesting a synergy with sports and entertainment venues.
  • University-Affiliated Stadiums: The proximity of ‘college_university’ to stadium arenas might suggest that some of these stadiums are located within or near university campuses, serving as venues for college sports events, which are often significant in the United States.
  • Accommodation for Visitors: The high ranking of ‘hotel’ near stadium arenas indicates a demand for accommodation by visitors who may be traveling to NYC for games or events. This is consistent with the transient nature of sports and entertainment events, which often draw fans and participants from outside the local area.
  • Luxury and Leisure: ‘jewelry_store’ has a rank difference of 26, showcasing a demand for luxury shopping experiences around stadium arenas, potentially linked to the high-profile nature of events held at these locations.
  • Entertainment and Event Planning: Categories like ‘arts_and_entertainment’, ‘event_planning’, and ‘topic_concert_venue’ have higher rankings near arenas, reflecting the role of these venues as hubs for events and cultural activities.


The Overture Maps data in the Wherobots open data catalog offers great efficiency in spatial analytics. This synergy between advanced data formats and powerful analytics tools opens up new possibilities for geospatial analysis and insights. The analyses presented here are just the beginning and make several assumptions. However, with Wherobots and Overture Maps data, the possibilities for uncovering new insights and informing data-driven decisions are virtually limitless.

You can follow along with the code from this blog post by creating a free account in Wherobots Cloud.

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:

Creating Collaborative Web Maps With The Felt API And Wherobots Cloud

An important requirement for data infrastructure tools like SedonaDB and Wherobots Cloud is that they integrate well with the technology ecosystems around them. In the world of spatial databases this includes geospatial visualization tooling. Being able to create maps with data from SedonaDB is an important use case for Wherobots Cloud, so in this blog post I wanted to explore how to create collaborative web maps with Felt, providing data using the Felt API.

For this map I wanted to integrate the Felt API with Wherobots Cloud so I could do some geospatial analysis using Spatial SQL and SedonaDB then publish the results of my analysis to Felt’s beautiful web-based mapping tooling.

I decided to use data from BirdBuddy, which publishes data about bird sightings at its smart birdfeeders to find the range of some of my favorite bird species.

North American bird ranges

You can follow along by creating a free account on Wherobots Cloud.

Collaborative Web Maps With Felt

Felt is a web-based tool for creating collaborative maps. Felt is bringing collaborative real-time editing functionality similar to what you’ve seen in Google Docs or Notion to the world of mapmaking. You can annotate, comment, and draw on the map, then share the results with anyone on the web you want with a single link to start collaborating.

Felt has also invested in supporting a wide range of data formats for adding data to maps with their Upload Anything tool, a simple drag-and-drop interface that supports formats including Shapefile, GeoJSON, GeoTIFF, CSV, and JPEG. Felt also built a QGIS plugin, so if you’re used to working with desktop GIS tooling you can easily export your project and layers to Felt’s web-based tooling via QGIS.

Felt enables programmatically creating maps and layers as well as adding data to the map via the Felt API. We’ll be using the Felt API to upload the results of our analysis using SedonaDB to create and publish a map.

Wherobots Cloud File Management

Our data comes from Bird Buddy, which makes a smart bird feeder that can identify bird species and (optionally) report their location.

BirdBuddy screenshot

Bird Buddy publishes its data as CSV files so we’ll download the latest data and then upload the file to our Wherobots Cloud instance via the "Files" tab. The free tier of Wherobots Cloud includes free data storage in AWS S3 which we can access within the Wherobots notebook environment using the S3 URL of the file.

Wherobots Cloud file management

Once you’ve uploaded a file you can click the copy file icon to copy the file’s S3 path to access the file in the Wherobots notebook environment. Note that these files are private to your Wherobots organization, so the S3 URL below won’t be accessible to anyone outside my organization.

S3_URL = "s3://<YOUR_S3_URL_HERE>/birdbuddy/"

Now we’ll load the BirdBuddy CSV data and convert it to a SedonaDB DataFrame so we can use Spatial SQL to find the range of each species.

bb_df ='csv').option('header', 'true').option('delimiter', ',').load(S3_URL)

Looking at the first few rows of the DataFrame we can see we have latitude and longitude stored as separate fields, as well as information about the bird species.

|anonymized_latitude|anonymized_longitude| timestamp|      common_name| scientific_name|
|          45.441235|          -122.51253|2023-09...|  dark eyed junco|      junco h...|
|           41.75291|            -83.6242|2023-09...|northern cardinal|cardinalis ca...|
|            43.8762|            -78.9261|2023-09...|northern cardinal|cardinalis ca...|
|            33.7657|            -84.2951|2023-09...|northern cardinal|cardinalis ca...|
|            30.4805|            -84.2243|2023-09...|northern cardinal|cardinalis ca...|
only showing top 5 rows

Spatial SQL With SedonaDB

Now we’re ready to use the power of Spatial SQL to analyze our Bird Buddy data. We want to find the range of each species, but first let’s explore the data.

First we’ll convert our latitude and longitude fields into Point geometries using the ST_Point SQL function.

bb_df = bb_df.selectExpr('ST_Point(CAST(anonymized_longitude AS Decimal(24,20)), CAST(anonymized_latitude AS Decimal(24,20))) AS location', 'timestamp', 'common_name', 'scientific_name')

Now the location field is a proper geometry type that our Spatial DataFrame can take advantage of.

|            location|           timestamp|      common_name|     scientific_name|
|POINT (-122.51253...|2023-09-01 00:00:...|  dark eyed junco|      junco hyemalis|
|POINT (-83.6242 4...|2023-09-01 00:00:...|northern cardinal|cardinalis cardin...|
|POINT (-78.9261 4...|2023-09-01 00:00:...|northern cardinal|cardinalis cardin...|
|POINT (-84.2951 3...|2023-09-01 00:00:...|northern cardinal|cardinalis cardin...|
|POINT (-84.2243 3...|2023-09-01 00:00:...|northern cardinal|cardinalis cardin...|
only showing top 5 rows

We have just under 14 million bird observations in our DataFrame.


If we wanted to find all observations of Juncos in the data, we can register our DataFrame as a temporary view, then write a SQL query to filter the results and visualize the observations on a map using SedonaKepler, the SedonaDB integration for

bb_df.createOrReplaceTempView('bb')
junco_df = sedona.sql("SELECT * FROM bb WHERE common_name LIKE '%junco' ")

We used the SQL LIKE string comparison operator to find all observations relating to Juncos, then stored the results in a new DataFrame junco_df.

|            location|           timestamp|    common_name|scientific_name|
|POINT (-122.51253...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
|POINT (-94.5916 3...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
|POINT (-85.643 31...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
|POINT (-87.7645 3...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
|POINT (-122.16346...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
only showing top 5 rows

Now we’ll visualize the contents of our new junco_df DataFrame using SedonaKepler.

SedonaKepler.create_map(df=junco_df, name='Juncos')

Juncos observation map

Based on the map above it looks like Juncos have quite a large range throughout North America.

Next, we’ll filter the overall dataset to a few of my favorite bird species, then use the power of Spatial SQL with a GROUP BY operation to create convex hulls (polygon geometries) from the individual observations (point geometries) of each species.

By creating a convex hull around all point observations grouped by species we will create a new geometry that represents the observed range of each species in our dataset.

range_df = sedona.sql("""
    SELECT common_name, COUNT(*) AS num, ST_ConvexHull(ST_Union_aggr(location)) AS geometry
    FROM bb
    WHERE common_name IN ('california towhee', 'steller’s jay', 'mountain chickadee', 'eastern bluebird')
    GROUP BY common_name
""")
Note our use of the following Spatial SQL functions:

  • ST_ConvexHull – given multiple point geometries, return a polygon geometry of an area that contains all points in a convex hull
  • ST_Union_aggr – an aggregating function that will collect multiple geometries, in this case used alongside a GROUP BY
|       common_name|  num|            geometry|
|  eastern bluebird|65971|POLYGON ((-80.345...|
|     steller’s jay|37864|POLYGON ((-110.26...|
| california towhee|22007|POLYGON ((-117.05...|
|mountain chickadee| 4102|POLYGON ((-110.99...|
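To build intuition for the hull geometries above, here is a minimal monotone-chain convex hull over a few (lon, lat) points — a textbook algorithm and toy coordinates, not Sedona's implementation or the actual observation data:

```python
def convex_hull(points):
    """Andrew's monotone chain: returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# A few (lon, lat) observations: interior points drop out of the hull
obs = [(-122.5, 45.4), (-83.6, 41.8), (-78.9, 43.9), (-84.3, 33.8), (-100.0, 40.0)]
hull = convex_hull(obs)
```

Just like ST_ConvexHull, only the outermost observations survive as polygon vertices; everything inside the hull is discarded.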

Now we have a new DataFrame range_df with 4 rows, one for each of the species we indicated in the query above. But now the geometry field is a polygon that represents the observed range of that species in our dataset. Pretty neat – let’s visualize these species ranges using Felt.

The Felt API supports file uploads in a variety of formats, but we’ll use GeoJSON. We’ll convert our SedonaDB DataFrame into a GeoPandas GeoDataFrame and then export to a GeoJSON file so we can upload it to the Felt API.

import geopandas

range_gdf = geopandas.GeoDataFrame(range_df.toPandas(), geometry="geometry")
range_gdf.to_file('birdbuddy_range.geojson', driver='GeoJSON')

We’ve now created a GeoJSON file birdbuddy_range.geojson that looks a bit like this (we’ve omitted some lines):

    "type": "FeatureCollection",
    "features": [
            "type": "Feature",
            "properties": {
                "common_name": "eastern bluebird",
                "num": 65971
            "geometry": {
                "type": "Polygon",
                "coordinates": [

Felt Maps API

If you haven’t already, create a free Felt account and then in your account settings generate a new access token so you’ll be able to create maps and upload data via the Felt API.

Creating a Felt API token


To create a new map and upload data we’ll actually need to make a few network requests to the Felt API:

  1. /maps to create a new map. This endpoint will return the id and url of the new map.
  2. /maps/{map_id}/layers to create a new layer in our new map. Note we need to use the map_id from the previous request. This endpoint will return a presigned upload URL that will allow us to upload our GeoJSON file.
  3. /maps/{map_id}/layers/{layer_id}/finish_upload to indicate we have finished uploading our data using the presigned upload URL.

The function below will create a new map in Felt, then create a new layer and upload our GeoJSON file to this layer. See the Felt API docs for more examples of what’s possible with the Felt API.

import requests

FELT_API_BASE = ""  # Felt's public API base URL

def create_felt_map(access_token, filename, map_title, layer_name):

    # First create a new map using the /maps endpoint
    create_map_response =
        headers={
            "authorization": f"Bearer {access_token}",
            "content-type": "application/json",
        json={"title": map_title},
    create_map_data = create_map_response.json()
    map_id = create_map_data['data']['id']
    map_url = create_map_data['data']['attributes']['url']

    # Next, we'll create a new layer and get a presigned upload url so we can upload our GeoJSON file
    layer_response =
        headers={
            "authorization": f"Bearer {access_token}",
            "content-type": "application/json",
        json={"file_names": [filename], "name": layer_name},

    # This endpoint will return a pre-signed URL that we use to upload the file to Felt
    presigned_upload = layer_response.json()
    url = presigned_upload["data"]["attributes"]["url"]
    presigned_attributes = presigned_upload["data"]["attributes"]["presigned_attributes"]

    # A 204 response indicates that the upload was successful
    with open(filename, "rb") as file_obj:
        output =
            # Order is important, file should come at the end
            files={**presigned_attributes, "file": file_obj},
    layer_id = presigned_upload['data']['attributes']['layer_id']

    # Finally, we call the /maps/:map_id/layers/:layer_id/finish_upload endpoint to complete the process
    finish_upload =
        headers={
            "authorization": f"Bearer {access_token}",
            "content-type": "application/json",
        json={"filename": filename, "name": layer_name},

    return map_url
Now to create a new Felt map we can call this function, passing our API token, the name of our GeoJSON file as well as what we’d like to call our new map and the data layer.

create_felt_map(FELT_TOKEN, "birdbuddy_range.geojson", "North American Bird Ranges", "My Favorite Birds")

We’ve now created a new map in Felt and uploaded our GeoJSON data as a new layer. We can share the URL with anyone on the web to view or collaborate on our map!

North American bird ranges

We can also embed the map in our Jupyter notebook:

from IPython.display import HTML
HTML('<iframe width="1600" height="600" frameborder="0" title="My Favorite Bird Ranges" src=""></iframe>')

The 30 Day Map Challenge

Every year the geospatial data and map community joins together to organize the "30 Day Map Challenge", a fun and informal challenge to create a new map and share it on social media each day for one month.

The 30 Day Map Challenge

This BirdBuddy map was my 30 Day Map Challenge map for Day 3: Polygons. You can find the full Jupyter Notebook with all code on GitHub here as well as some of my other 30 Day Map Challenge maps in this repository. If you’d like to follow along with my attempt at the rest of the 30 Day Map Challenge feel free to connect with me on Twitter or LinkedIn.

Want to keep up to date with Wherobots? Sign up for the Wherobots Developer Newsletter below:

SedonaDB: The Cloud-native Spatial Analytics Database Platform

According to Gartner, 97% of data collected at the enterprise sits on the shelves without being put into use. That is a shockingly big number, especially given that the data industry got its hopes up a few years back when the Economist published its article “The most valuable resource is no longer oil, it’s data”. It is also quite surprising given the hundreds of billions of dollars invested in database and analytics platforms over the past two decades.

One main reason is that data professionals often struggle to connect data to use cases. A natural way for data professionals to achieve that is to link their data/insights to data about the physical world, a.k.a. “Spatial Data”, and hence ask physical-world related questions of such data, a.k.a. “Spatial Analytics”. This spatial approach can be an indispensable asset for businesses worldwide. Use cases range from determining optimal delivery routes to making informed decisions about property investments, to climate and agricultural technology. For instance, commercial real estate data makes more sense when connected to spatial data about nearby objects (e.g., buildings, POIs), man-made events (e.g., crimes, traffic), as well as natural events such as wildfires and floods. The importance of understanding the ‘where’ cannot be overstated, especially when it influences operational efficiency, customer satisfaction, and strategic planning.

The significance of spatial analytics underscores the pressing need for its efficient management within the enterprise data stack. Incumbent data platforms, often not built to handle the intricacies and scale of spatial analytics, fall short in meeting these demands. Recognizing this gap, we introduce SedonaDB, a novel spatial analytics database platform. Here is a summary of features supported by SedonaDB:

SedonaDB Key Features

Linking Enterprise Data to the Spatial world

SedonaDB seamlessly incorporates spatial analytics in the enterprise data stack to bring data many steps closer to use cases. Using a scalable spatial join technology, SedonaDB can link customer data stored anywhere to tens of terabytes of spatial data such as maps, roads, buildings, natural events, and man-made events in a few minutes. Users can then apply spatial data processing, analytics, and AI tasks using SQL and Python on their data with unparalleled efficiency and adaptability.

To get started with SedonaDB, please visit the Wherobots website.


With its scalable, distributed architecture, SedonaDB is redefining the way businesses handle geometry and raster spatial data processing and analytics in the cloud. SedonaDB achieves that in two main ways:

  1. Separating Compute/Storage: SedonaDB uniquely separates the spatial processing and analytics layer from the data storage layer. This approach allows for optimal performance and scalability.
  2. Distributed System Architecture: By employing a distributed system architecture, SedonaDB ensures scalable out-of-core spatial computation, catering to massive datasets without compromising speed or accuracy.


SedonaDB builds upon and amplifies the capabilities seen in the open-source Apache Sedona (OSS Sedona). While OSS Sedona provides foundational spatial analytics functions using spatial SQL and Python, SedonaDB takes it to the next level with its faster query processing, lakehouse architecture, and its self-service yet fully-managed provisioning on Wherobots Cloud. This makes SedonaDB a more comprehensive and streamlined solution for businesses. Based on our benchmarks, SedonaDB is up to 10x faster than OSS Sedona for geometry data processing, and up to 20x faster than OSS Sedona for raster data processing.

Spatial Lakehouse Solution

One of the standout features of SedonaDB is its support for an Apache Iceberg-compatible spatial table format, dubbed "Havasu." This feature facilitates efficient querying and updating of geometry and raster columns on Parquet files in cloud object stores such as AWS S3. This enables spatial analytics on the sheer volume of data dumped on cloud object stores that, until today, has seldom been put to use. Details about the Havasu spatial data lake format are available here


SedonaDB is provisioned as a fully-managed service within the Wherobots Cloud, ensuring that users don’t have to delve into the intricacies of managing cloud or compute resources.

By delegating resource management to Wherobots, businesses can concentrate on their core spatial analytics tasks, achieving their objectives faster, more efficiently, and more cost-effectively.

To use SedonaDB, you first need to create an account on Wherobots Cloud. To get started, please visit the Wherobots website.


SedonaDB comes equipped with connectors for major data storage platforms and databases. These include cloud object stores, data warehouses like Snowflake and Redshift, lakehouses such as Databricks, and OLTP databases including Postgres / PostGIS.

SedonaDB example usage

Using SedonaDB, users can perform a plethora of spatial queries and analytics operations on their data. Here are some common operations users can invoke in SedonaDB. For more details on these examples, please refer to the Wherobots documentation.

Insert geometry data

INSERT INTO wherobots.test_db.test_table
VALUES (1, 'a', ST_GeomFromText('POINT (1 2)')), (2, 'b', ST_Point(2, 3))

Insert external raster data

sedona.sql("SELECT RS_FromPath('s3a://XXX.tif') as rast") \
    .writeTo("wherobots.test_db.test_table").append()

Create a spatial index

sedona.sql("CREATE SPATIAL INDEX FOR wherobots.db.test_table USING hilbert(geom, 10)")
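The hilbert(geom, 10) clause orders rows by their position along a Hilbert space-filling curve, so spatially nearby records end up stored near each other. Havasu's index internals aren't shown here, but the curve itself can be sketched with the standard iterative formulation (the grid size and helper name below are our own illustration):

```python
def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of an n x n grid (n a power of two)
    for cell (x, y). Classic iterative bit-twiddling formulation."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the same pattern repeats at each scale
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting cells by their Hilbert value keeps spatial neighbors close in storage
order_2x2 = sorted([(0, 0), (0, 1), (1, 0), (1, 1)],
                   key=lambda c: hilbert_index(2, *c))
```

Clustering rows this way means a spatial range query touches a small number of contiguous file ranges instead of scattered reads.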

Read data from PostGIS

from pyspark.sql import functions as f

df = \
    .option("url", "jdbc:postgresql://<host>:<port>/<database>") \
    .option("query", "SELECT id, ST_AsBinary(geom) as geom FROM my_table") \
    .load() \
    .withColumn("geom", f.expr("ST_GeomFromWKB(geom)"))

Read data from CSV on AWS S3

df ="csv").option("header", "true").load("s3://<bucket>/<path>/file.csv")

Read data from a Havasu table

sedona.table("wherobots.test_db.test_table").filter("ST_Contains(city, location) = true")
sedona.sql("SELECT * FROM wherobots.test_db.test_table WHERE ST_Contains(city, location) = true")

Spatial join query to find zones prone to wild fires

fire_zone = sedona.sql('''
    SELECT
        z.geometry as zipgeom,
        z.ZCTA5CE10 as zipcode
    FROM
        wherobots_open_data.us_census.zipcode z, <wild_fires_table> f
    WHERE
        ST_Intersects(z.geometry, f.geometry)
''')
Visualize geometry data in a notebook

SedonaKepler.create_map(df=df, name='geometry')

Visualize raster data in a notebook


Performance Benchmark

To better showcase SedonaDB’s performance, we conducted a comprehensive performance benchmark on some commonly seen spatial data processing tasks. Please download the full report of our SedonaDB Performance Benchmark.

To use SedonaDB, you first need to create an account on Wherobots Cloud. To get started, please visit the Wherobots website.