Wherobots, the Spatial Intelligence Cloud, is Now Available in AWS Europe

Posted on April 14, 2025 by Tiffany Huynh

The EU is widely recognized as a world leader in climate solutions, automotive design and manufacturing, mobility systems and analysis, environmental monitoring, agriculture and precision farming, and urban development. Geospatial data is foundational to the success of these innovations. However, most of this new technology is developed using tools and cloud services that are not optimized for geospatial development. Compared to internet data, support for geospatial data in modern cloud environments has lagged. This technology gap has made working with geospatial data expensive and required staffing data teams with unique expertise.

Introducing Wherobots for the AWS Europe (Ireland) Region

We are excited to announce that Wherobots is ready for EU native workloads. This expansion makes it possible for many EU companies to approve the use of Wherobots and adhere to their data residency requirements by processing and storing data in-region.

Wherobots is the Spatial Intelligence Cloud

Wherobots’ mission is to make it easy for our customers to utilize geospatial data. We are delivering on it via a cloud optimized for developing and running solutions about the physical world, at any scale. Our purpose-built, cloud native approach is enabling teams at AddressCloud and the Overture Maps Foundation to accelerate their pace of innovation with geospatial data. Their workloads run up to 20x faster after migrating from popular cloud-based engines, developer productivity is boosted with the most feature-complete development experience for SQL and Python, and costs are reduced, putting new solutions in reach.

A Cloud Native Lakehouse Architecture

The architecture of Wherobots is cloud native and deeply rooted in open source. Apache Sedona, the open source geospatial engine for Apache Spark, Apache Flink, and Snowflake, is 100% compatible with Wherobots.
Users can easily lift and shift their Apache Sedona based applications into Wherobots with zero code changes. Wherobots is also one of the leading companies bringing GEO support into popular open file and open table formats like Parquet and Iceberg, and it uses these formats by default. That way, you can deploy various engines on your data and benefit from the advantages of a lakehouse engine, such as ACID transactions and table versioning, without locking your data into proprietary vendor silos or inelastic solutions that couple storage with compute.

Getting Started

Getting started is easy. To use Wherobots within the AWS Europe (Ireland) region, get started with the Professional Edition on the AWS Marketplace. Create a notebook and explore one of many examples designed to help you realize what you can create using SQL and Python. There’s no infrastructure to manage. Teams just use and pay for Wherobots on-demand via Wherobots Spatial Units, which reflect the amount of serverless computation consumed. If there are other clouds or regions that you’re interested in using beyond the ones we currently support, please reach out to us at product@wherobots.com or fill out this form. You can read more about Wherobots on the website or by exploring our product documentation.

Create a Pro tier account on AWS: Get Started
Spatial Intelligence Newsletter: Location Intelligence w/ Isochrones, Overture Places, Cloud-Native Geospatial, Iceberg and More

Posted on April 10, 2025 by Tiffany Huynh

Welcome to the April edition of the Spatial Intelligence Newsletter! This month, we’re covering the benefits of using Apache Iceberg, spatial joins, cloud-native geospatial, and new product updates like isochrones to help you make better location-based decisions. What does it all mean, and how can it help you increase data productivity? Check it out here! 👇

⏰💰 Hurry, time is running out! We’re currently offering a FREE $400 credit when you subscribe to the Professional edition of Wherobots, which includes exclusive features like GeoAI with WherobotsAI Raster Inference, map matching for cleaning messy GPS data, new travel isochrones for better location-based decisions, and the ability to bring your own cloud storage, just to name a few. In addition, with the release of our drive-time isochrones, we’re now offering free access to Overture Places data—enriched with drive-time isochrones across every location in the U.S.—through our Pro tier data catalog. And we will be maintaining that dataset with every future release, so you’ll be able to use it going forward for location intelligence. There’s no obligation to get started, so be sure to take advantage of this (it’s like free money). The offer ends on May 31st, so don’t wait!

Latest Content

Benefits of Apache Iceberg for geospatial data analysis 🧊

Apache Iceberg support for GEO data brings significant modernization to geospatial data and solutions. This support makes it easier for you to bring geospatial data into an open data architecture that decouples compute and storage and lowers your costs. By adopting Iceberg in a data lake, you’re enabling your team to leverage the right tool for the job without worrying about locking your data into a vendor or a database solution that doesn’t scale.
Additionally, traditional file formats and row-oriented databases struggle when scaling beyond a million features, often performing poorly or only accommodating data that fits comfortably in memory. 😩 Iceberg, built on Parquet, solves this with lightning-fast reads, scalability for larger-than-memory datasets, and developer-friendly features like DML operations. Plus, with added capabilities like versioning and time travel, users can query both current and historical data seamlessly. 🔍 Follow along this post to learn how to use Apache Iceberg with Sedona and find out how these features benefit spatial computations.

Cloud-Native Geospatial: More Than Just Big Data

💡 We had a very insightful discussion with Amy Rose (CTO) from Overture Maps and Eshwaran Venkat (CTO & Co-Founder) from Dotlas on cloud-native geospatial technology. Here are some highlights:

Cloud-native geospatial is not just for big data; it’s more accessible than you might think. You should be able to work with spatial data the way you work with any other data type.
Increasing deliverability and breaking down data silos: non-spatial communities can now work with spatial data.
Compute systems that make the process more scalable, accessible, elastic, and cost-efficient.
How Dotlas and Overture Maps are optimizing their data pipelines, achieving performance gains, and improving cost efficiency.

Spatial Joins at Scale: Unlocking Advanced Geospatial Analytics with Wherobots 🌎🤝

Spatial joins are essential for geospatial data analysis, but they can be slow or computationally expensive when working with large-scale datasets. Follow along in this tutorial as we walk through how easy and cost-effective it is to:

Join datasets using spatial predicates like ST_Intersects to combine facilities with administrative boundaries, and efficiently find the k-nearest neighbors with the ST_AKNN function.
Apply spatial filters and improve performance through strategies like partitioning by geohash.

Take your geospatial data analytics to the next level and ensure spatial joins aren’t a bottleneck in solving your business challenges.

Apache Sedona

Sedona Success Story: Optimizing ETL pipelines at scale with Comcast 📊

Some of the challenges that Comcast was trying to overcome were data volume and repeatability. That’s why David Buchanan, GIS Architect, turned to Apache Sedona, which allowed him to reduce processing times from 5 hours to 30 minutes compared to GeoPandas. Watch the recording to learn more.

Apache Sedona Office Hours 😎

We just released Sedona 1.7.1, with some new features:

SQL interface for GeoStats (ST_DBSCAN, ST_GLocal, ST_LocalOutlierFactor)
Broadcast join support for distributed KNN Join
STAC catalog & OpenStreetMap (OSM) PBF reader
New ST functions like ST_RemoveRepeatedPoints

If you missed the office hour, check out the recording to learn more about the latest release. And don’t forget to mark your calendar for the next office hour! 🗓️

Product Updates

Overture Places with Isochrones Dataset: Accelerate accessibility analysis with a ready-to-use dataset containing pre-calculated 5, 10, 15, and 20-minute driving isochrones for millions of US Overture Places (pro+).
New ST Isochrones functions: Make data-driven decisions on logistics, site selection, and market reach using Wherobots’ travel isochrone functions in SQL or Python (pro+).
Audit Logs: Admins gain enhanced security and accountability insights by using Wherobots’ detailed, exportable audit logs to track key Organization actions and system events (pro+).
STAC Reader: Simplify workflows and accelerate queries by loading STAC geospatial datasets directly into Sedona DataFrames in Wherobots (OSS & community+).
Job Run Monitoring: Visually track job execution, analyze resource usage, and manage runs directly within Wherobots for enhanced control and optimization (pro+).
Idle Timeout for Notebooks: Gain control over notebook runtime costs and resource usage with customizable idle timeouts that automatically terminate inactive notebooks (community+).

🆓 Both the Overture Places with isochrones dataset and isochrone functions, as well as the audit logs and job run monitoring, are available exclusively in the Pro tier. Take advantage of the free trial (ending soon!) to try these features and see how they can help solve some of the bottlenecks you might be facing when working with spatial data.

Upcoming Events

Geospatial Tables in the Open Lakehouse: A New Era for Iceberg and Parquet
Wednesday, May 7 at 9AM PT | Virtual
It’s easier than ever to work with geospatial data, with Iceberg and Parquet now offering powerful solutions for both geospatial experts and non-spatial professionals. Join this livestream with leaders from Foursquare, Databricks, Planet, and Wherobots as they discuss the historical challenges of handling spatial data, bridging the gap, and future adoption of these advancements.

Apache Sedona + Iceberg GEO Meetup
Monday, May 12 at 5:00PM PT | San Francisco, California
Join us for a fun and informative evening as we explore Apache Iceberg’s new native geospatial support, designed to solve major challenges in managing geospatial data at scale. This will be a great opportunity to connect with professionals in the field to learn about the latest developments in spatial data, as well as exciting projects people are working on. 🌟

Featured speakers:
Jia Yu, Co-Founder and Chief Architect, Wherobots
Matt Forrest, Director of Customer Engineering and PLG, Wherobots
Yingjun Wu, Founder and CEO, RisingWave Labs

CNG Conference
April 30 – May 2 | Snowbird, Utah
We’re excited to attend the upcoming CNG Conference!
Be sure to check out these sessions:

Day 1
1:15pm–2:45pm | Workshop: Interfacing with Cloud-Native Overture Data and the GERS Ecosystem – Sean Knight

Day 2
9:45am–11:15am | Track 2: Introducing geospatial support in Apache Iceberg – Matthew Powers
11:45am–1:15pm | Extract insights from satellite imagery at scale with WherobotsAI – Damian Wylie
4:30pm–5:00pm | Plenary Panel: Builders Panel – Mo Sarwat

👥 If you’ll be at the conference, we’d love to meet with you and chat about how you’re working with geospatial data. Feel free to reach out if you’d like to schedule a time to connect!
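The geohash-partitioning strategy mentioned in the spatial-joins tutorial above works because nearby points share geohash prefixes, so co-locating rows by prefix lets a join prune distant partitions instead of comparing every pair. As a minimal illustrative sketch (the standard public geohash algorithm in pure Python, not Wherobots’ or Sedona’s actual implementation):

```python
# Minimal geohash encoder plus a toy partitioner. Illustrates the idea
# behind "partitioning by geohash" -- not a production implementation.
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a lat/lon pair into a geohash string by bisection."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, even = [], True  # even bits refine longitude, odd bits latitude
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
    # Pack each group of 5 bits into one base32 character.
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

def partition_by_geohash(points, prefix_len=4):
    """Bucket (lat, lon) points so nearby points land in the same bucket."""
    buckets = defaultdict(list)
    for lat, lon in points:
        buckets[geohash(lat, lon)[:prefix_len]].append((lat, lon))
    return buckets
```

Two points a few hundred meters apart usually share a long geohash prefix, so a join engine only needs to compare candidates within the same (and adjacent) buckets rather than scanning the full cross product.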
The Spatial Intelligence Newsletter: Map Matching, Spatial Joins, ML for EO, Cloud-Native Geospatial and More

Posted on March 14, 2025 by Tiffany Huynh

👋 Welcome back to the latest edition of the Spatial Intelligence Newsletter! We’ve been busy brewing up some exciting things here at Wherobots, so we have plenty of new updates and content to share!

Latest Content

Don’t Let Messy GPS Slow You Down: The Fastest Way to Clean Up Messy GPS Data – And Save Money

Raw GPS data is messy. 😵💫 Noisy signals, lost connections, and inaccuracies make it hard to extract valuable insights. Imagine using your GPS to get to your location, only to find it telling you to drive over water instead of the road (personally, I’ve even had the map tell me to walk on water 🌊🚶🏻♀️). Wherobots’ map matching corrects trajectories by aligning them with real-world road networks (❌ no more walking on water!), all while delivering unmatched accuracy and performance (and saving money!).

Apache Iceberg and Parquet now support GEO – A Huge Step Forward for Cloud Native Geo

Geospatial data has long been treated as a second-class citizen: the technologies that modernized today’s data ecosystem mostly left geospatial data behind. But that’s no longer the case. Thanks to the efforts of the Apache Iceberg and Parquet communities, both Iceberg and Parquet now support geometry and geography (collectively, the GEO) data types! 🎉 What does this mean? With native geospatial data type support in Apache Iceberg and Parquet, you can seamlessly run query and processing engines like Wherobots, DuckDB, Apache Sedona, Apache Spark, Databricks, Snowflake, and BigQuery on your data, all the while benefitting from faster queries and lower storage costs from Parquet-formatted data.
💨 Exploring design and key features to enhance spatial data workloads with Iceberg GEO

With Apache Iceberg and Parquet now supporting GEO types, the economics of utilizing geospatial data in end solutions improve. This advancement allows organizations to create higher-value, lower-cost products and achieve faster results over time. Let’s take a closer look at these GEO data types in Iceberg, exploring their design, key features, and implementation considerations. Learn how leveraging these features with Apache Sedona and Wherobots can enhance cost performance and data governance, ensuring the best possible experience for spatial data workloads.

📈 Optimizing Earth Observation Models for Production with ML Model Extension

What are the challenges of applying AI to geospatial problems? 🤖 Join panel speakers from Wherobots, Radiant Earth, CRIM, and Terradue as they discuss how this challenge led to the development of an open, portable solution for describing computer vision models trained on overhead imagery. Learn about the MLM STAC Extension, its use cases, and why model developers should adopt it, along with Raster Inference, a serverless computer vision solution that extracts valuable insights from aerial imagery. 🌎

Getting Started With Wherobots

Interested in getting started with Wherobots, but unsure of where to begin? Here are some helpful resources. 👇

Wherobots 101: Mastering Scalable Geospatial Data Processing
Want to take your geospatial analytics to the next level? Whether you’re just starting out or already working with spatial data, learn how to leverage valuable tools and workflows in Wherobots Cloud to analyze, visualize, and interpret geospatial datasets. From setting up your account to mastering advanced analytics, this session is a helpful guide to set you up for success!

Wherobots 102: Reading and Processing Cloud Native Geospatial Data
Learn how to efficiently load, manage, and analyze raster and vector data in Wherobots’ hosted environment.
Whether you’re working with massive geospatial datasets or looking for optimized workflows to write and query GeoParquet and Cloud-Optimized GeoTIFFs (COGs), this video will equip you with the tools and techniques to scale your geospatial analysis.

Working with Foursquare Places Data
Which neighborhood in San Francisco has the most coffee shops? Dive into the Foursquare Open Places dataset, a free and open dataset providing 100M+ global places of interest, with our latest tutorial. ☕ You’ll be able to query using Spatial SQL, subset the data for a specific region, search for specific businesses or places, and aggregate locations by geography. By the end of this tutorial, you’ll have a choropleth map showing the number of coffee shops, sorted by neighborhood.

Apache Sedona Community

Sedona Success Story: Optimizing ETL pipelines at scale with Comcast 🚀
Is scaling your ETL pipeline a priority? Discover how Comcast successfully achieved this by using Apache Sedona, all while boosting productivity and improving the quality of their network operations. 🌐 Learn how Apache Sedona reduces vendor lock-in. Understand why it outperforms tools like GeoPandas and PostGIS. See how it improves the ability of the Xfinity network team to optimize their network operations through a global view of performance quality and degradation. Find out how it integrates seamlessly with Apache Spark and other distributed engines.

O’Reilly: Cloud Native Geospatial Analytics with Apache Sedona – Navigating Large-Scale Spatial Data
We know that handling large-scale spatial data can be daunting, which is why we’ve designed this guide to simplify geospatial data. This will help boost your spatial analytics expertise and transform the way you work with geospatial data! 💪 Our newest chapter, focusing on vector data analysis using spatial SQL, is now available. If you’ve already accessed the previous chapters, be sure to check your inbox (in a separate email) for the latest one!
📧 Engage with the Community Through Sedona Office Hours
We host monthly office hours to bring you the latest news and updates on Apache Sedona. Mark your calendars for the next one. Even if you can’t make it, we’ll send you the recording and slides to make sure you don’t miss anything that might be helpful to you. 🤝

Upcoming Events

Spatial Joins at Scale: Unlocking Advanced Geospatial Analytics
If you’ve ever struggled with spatial joins (you know who you are), then this is the one to join (pun intended, courtesy of Matt Forrest 😎)! Learn how to seamlessly integrate Python and Wherobots to perform advanced spatial joins and analyses on geospatial data. Gain practical skills and best practices for processing and visualizing spatial data at scale. Don’t miss this opportunity to boost your spatial analytics expertise and transform how you work with geospatial data.

Fireside Chat with Overture Maps and Dotlas on Cloud-Native Geospatial: More Than Just Big Data
How is cloud-native geospatial reshaping the way organizations interact with spatial data? ☁️🌎 It prioritizes flexibility, changes how data consumers connect, removes friction, and unlocks new possibilities. Join us, alongside Amy Rose from the Overture Maps Foundation and Eshwaran Venkat from Dotlas, as we explore how modern approaches enable scalability across various compute infrastructures, eliminate the need to move massive datasets, and allow users to work with data wherever they are—whether locally or in the cloud. Hear about where geospatial technology is headed. This is a conversation you definitely don’t want to miss!

Getting Started

🆓 Getting started with Wherobots is easy. If you haven’t already, create a free account and dive in. If you’re looking to take your geospatial analytics to the next level—whether it’s full access to open datasets, map matching, or raster inference—try the Pro tier for free.

Get started with Wherobots: Try Now
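At its core, the map matching described above snaps each noisy GPS point onto the nearest candidate road segment. Here is a hedged, minimal pure-Python sketch of that geometric step only (Wherobots’ distributed algorithm also scores candidates using the whole trajectory and the road network topology; the road coordinates below are made up for illustration):

```python
# Toy "snap to road" step: project a GPS point onto the nearest road
# segment. Illustrative only -- real map matching also uses the point
# sequence of the trajectory, not a single point in isolation.
import math

def snap_to_segment(p, a, b):
    """Closest point to p on segment a-b (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:            # degenerate segment
        return a
    # Projection parameter t, clamped so the result stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return (ax + t * dx, ay + t * dy)

def snap_to_road(p, segments):
    """Snap p to the nearest of several candidate road segments."""
    candidates = (snap_to_segment(p, a, b) for a, b in segments)
    return min(candidates, key=lambda q: math.dist(p, q))

# Hypothetical two-segment road network.
roads = [((0.0, 0.0), (1.0, 0.0)),   # east-west road
         ((0.0, 0.0), (0.0, 1.0))]   # north-south road
```

For example, `snap_to_road((0.5, 0.1), roads)` returns `(0.5, 0.0)`: the point that was "driving over water" gets pulled back onto the east-west road.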
Apache Iceberg and Parquet now support GEO

Posted on February 11, 2025 by Ben Pruden

Geospatial data isn’t special anymore, and that’s a good thing. Geospatial solutions were thought of as “special” because the technologies that modernized today’s data ecosystem mostly left geospatial data behind. This changes today. Thanks to the efforts of the Apache Iceberg and Parquet communities, we are excited to share that both Iceberg and Parquet now support geometry and geography (collectively, the GEO) data types.

Geospatial challenges

Geospatial data has been disconnected from the broader data ecosystem that modernized around open file formats like Apache Parquet and open table formats like Apache Iceberg, Delta Lake, and Apache Hudi. The benefits of these cloud-native open file and table formats fueled widespread adoption of data lake and lakehouse architectures. Organizations moved away from expensive proprietary systems, away from data silos that coupled compute with storage and didn’t scale, and away from formats that locked them in and stifled innovation. Relative to legacy options, these cloud-native formats fundamentally change how data is stored, managed, and accessed. This in turn lowers costs, increases agency, and unlocks innovation over time. But geospatial data was different, which led to a number of technical challenges, and it wasn’t supported by these formats from the start. As a result, developers building solutions with geospatial data struggled with fragmented formats, proprietary file types, and data silos – making solutions harder and costlier to build.

The silos will break down

With native geospatial data type support in Apache Iceberg and Parquet, you can seamlessly run query and processing engines like Wherobots, DuckDB, Apache Sedona, Apache Spark, Databricks, Snowflake, and BigQuery on your data.
All the while benefitting from faster queries and lower storage costs from Parquet-formatted data. These changes improve short- and long-term economics for geospatial solutions. Organizations will have a new freedom to innovate with a lower-cost, highly interoperable architecture. They get to choose the best tool for the job over time without having to shuttle data between systems. Their costs reduce, productivity improves, innovation accelerates, and the playing field is leveled with respect to who can provide the best solution for their data. The legacy silos will break down, just like they’ve done for non-geospatial data. And most importantly, these changes will lead to new innovation about our physical world.

Benefits of Iceberg and Parquet

These changes make geospatial solutions based on a data lake a lot more attractive. Here are a few benefits:

Iceberg and Parquet alone don’t separate compute from storage, but together they make it possible to utilize low-cost data lake storage along with multiple independent high-performance computing solutions for different use cases
ACID transactions and data versioning enable the use of multiple compute engines without conflicts
Time travel allows tracking of data changes over time
Query performance is higher from features like column pruning, row-group filtering, and fast file access
Open data formats minimize vendor lock-in
Geospatial data will be supported across a broader ecosystem of tools and services
And many more…

In the coming weeks, we will be covering these features in detail and demonstrating how they’re beneficial for geospatial solutions.

Grassroots efforts made this happen

These changes were the result of grassroots initiatives, investment, and influence from community members at Planet, CARTO, Wherobots, and many others across the Cloud Native Geospatial community.
This includes GeoParquet, a grassroots project and extension of Parquet that proved its worth through use and popularity, countless meetups, and discussions. We also want to give credit to the Iceberg community for working with members of the Wherobots team to bring a solution forward while also influencing the Parquet community to make a GEO-native data type. While the Iceberg and Parquet communities led with support for GEO data types, we welcome compatibility and support for GEO data types in all cloud-native formats, including Apache Hudi and Delta Lake.

Thoughts from Szehon Ho, Apache Iceberg PMC Member

“The long-awaited incorporation of geospatial data types in the Iceberg V3 spec extends a core theme of Iceberg as a project to provide a universal ‘shared warehouse storage’ across many engines and users, and will now allow this huge, growing ecosystem to work on the same geospatial data as well, unlocking many exciting use cases. It is also a demonstration of the Iceberg community’s willingness to take the time and ‘do hard things’, engaging in months of very active discussions across companies and OSS communities, finally reaching consensus on a spec that supports the largest variety of use cases in the fast-evolving geospatial data domain.”

Thoughts from Chris Holmes, co-creator of GeoParquet

“The community developed and rallied behind GeoParquet to make geospatial data in Parquet fully interoperable and to let the geospatial world tap into all the advantages the big data world has been getting from Parquet. I’m very excited to see Parquet and Iceberg formally support geospatial types, and look forward to the acceleration in geospatial innovation that these changes will activate across industries and for our planet.”

Looking ahead

Committers are already working to bring support for these changes into Apache Sedona, and will notify the community as they are introduced.
At Wherobots, we’ve supported these GEO data types in Havasu (our Iceberg fork), which we built to enable geospatial lakehouse architectures with Wherobots, along with GeoParquet. We’ve begun developing native support for Iceberg and Parquet into how Wherobots operates on customer data, and will put our full support behind these native formats moving forward. To learn more about the reasoning behind the Iceberg GEO types design, the trade-offs we navigated, and what it all means for implementers, please read our follow-up blog: Iceberg GEO: Technical Insights and Implementation Strategies. If you need support throughout your journey adopting and utilizing these cloud-native formats for geospatial use, reach out to Apache Iceberg on Slack or Apache Sedona on Discord. Watch this livestream from Wednesday, May 7 with leaders from Foursquare, Databricks, Planet, and Wherobots as they discuss the historical challenges of handling spatial data, bridging the gap, and future adoption of these advancements. Sign up for our newsletter to stay up to date with everything we are doing to enable the spatial community to embrace the modern geospatial lakehouse.
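On disk, the new Parquet and Iceberg GEO types store each geometry value as WKB (well-known binary), the same interoperable encoding GeoParquet uses. As a small hedged sketch of what one such value looks like, here is a pure-Python WKB round-trip for a 2D point (real tables also carry CRS metadata and column statistics, which this sketch omits):

```python
# Encode/decode a 2D point in WKB (well-known binary), the byte layout
# used by the Parquet/Iceberg GEO types and GeoParquet. Sketch: points only.
import struct

WKB_POINT = 1  # WKB geometry type code for Point

def point_to_wkb(x: float, y: float) -> bytes:
    # byte 0: byte order (1 = little-endian), then a uint32 type code,
    # then the two coordinate doubles.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)

def wkb_to_point(buf: bytes) -> tuple:
    order, gtype, x, y = struct.unpack("<BIdd", buf)
    assert order == 1 and gtype == WKB_POINT
    return (x, y)
```

A point encodes to 21 bytes (1 + 4 + 8 + 8). Because every engine in the list above understands this layout, the same column can be read by Sedona, DuckDB, BigQuery, and others without conversion, which is exactly the interoperability the post describes.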
Wherobots 2024 accomplishments, and what’s on-deck in 2025

Posted on January 23, 2025 by Damian

Introduction

2024 was a transformative year for Wherobots. Our mission to revolutionize how geospatial data is used took significant strides forward, positively impacting our customers and industry. Over the past year, we more than tripled the size of our team and successfully closed a $21.5M Series A funding round. We expanded accessibility to Wherobots’ industry-leading geospatial query performance, integrated Wherobots into the native AWS buying experience, and unveiled groundbreaking features like Raster Inference, Map Matching, and GeoStats—empowering users to create scalable geospatial solutions like never before.

Our Mission

Before founding Wherobots, co-founders Mo and Jia identified critical challenges limiting the potential of geospatial data. These stemmed from how geospatial data was traditionally stored, formatted, and processed, which made this data incredibly painful to utilize, particularly at scale. Over recent decades, data and analytics investment was mostly directed towards solutions for internet data. However, compared to internet data, geospatial data is a lot more complex, which makes it harder to query. It’s polygons representing land and buildings, GPS trajectories, satellite and drone imagery, weather data, and more—all tied to Earth’s imperfect spherical surface. And querying this data generally means you need to filter and join it with other datasets (geo or non-geo). Due to this complexity, existing cloud analytics engines built for structured internet data struggle to efficiently run spatial queries at scale. They also miss features necessary to prepare this data, they lack features that make solution development productive, and they simply cannot compute spatial results with high precision. As a result, solutions based on geospatial data are expensive, or otherwise shelved. We are addressing these challenges.
By reducing the cost and effort to build with geospatial data, Wherobots will enable a new wave of innovation for the physical world. This will drive breakthroughs in products, business operations, science, and government, and make a positive impact on our climate. Our mission is simple yet ambitious: make geospatial data easy to use. Here’s what some of our customers have to say about how we’re helping them achieve their missions.

Customer Highlights

AddressCloud: Enabling insurers to calculate geographic risk with precision

“Wherobots runs our compute operations that used to take hours or days to complete, in minutes. As we provide perils information (flood, fire, etc.) to insurers at the property level, we particularly appreciate the ability to run combined vector/raster analysis, without having to first transform the raster data into vector format or some other format.” – John Powell, Senior Geospatial Data Engineer at AddressCloud

Overture Maps Foundation: Creating next-generation map products with scalable, open map data

“Overture produces a building dataset covering all buildings in the world, with 2.3B geometries and growing, that’s updated frequently. There’s a lot of data and compute that goes into producing and keeping it up to date. We accelerated the pipelines that produce the buildings dataset by up to 20x after we moved them to Wherobots, which required a simple redirection of our code. We retained compatibility with Apache Sedona, and the move put us into a development experience that’s made us more productive.” – Jennings Anderson, Geoscientist at Overture and Data Engineer at Meta

Why Wherobots Stands Out

Several recurring themes highlight why customers choose Wherobots:

Unmatched performance and cost efficiency: Wherobots delivers up to 20x better spatial join performance compared to modern cloud data engines, at a fraction of the cost.
Ease of innovation: Wherobots makes it easy to build solutions with raster (e.g., satellite imagery), vector (e.g., geometry, geography), and your first-party data, regardless of scale.
Modern cloud architecture: Wherobots is fully compatible with Apache Sedona, and runs seamlessly on data lakes with support for Apache Iceberg and Apache Parquet.

2024 Milestones

Funding & Market Validation

In 2024, we raised $21.5M in a Series A round led by Felicis, with support from Wing Venture Capital, Clear Ventures, JetBlue Ventures, and P7 Ventures. This funding reflects confidence in our mission and the massive market opportunity for geospatial solutions in the cloud.

Team Growth

The Wherobots team—the “Botsters”—tripled in size this year. While engineering saw the most growth, we also built out go-to-market, marketing, and product teams and are actively scaling our sales team. As we head into 2025, we’re actively hiring for roles across the company to support our expanding vision.

Product Innovations

We launched several key features in 2024 that expanded the boundaries of geospatial data solutions. *Features noted with an asterisk are only available in the Professional or Enterprise edition of Wherobots.

Cloud Native

A pay-as-you-go offering on the AWS Marketplace makes it easy to subscribe and pay on-demand using AWS Marketplace billing.
A storage integration for Amazon S3, to quickly and securely integrate with first- or third-party data.

Security and Access

SAML Single Sign-On makes logging in simple, secure, and seamless for users in companies with centralized login management systems.
The Spatial SQL API, TypeScript and Python SDKs, and a JDBC driver make it possible to query WherobotsDB using popular or custom query interfaces.
Continuous improvement of internal security and service availability.

Open Data Architecture

The first version of the Spatial Catalog (known as Havasu, with core functionality soon to be merged into Apache Iceberg).
Accelerating Geospatial Solution Development

Raster Inference, to easily extract insights from satellite imagery at scale using SQL. (We’re hosting an upcoming panel discussion with an incredible lineup of speakers to discuss the MLM STAC Extension and Raster Inference, with a focus on optimizing Earth observation models for production. Learn more and save the date here.)
Distributed Map Matching, a purpose-built algorithm for snapping GPS trajectories to known segments like roads, with high performance at scale.
GeoStats, a geostatistics suite designed for scale, performance, and streamlining solution development.
Support for k-nearest neighbor joins (exact and approximate) to efficiently query for geospatial neighbors at scale.
Vtiles, a solution purpose-built for creating vector tiles at scale with high performance.
Many new vector (ST) and raster (RS) functions to accelerate developer productivity.
Continuous improvement of spatial and non-spatial query performance to reduce cost and make workloads more compute-efficient (reducing climate impact).

Automation

Job Runs integrated with Apache Airflow, to make it easy and familiar to automate new and existing processing workflows.
Service Principals, enabling authentication and automated usage of Wherobots, decoupled from the tenure or privileges of human users.

Looking Ahead

In 2025, we plan to bring Wherobots Cloud to the EU market with support for the AWS Europe (Ireland) region, and achieve SOC 2 Type 2 certification (currently in progress). We’ll continue to focus on:

Making Earth observation data easier to utilize.
Enhancing developer productivity and experiences.
Improving query engine performance and data compatibility.
Strengthening service availability and support for customers.
Delivering new administrative controls and observability.

Ready to Build?

We are currently offering a 30-day free trial covering up to $400 in usage via the AWS Marketplace. Getting started is easy.
There are many example notebooks for various geospatial use cases that you can explore and run without any coding experience required. Not only do the notebooks help you get started, but we also see most of our customers use these notebooks as references for the solutions they end up building. Join the Mission Motivated by our mission? Join our growing team—visit our careers page for open roles. You can also share feedback at feedback@wherobots.com or contact me directly at damian@wherobots.com. Try Wherobots Pro Get Started
WherobotsAI Raster Inference is GA with Support for Bring Your Own Model Posted on December 17, 2024 (updated March 31, 2026) by Ben Pruden Introduction UPDATE: Raster Inference is included in Wherobots RasterFlow. See Get Started with RasterFlow in the Wherobots documentation for the most up-to-date workflows. We are excited to announce that WherobotsAI Raster Inference is now generally available! Raster Inference is a serverless, planetary-scale computer vision solution that enables data teams to extract meaningful insights from aerial imagery (raster data) sources, such as satellites or drones, and puts these insights at the fingertips of data scientists and developers. During the preview period, customers analyzed raster data by running inference with a limited set of Wherobots-hosted open-source computer vision models. Now, you can bring your own model into Raster Inference, offload inference pipeline management effort, and apply this capability to a much broader set of use cases. We have also made significant enhancements to our compute service to accelerate inference performance. Typical Use Cases Data teams use WherobotsAI Raster Inference to identify information in complex, large-scale overhead imagery data. Some common use cases include: Agriculture: Satellite and drone imagery are critical for monitoring land use, predicting crop yields, and improving sustainable farming practices. Environment and Conservation: Raster imagery is essential for monitoring ecosystem and biodiversity changes, such as tracking glacier melting, sea level rise, temperature fluctuations, and environmental degradation (e.g., oil spills, deforestation). Energy: Renewable energy developers analyze satellite data to assess land suitability, solar radiation, and wind patterns to determine optimal locations for renewable energy projects. Insurance: Insurers use raster data to calculate risk assessments based on environmental and local factors, and to conduct and improve damage assessments at scale. 
Map Creation & Maintenance: Global, high-fidelity digital map producers and maintainers extract features such as buildings, road networks, landcover, shipping lanes, etc., and their respective changes, in order to maintain a truthful connection between global map data products and the real world. Traditional challenges with computer vision pipelines We’ve met with many businesses struggling to get critical insights from raster data. Some are manually sifting through imagery. This method doesn’t scale: it’s expensive, error-prone, and time-intensive. Others are utilizing complex computer vision solutions that are not designed for overhead raster imagery. These computer vision solutions: Take time to build, and require significant management to efficiently load, store, and process large raster datasets. Are difficult to scale to accommodate increasing workload sizes. Are fragile, with multiple components involved and challenges maintaining compatibility with an evolving modeling stack. Require effort to experiment, test, and integrate new models and inference runs. Benefits of WherobotsAI Raster Inference With WherobotsAI Raster Inference, you can: Use data pipelines that are ready for small to planetary-scale raster data. Deploy an on-demand solution in seconds that scales to meet workload needs without having to manage infrastructure. Easily import your model or utilize any model hosted by Wherobots. Experiment and generate critical insights faster. In the following sections, we’ll provide a brief overview of Wherobots-hosted models, how to bring your own model, recent performance improvements, and how to use this feature effectively. Choosing a Model for Raster Inference Wherobots-Hosted Models Wherobots-hosted models are precompiled and optimized for raster inference, enabling them to scale effortlessly and execute inference on large datasets. Below is a brief overview of the initial set of Wherobots-hosted models. 
We plan to host additional models based on customer feedback. For more information on these models and their performance metrics, see our docs page here. Below is a code snippet showing how to use landcover-eurosat-sentinel2 in WherobotsAI Raster Inference:

# set the model name
model_id = "landcover-eurosat-sentinel2"

# call the model in raster inference
df_predictions = df_raster_input.withColumn("preds", rs_classify(model_id, "outdb_raster"))

The STAC Machine Learning Model Extension Specification The model import function of Raster Inference is built on the STAC Machine Learning Model (MLM) Extension Specification, an open community standard for model sharing that we co-developed with CRIM and other collaborators. The MLM specification enables model portability, making it easier to use models across teams and compute platforms. Before the MLM specification, data scientists and modelers sharing geospatial computer vision models often had to adapt existing standards, such as HuggingFace model cards, to store relevant information. This involved repurposing the cards to specify details like required raster bands, necessary data preprocessing, and post-processing functions. Without a community standard for organizing this information, these model cards were often inconsistent and documented in varying formats, making model sharing and reproducibility across different compute platforms cumbersome. The STAC MLM extension introduces a community standard designed to simplify storing and sharing geospatial computer vision models. It achieves this by providing a comprehensive schema to: Describe critical geospatial model attributes, such as geolocation and temporal range. Include key model inference reproducibility details, such as required bands, model artifact locations, and pre- and post-processing steps. Enable model collections to be searched alongside associated spatiotemporal datasets. 
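For orientation, a minimal MLM properties fragment might look like the following sketch. The field names follow the published MLM extension, but the model name, architecture, and band values here are invented purely for illustration; consult the MLM specification for the full schema:

```python
import json

# Illustrative only: a minimal STAC-item properties fragment using MLM extension
# field names. The model name, architecture, and bands below are hypothetical.
mlm_properties = {
    "mlm:name": "landcover-classifier-demo",
    "mlm:architecture": "ResNet-18",
    "mlm:framework": "pytorch",
    "mlm:tasks": ["classification"],
    "mlm:input": [
        {"name": "Sentinel-2 RGB+NIR", "bands": ["B02", "B03", "B04", "B08"]}
    ],
    "mlm:output": [
        {"name": "landcover class", "tasks": ["classification"]}
    ],
}

print(json.dumps(mlm_properties, indent=2))
```

Because the schema is machine-readable, an inference engine can discover from it which bands to read and which pre- and post-processing to apply, rather than relying on free-form model-card prose.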
The MLM specification has already been adopted in key modeling efforts at Terradue and is proudly supported by Radiant Earth. We’ll share more in a series of blog posts and a panel discussion with our collaborators on the MLM in late January. Register here on the interest form to receive an invite when the date is finalized. Bring Your Own Model The MLM specification enables users to quickly and easily use many geospatial, deep learning-based computer vision models with Raster Inference. We leverage a model’s MLM specification to specify and integrate the required data preprocessing, model, and post-processing into the larger raster inference pipeline. To bring your own model to Raster Inference: Fill out our MLM form for your model. This will create an MLM-formatted JSON file (MLM JSON) describing your model. Download the file. Upload the MLM JSON file from Step 1 to your AWS S3 bucket. Copy and save the AWS S3 URI to your MLM JSON. During runtime, use your model for inference by calling the raster inference function with your MLM JSON’s S3 URI. You can find a full walkthrough in our documentation on using the MLM form to bring your own model. Performance Improvements In addition to enabling bring your own model, we’ve accelerated asynchronous data loading in the inference engine to boost performance. Below, you can see how performance has evolved with experiments conducted using WherobotsAI Raster Inference and a Tiny GPU Runtime. Example: Raster Inference with Bring Your Own Model For a full tutorial example on how to bring your own model to segment solar farms in Sentinel-2 imagery, see our documentation here. Example: Run Raster Inference with a Wherobots-hosted Model We’ll walk through an example of how to identify solar infrastructure in raster imagery using a Wherobots-hosted model. In our example, we will be using the model solar-satlas-sentinel2. The full Python notebook file can be found on our GitHub. 
Set up the WherobotsDB context

import warnings
warnings.filterwarnings('ignore')

from wherobots.inference.data.io import read_raster_table
from sedona.spark import SedonaContext
from pyspark.sql.functions import expr, col
from wherobots.inference.engine.register import create_semantic_segmentation_udfs

config = SedonaContext.builder().appName('segmentation-batch-inference').getOrCreate()
sedona = SedonaContext.create(config)

Load Satellite Imagery

tif_folder_path = "s3a://wherobots-benchmark-prod/data/ml/satlas"
files_df = read_raster_table(tif_folder_path, sedona, limit=400)
df_raster_input = files_df.withColumn(
    "outdb_raster", expr("RS_FromPath(path)")
)
df_raster_input.cache().count()
df_raster_input.show(truncate=False)
df_raster_input.createOrReplaceTempView("df_raster_input")

Run WherobotsAI Raster Inference Specify a Wherobots-hosted model to run inference

model_id = "solar-satlas-sentinel2"

You can run WherobotsAI Raster Inference using either the Wherobots SQL API or Python API. 
Using the SQL API

# Note the f-string, so that model_id is interpolated into the query text.
predictions_df = sedona.sql(f"""
SELECT outdb_raster, segment_result.*
FROM (
    SELECT outdb_raster, RS_SEGMENT('{model_id}', outdb_raster) AS segment_result
    FROM df_raster_input
) AS segment_fields
""")
predictions_df.cache().count()
predictions_df.show()
predictions_df.createOrReplaceTempView("predictions")

Using the Python API

rs_segment = create_semantic_segmentation_udfs(batch_size=10, sedona=sedona)
df = df_raster_input.withColumn(
    "segment_result", rs_segment(model_id, col("outdb_raster"))
).select(
    "outdb_raster",
    col("segment_result.confidence_array").alias("confidence_array"),
    col("segment_result.class_map").alias("class_map"),
)
df.show(3)

Extract predicted geometries (continued from step 4 using the SQL API)

df_multipolys = sedona.sql("""
WITH t AS (
    SELECT RS_SEGMENT_TO_GEOMS(outdb_raster, confidence_array, array(1), class_map, 0.65) AS result
    FROM predictions
)
SELECT result.*
FROM t
""")
df_multipolys.cache().count()
df_multipolys.show()
df_multipolys.createOrReplaceTempView("multipolygon_predictions")

df_merged_predictions = sedona.sql("""
SELECT
    element_at(class_name, 1) AS class_name,
    cast(element_at(average_pixel_confidence_score, 1) AS double) AS average_pixel_confidence_score,
    ST_Collect(geometry) AS merged_geom
FROM multipolygon_predictions
""")
df_filtered_predictions = df_merged_predictions.filter("ST_IsEmpty(merged_geom) = False")
df_filtered_predictions.cache().count()
df_filtered_predictions.show()

Visualize results

from sedona.maps.SedonaKepler import SedonaKepler

config = {
    'version': 'v1',
    'config': {
        'mapStyle': {
            'styleType': 'dark',
            'topLayerGroups': {},
            'visibleLayerGroups': {},
            'mapStyles': {},
        },
    },
}
map = SedonaKepler.create_map(config=config)
SedonaKepler.add_df(map, df=df_filtered_predictions, name="Solar Farm Detections")
map

Get started with WherobotsAI Raster Inference Professional Edition Users If you’re a Wherobots Professional Edition or Enterprise user, you have access to all capabilities of WherobotsAI Raster 
Inference! If you have access to GPU runtimes, sign in to your account now to launch a Wherobots Notebook and explore the feature. If you don’t yet have access, request it today and start using Raster Inference as soon as tomorrow. Explore how easy it is to bring your own model using our guided example on GitHub or within a Wherobots notebook instance. Start integrating WherobotsAI Raster Inference into your workflow today! Community Edition Users Although Raster Inference is not available in Wherobots Community Edition, we are currently offering a free trial for the Wherobots Professional Edition. You can either sign up through AWS Marketplace or upgrade your account to get started for free and integrate Raster Inference into your workflow today. What’s next We’re eager to hear about the models you’d like us to support and any features you’d like to see added. For product feedback, feel free to email us at feedback@wherobots.com (no request is too small). To see how others are using Raster Inference, ask questions, and share your own experiences, join the Wherobots community. We look forward to your ideas and creations! Missed our panel discussion with our collaborators CRIM, Terradue and Radiant Earth on the MLM STAC extension? Watch the recording below. Get Started with Wherobots Try Now
Wherobots is ready for AWS workloads Posted on November 26, 2024 (updated July 1, 2025) by Ben Pruden Planetary-scale geospatial solutions are now accessible via Wherobots on the AWS Marketplace We’re thrilled to announce that Wherobots is generally available for AWS customers with pay-as-you-go pricing via the AWS Marketplace. AWS customers can subscribe to a 30-day free trial of the Wherobots Professional Edition for up to $400 in usage, and discover how easy it is to create spatial solutions that propel their business forward. The integration with the AWS Marketplace simplifies the Wherobots buying and usage experience, particularly for customers with AWS commitments or discounts that apply to AWS Marketplace spend. Coupled with a secure integration to run Wherobots on private or public S3 buckets, Wherobots is where the next generation of geospatial solutions is developed on AWS. The potential of spatial data is very high Many companies have significant investments in assets, products, or services that are influenced by our dynamic world. To be competitive, adaptive, and profitable, companies need to accelerate the velocity of decisions about these investments. Insurance companies like State Farm need solutions for calculating asset risk in the face of a rapidly changing climate. Telecommunications providers like Comcast optimize their network operations to account for bandwidth constraints from physical barriers like buildings, tunnels, and weather. Retailers like Starbucks need to identify the next retail location to launch or sunset based on mobility data, supply and demand, and demographic trends. Logistics providers like Amazon Last Mile Delivery need to autonomously adjust distribution and delivery plans based on roadway conditions, road updates, traffic conditions, and delivery payloads. Farming and forestry operations are looking for ways to optimize yields in a way that’s sustainable and profitable. 
Solar and wind farm operators are identifying the next best locations to develop sustainable power sources, and also connect to expanding charging station networks across the globe. These industries are modernizing with the cloud, but services in the cloud haven’t made it easy to create these types of solutions, until now. Spatial ideas are everywhere, but solutions are sparse If these types of solutions resonate with your business, I’d wager that if you asked your teams to produce ideas that rely on geospatial data, they’d have a lot to share. But the reality is that most teams are not enabled to unlock these ideas. They will say it’s really hard, if not infeasible, to turn these ideas into solutions that propel your business forward. Which also means the ideas are put on the back burner or considered far-fetched. Wherobots makes spatial solutions accessible Built on and fully compatible with Apache Sedona, Wherobots is the Spatial Intelligence Cloud. Wherobots offers geospatial ETL, analytics, and AI solutions that make it easy for data scientists and engineers to create spatial data products, and the intelligence that drives their business forward. Wherobots delivers industry-leading scalability and spatial computing performance on your data lake. It’s up to 20x more performant than Apache Sedona running on Apache Spark, using a serverless analytics and inference engine optimized for spatial operations. Development is unified across what are otherwise siloed data types: raster (satellite and drone imagery) and vector (mobility data, polygons, trips, roads) data. And solutions can be built with SQL, Python, or Scala in a notebook, putting solutions within reach for more developers. There are 300+ built-in functions and higher-level features, like Map Matching, Geostats, and Raster Inference, to accelerate development. With pay-as-you-go pricing on the AWS Marketplace, Wherobots is where the next generation of spatial data solutions is built. What are customers saying? 
Wherobots customers like AddressCloud and Overture realized typical performance gains of 5-20x, lower costs, and measurably higher developer productivity after migrating their Apache Sedona workloads into Wherobots. AddressCloud helps insurers calculate geographic risk “Wherobots has given us the potential to run jobs that used to take hours or days to minutes and removed the need to think about provisioning compute. As we provide perils information (flood, fire, etc) to insurers at the property level, we particularly appreciate the ability to be able to run combined vector/raster analysis, without having to previously transform the raster data into vector format or some other format,” said John Powell, Senior Geospatial Data Engineer at Addresscloud. Overture provides current and next-generation map products by creating reliable, easy-to-use, and interoperable open map data “Overture produces a building dataset covering all buildings in the world, with 2.3B geometries and growing, that’s updated frequently. There’s a lot of data and compute that goes into producing it and keeping it up to date,” said Jennings Anderson, Geoscientist at Overture and Data Engineer at Meta. “We accelerated the pipelines that produce the buildings dataset by up to 20x after we moved them to Wherobots, which required a simple redirection of our code. We retained compatibility with Apache Sedona, and the move put us into a development experience that’s made us more productive.” Getting started is easy. From your AWS account, subscribe to the Professional Edition of Wherobots on the AWS Marketplace. You can get started risk-free in the Professional Edition with features required by production workloads. We’ve built dozens of example notebooks to help you go from zero to iterating with spatial data in minutes. Feedback? Our mission is to make it easy for our customers to utilize geospatial data. 
We cannot complete our mission without your input, and we are working with a variety of customers to shape what we do next. If you are invested in the problems we are solving and have an idea to improve our product, please contact us at feedback@wherobots.com, or contact me directly at damian@wherobots.com. I’m eager to hear from you, and we are here to help you innovate with Wherobots on AWS. Try Wherobots on AWS Marketplace Get Started
Announcing Our $21.5M Series A: Unlocking Answers to Planetary-scale Questions Posted on November 26, 2024 (updated April 8, 2025) by Ben Pruden Unlocking answers to planetary-scale questions. By Wherobots co-founders Mo Sarwat and Jia Yu Each day, satellites, drones, applications, and GPS devices generate petabytes of spatial data that can be used to solve real-world problems. But the majority of this data is stuck in siloed legacy systems or sits idle and disjointed. We see the potential this data can have for business, the planet, government, and societies. And we’re on a mission to help companies fully utilize it so they can tackle issues like how to manage their fleets of vessels and vehicles, where and how to build infrastructure, and how best to assess and mitigate the risk of catastrophic natural disasters. To achieve our mission, we’re partnering with leading investors in the technology space and have raised $21.5M in Series A funding—led by Felicis, with continued support from Wing Venture Capital and Clear Ventures and participation from JetBlue Ventures and P7 Ventures. Aydin Senkut, Founder and Managing Partner at Felicis, will also be joining Wherobots’ Board of Directors together with Peter Wagner from Wing Venture Capital. We are committed to constantly improving our technology to process and analyze geospatial data faster and more efficiently, and this funding will accelerate our product development and go-to-market operations. From Research to Market Growing up in Egypt, Mo saw the impact of climate change first-hand. Rising temperatures and pollution are threatening the air quality and water supply of millions of Egyptians. These challenges, among others, are not isolated—they reflect global issues as our world changes faster than ever. Motivated by these realities, we set out to harness data that captures what’s happening in the physical world to drive innovation and empower people to tackle both large-scale and localized problems. 
We met when Mo was a professor and Jia was finishing his PhD at ASU. We bonded over our shared passion for geospatial data and its untapped use cases. We realized the popular data warehousing and analytics solutions available were built from the ground up to process internet data, not geospatial data. When geospatial data is forced into these systems, they underperform, lack essential features for intuitive geospatial analysis, and are often either closed-source or reliant on outdated architectures. These limitations make geospatial solutions inaccessible for most organizations. Recognizing this gap, we set out to create a solution tailored to the unique challenges of geospatial data, unlocking its power for organizations of all sizes. Apache Sedona — an open-source geospatial compute framework — was our first response to this issue. Today Apache Sedona has over 40M downloads and is now used to run planetary-scale workloads by companies like Amazon.com for last mile delivery and Land O’ Lakes for precision agriculture. After years of growing Apache Sedona, we saw a tremendous appetite for a more in-depth enterprise solution. Enter Wherobots, a fully managed, scalable cloud platform that is purpose-built to make geospatial solutions easy to create while maintaining compatibility with Apache Sedona. Wherobots also integrates well with the modern data and AI ecosystem, making it a plug-n-play option for Fortune 1000 companies to derive value from the geospatial data they collect. Putting Data to Work Wherobots’ Spatial Intelligence Cloud empowers businesses to unlock planetary-scale solutions and put their spatial data to work. 
Data teams can solve problems faster and more efficiently with a compute engine optimized for spatial analytics, a broad set of functions in SQL, Python, and Java, a variety of natively geospatial functions, and the ability for customers to bring in their own AI and ML models to derive insight from the physical world. This makes Wherobots a far more productive system for data teams to get their work done without switching contexts. Using Wherobots, industries across financial services & insurance, transportation, logistics & supply chain, energy, agriculture, and social services can analyze real-world issues up to 20x faster at planetary scale. This means more informed, faster decision-making in areas like last mile delivery, infrastructure, mobility, and agriculture. Our Community and Partners Industries need to stay ahead of an evolving planet as the climate changes, natural disasters become more prevalent, the rate at which people migrate increases, geopolitical issues become more common, and interconnected systems continue to evolve. These shifts raise both macro- and micro-level challenges around everything from where to focus a business’s operations and infrastructure to where consumer demand is moving. Wherobots activates the data businesses already have by making their geospatial context more complete and precise, allowing them to scale and plot an intelligent and adaptable course forward. We’re bringing this to life working with organizations like the Overture Maps Foundation, a coalition of industry leaders including Meta, Microsoft, Amazon, and TomTom, to support its global mapping initiatives; Addresscloud, to help insurers understand geographic risk; and GeoPostcodes, to support analytics for its global postal and population database. Here’s what they have to say: “Overture produces a building dataset covering all buildings in the world, with 2.3B geometries and growing, that’s updated frequently. 
There’s a lot of data, and compute that goes into producing it and keeping it up to date,” said Jennings Anderson, Geoscientist at Overture and Data Engineer at Meta. “We accelerated the pipelines that produce the buildings dataset by up to 20x after we moved them to Wherobots, which required a simple redirection of our code. We retained compatibility with Apache Sedona, and the move put us into a development experience that’s made us more productive.” “Our high quality data results from aggregating reliable raster population data with our curated boundaries vector database,” said Jerome Urbain, Head of Products at GeoPostcodes. “With Wherobots Spatial SQL, we’re able to analyze population data more efficiently and more accurately, reducing processing time from 39 days to less than one day, and deliver it to our customers across the globe in a far more timely manner. Not only was this a massive speed increase, the overall impact to our data team is they are able to work far more efficiently and productively, answering questions for our customers faster and helping to grow our business.” “Wherobots has given us the potential to run jobs that used to take hours or days to minutes and removed the need to think about provisioning compute. As we provide perils information (flood, fire, etc) to insurers at the property level, we particularly appreciate the ability to be able to run combined vector/raster analysis, without having to previously transform the raster data into vector format or some other format,” said John Powell, Senior Geospatial Data Engineer at Addresscloud. 
“From a developer perspective, having data, algorithms and compute (and to be presented with a Spark/Sedona context in a Jupyter notebook on startup) combined in one platform is extremely powerful, comparable in many respects to Google Earth Engine, but with much greater guarantees of, and control over, job completion.” AWS Marketplace Integration We’re researchers at heart, and we understand that there are so many undiscovered use cases for geospatial data—our customers are the ones helping expand and retool the industry. We’re excited to reach even more teams through our availability on the AWS Marketplace, allowing customers to leverage their AWS committed spend and benefit from integrated billing. We’ll be at AWS re:Invent in December (next week!) to learn what else geospatial data can take on. If you are coming to re:Invent, sign up for our GeoParty on the 4th of December, or check out our lightning talk at 12:30 on Thursday. Additionally, the Amazon Last Mile team will be showcasing how they utilize Apache Sedona at the Open Source Developer Theater. For a full overview of everything we have going on at re:Invent, check out our overview page. What’s Next When we first started the groundwork for Wherobots, we were shocked at the disconnect between the vast amount of planetary data available and cloud data infrastructure support. Through this new round of funding, we hope to provide companies with the tools to really see the world we live in, adapt to new challenges, create intelligence, and potentially save lives. We hope you follow along for our next chapter, and if this sounds like something you want to be a part of, we’re always looking for great talent. Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for The Spatial Intelligence Newsletter:
What is Apache Sedona? Posted on October 2, 2024 (updated February 20, 2026) by Ben Pruden Last Updated: February 2026 Apache Sedona is an open-source cluster computing system built for processing large-scale spatial data across distributed environments. Originally developed at Arizona State University under the name GeoSpark, and introduced in the paper “Spatial Data Management in Apache Spark: The GeoSpark Perspective and Beyond” by Jia Yu and Mohamed Sarwat, it is now a top-level Apache Software Foundation project used by organizations across transportation, logistics, environmental monitoring, insurance, and urban planning. This page covers what Apache Sedona is, how it processes spatial queries, common use cases, and how to get started. Key Takeaways Apache Sedona is an open-source cluster computing system for processing large-scale spatial data It extends distributed compute frameworks including Apache Spark, Apache Flink, and Snowflake Originally created at Arizona State University under the name GeoSpark Supports spatial operations including spatial joins, distance calculations, and spatial aggregations Ingests data from multiple formats including Shapefiles, GeoJSON, GeoTiff, Parquet, and CSV Used across industries including transportation, urban planning, environmental monitoring, logistics, and insurance Has surpassed 38 million downloads as of October 2024 What is Apache Sedona and How Does It Work Apache Sedona treats spatial data as a first-class citizen by extending distributed compute frameworks including Apache Spark, Apache Flink, and Snowflake with specialized data types, spatial operations, and indexing techniques optimized for spatial workloads. Unlike general-purpose compute frameworks, Sedona is purpose-built for the unique challenges of spatial data, including complex geometries, coordinate systems, and spatial relationships that standard data types cannot handle efficiently. 
The following section outlines how Apache Sedona processes spatial queries from data ingestion through to distributed execution. What Programming Languages Does Apache Sedona Support Apache Sedona supports multiple programming languages, including Python, Scala, Java, R, and SQL, making it accessible to a wide range of data engineering and analytics workflows. Developers can interact with Apache Sedona through whichever language fits their existing stack. On the integrations side, Apache Sedona runs on Apache Spark, Apache Flink, and Snowflake. Each runtime serves a different need: Apache Spark for large-scale distributed batch processing, Apache Flink for real-time streaming spatial analytics, and Snowflake for teams running spatial workloads inside a cloud data warehouse environment. How Does Apache Sedona Process Spatial Queries The first step in spatial query processing is to ingest geospatial data into Apache Sedona. Data can be loaded from various sources such as files (Shapefiles, GeoJSON, Parquet, GeoTiff, CSV, etc.) or databases into Apache Sedona’s in-memory distributed spatial data structures (typically the Spatial DataFrame). Next, Sedona makes use of spatial indexing techniques, such as R-trees or quadtrees, to accelerate query processing. The spatial index is used to partition the data into smaller, manageable units, enabling efficient data retrieval during query processing. Once the data is loaded and indexed, spatial queries can be executed using Sedona’s query execution engine. Sedona supports a wide range of spatial operations, such as spatial joins, distance calculations, and spatial aggregations. Sedona optimizes spatial queries to improve performance. The query optimizer determines an efficient query plan by considering the spatial predicates, available indexes, and the distribution of data across the cluster. Spatial queries are executed in a distributed manner using Sedona’s computational capabilities. 
The query execution engine distributes the query workload across the cluster, with each node processing a portion of the data. Intermediate results are combined to produce the final result set. Since spatial objects can be very complex, with many coordinates and intricate topology, Sedona implements a custom serializer for efficiently moving spatial data throughout the cluster. What Are Common Apache Sedona Use Cases Organizations use Apache Sedona for a range of large-scale geospatial data processing tasks, including: Creating custom weather, climate, and environmental quality assessment reports at national scale by combining vector parcel data with environmental raster data products. Generating planetary-scale GeoParquet files for public dissemination via cloud storage by combining, cleaning, and indexing multiple datasets. Converting billions of daily point telemetry observations into routes traveled by vehicles. Enriching parcel-level data with demographic and environmental data at the national level to feed into a real estate investment suitability analysis. Many of these use cases can be described as geospatial ETL operations. ETL (extract, transform, load) is a data integration process that involves retrieving data from various sources, transforming and combining these datasets, and then loading the transformed data into a target system or format for reporting or further analysis. Geospatial ETL shares many of the challenges and requirements of traditional ETL processes, with the additional complexities of managing the geospatial component of the data: working with geospatial data sources and formats, spatial data types and transformations, and the scalability and performance considerations required for spatial operations such as joins based on spatial relationships. 
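As a conceptual, single-machine sketch of that extract-transform-load flow with a spatial step in the middle: the snippet below is plain Python for illustration only, not Sedona's API. The region names, bounding boxes, and CSV layout are invented, and real pipelines use true polygon geometries, spatial indexes (e.g., R-trees), and distributed execution:

```python
import csv
import io

# Toy "regions" as axis-aligned bounding boxes: name -> (xmin, ymin, xmax, ymax).
# A stand-in for real polygon boundary data.
REGIONS = {
    "west": (0.0, 0.0, 5.0, 10.0),
    "east": (5.0, 0.0, 10.0, 10.0),
}

def extract(raw_csv):
    """Extract: parse point observations (id, x, y) from CSV text."""
    return [(row["id"], float(row["x"]), float(row["y"]))
            for row in csv.DictReader(io.StringIO(raw_csv))]

def transform(points):
    """Transform: spatially join each point to the region containing it."""
    out = []
    for pid, x, y in points:
        region = next((name for name, (x0, y0, x1, y1) in REGIONS.items()
                       if x0 <= x < x1 and y0 <= y < y1), None)
        out.append({"id": pid, "x": x, "y": y, "region": region})
    return out

def load(rows):
    """Load: serialize enriched rows back out (a stand-in for writing GeoParquet)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "x", "y", "region"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

raw = "id,x,y\na,1.0,2.0\nb,7.5,3.0\n"
enriched = transform(extract(raw))
output = load(enriched)
```

The point-in-region containment check here is the piece that, at scale, becomes a spatial join accelerated by the indexing and partitioning described above.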
For a real-world example of Apache Sedona in production, watch how Comcast data engineer David Buchanan used Apache Sedona to optimize geospatial ETL pipelines at scale, reducing processing time from 5 hours to 30 minutes:

“Apache Sedona has been a great asset to our team at Comcast. We adopted Apache Sedona to add geospatial capabilities to our existing Spark extract, transform, and load (ETL) pipelines. Its open-source nature enables us to be flexible in creating pipelines that are usable both in the cloud and on-premises, reducing vendor lock-in. Additionally, we can quickly scale our workloads depending on the team's needs, which has increased our productivity compared to using other tools like GeoPandas or PostGIS.”

David Buchanan, Comcast

How Widely is Apache Sedona Used

Apache Sedona is one of the most widely adopted geospatial analytics libraries in the distributed computing ecosystem. As of October 2024, it had surpassed 38 million downloads, with approximately 2 million downloads per month and year-over-year usage growth of 200%. A top-level Apache Software Foundation (ASF) project since February 2023, Sedona’s governance, licensing, and community participation align with ASF principles. The project has an active and growing developer community, with contributors from many different types of organizations and over 100 individuals interested in advancing the state of geospatial analytics and distributed computing.

Organizations in industries including transportation, logistics, urban planning, environmental monitoring, and insurance and risk analysis have adopted Apache Sedona. These organizations leverage Sedona’s capabilities to perform large-scale geospatial analysis, extract insights from geospatial data, and build geospatial analytical applications at scale.
Apache Sedona has been featured in conferences, workshops, and research publications related to geospatial analytics, distributed computing, and big data processing. For a deeper look at Apache Sedona’s adoption and real-world impact, watch Iceberg Geo Type: Transforming Geospatial Data Management at Scale, presented by Jia Yu and Szehon Ho at Data + AI Summit.

How to Get Started With Apache Sedona

For quick access to documentation and community support:

- Check out the documentation for Apache Sedona.
- Join the community Discord server.
- Attend the community office hours.
- Find and contribute to Apache Sedona on GitHub.

For a comprehensive introduction, ‘Cloud Native Geospatial Analytics with Apache Sedona,’ published with O’Reilly, covers how to work with large-scale spatial data using Apache Sedona, Apache Spark, and modern cloud technologies. It is aimed at developers, data scientists, and data engineers.

Get Access to the Free Guide

Download
Wherobots Joins Overture, Winning The Taco Wars, Spatial SQL API, Geospatial Index Podcast – This Month In Wherobots

Posted on August 1, 2024 by Ben Pruden

Welcome to This Month In Wherobots, the monthly developer newsletter for the Wherobots & Apache Sedona community! This month we have news about Wherobots and the Overture Maps Foundation, a deep dive on new Wherobots Cloud features like raster inference, generating vector tiles, and the Spatial SQL API, plus a look at retail cannibalization analysis for the commercial real estate industry.

Wherobots Joins Overture Maps Foundation

Wherobots has officially joined Overture Maps Foundation to support the next generation of planetary-scale open map data. Wherobots has supported the development of Overture datasets through Overture Maps Foundation’s use of the open-source Apache Sedona project to develop and distribute global data, enabling Overture to embrace modern cloud-native geospatial technologies like GeoParquet. By joining Overture as a Contributing Member, Wherobots will continue to support the ongoing development, distribution, and evolution of this critical open dataset that enables developers and data practitioners to make sense of the world around us.

Read the announcement blog post

Featured Community Members: Sean Knight & Ilya Marchenko

This month’s featured community members are Sean Knight and Ilya Marchenko from YuzuData, where they focus on AI and location intelligence for the commercial real estate industry. Ilya recently wrote a blog post showing how to use Wherobots for a retail cannibalization study. Thanks Sean and Ilya for being a part of the community and sharing how you’re building geospatial products using Wherobots!

Comparing Taco Chains: A Consumer Retail Cannibalization Study With Isochrones

Understanding the impact of opening a new retail location on existing locations is an important analysis in the commercial real estate industry.
In this code-heavy blog post, the YuzuData team details a retail cannibalization analysis using WherobotsDB, Overture Maps point of interest data, drive-time isochrones generated with the Valhalla API, and visualization with SedonaKepler. Sean also presented this analysis earlier this week in a live webinar.

Read the blog post or watch the video recording

Unlock Satellite Imagery Insights With WherobotsAI Raster Inference

One of the most exciting features in Wherobots’ latest release is WherobotsAI Raster Inference, which enables running machine learning models on satellite imagery for object detection, segmentation, and classification. This post gives a detailed look at the types of models supported by WherobotsAI and an overview of the SQL and Python APIs for raster inference, with an example of identifying solar farms for the purpose of mapping electricity infrastructure.

Read the blog post to learn more about WherobotsAI Raster Inference

Generating Global PMTiles In 26 Minutes With WherobotsDB VTiles

WherobotsDB VTiles is a highly scalable vector tile generator that can quickly and cost-efficiently generate vector tiles from small to planetary-scale datasets, with support for the PMTiles format. In this post we see how to generate vector tiles of the entire planet using three Overture layers; using Wherobots Cloud, generating PMTiles of the Overture buildings layer takes 26 minutes. The post includes all the code necessary to recreate these tile generation operations, along with a discussion of performance considerations.

Read the blog post to learn more about WherobotsDB VTiles

Spatial SQL API Brings Performance Of WherobotsDB To Your Favorite Data Applications

The Wherobots Spatial SQL API enables integration with Wherobots Cloud via Python and Java client drivers.
In addition to enabling integrations with your favorite data applications via the client drivers, Wherobots has released an Apache Airflow provider for orchestrating data pipelines and an integration with Harlequin, a popular SQL IDE.

Read the blog post to learn more about the Wherobots Spatial SQL API

Wherobots On The Geospatial Index Podcast

William Lyon from Wherobots was recently a guest on The Geospatial Index podcast. In this episode he discusses the origins of Apache Sedona, the open-source technology behind Wherobots, how users are building spatial data products at massive scale with Wherobots, how Wherobots is improving the developer experience around geospatial analytics, and much more.

Watch the video recording

Upcoming Events

- Apache Sedona Community Office Hours (Online – August 6th) – Join the Apache Sedona community for updates on the state of Apache Sedona, a presentation and demo of recent features, and the chance to provide your input into the roadmap, future plans, and contribution opportunities.
- GeoMeetup: Cloud Native Spatial Data Stack (San Francisco – September 5th) – Join us on September 5th for an exciting GeoMeetup featuring talks from industry leaders with Wherobots and Felt.com. In this meetup we will explore the elements of the cloud native spatial data stack.
- FOSS4G NA 2024 (St Louis – September 9th-11th) – FOSS4G North America is the premier open geospatial technology and business conference. Join the Wherobots team for a pre-conference workshop or come by and chat with us at the Wherobots booth to learn about the latest developments in Apache Sedona.

Want to receive this monthly update in your inbox? Sign up for The Spatial Intelligence Newsletter: