Wherobots customers are realizing up to a 20x performance increase and significant cost savings by shifting their Apache Sedona workloads into Wherobots. This guide shows you how easy it is to migrate Apache Sedona workloads into WherobotsDB, and focuses on best practices for Apache Sedona migrations from Amazon EMR, AWS Glue, and Databricks.
By following this guide, you’ll be able to:
This guide assumes you’ve already decided to migrate to Wherobots. We’ll focus on the technical steps in moving your workloads, empowering you to get up and running quickly.
Wherobots makes it easy to run the models and scripts you already have in your public or private Amazon S3 buckets using Wherobots’ secure S3 storage integration.
For step-by-step configuration instructions, see S3 Storage integration and SAML Single Sign On (SSO) setup in the official Wherobots documentation.
When you’re ready to dive into spatial data analysis within WherobotsDB, your first order of business is creating a Sedona context object. This object acts as your gateway to the capabilities of the Wherobots Cloud ecosystem, enabling you to leverage its extensive spatial functions and tools.
To ensure a smooth start, double-check that your Sedona environment is correctly configured in your Wherobots notebook. Pay close attention to the sedona and spark session variables your existing code references, and make sure the variables you initialize in the notebook match those names and your existing settings. This helps you avoid surprises when you move your spatial analysis to WherobotsDB.
from sedona.spark import *

config = (
    SedonaContext.builder()
    # Add your Sedona/Spark configurations here in this format
    .config("<sedona-spark-config-key>", "<sedona-spark-config-value>")
    .getOrCreate()
)
sedona = SedonaContext.create(config)
This configuration provides the foundation for utilizing Sedona’s spatial functions within WherobotsDB, empowering you to perform advanced geospatial analysis with ease and efficiency.
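If your EMR, Glue, or Databricks jobs already set Spark or Sedona properties, one way to carry them over is to collect them in a dictionary and apply them to the builder in a loop. The sketch below assumes your existing settings can be expressed as key/value string pairs. apply_configs is a hypothetical helper, not part of the Sedona API, and the property in the usage comment is only an example value.

```python
from typing import Any, Mapping


def apply_configs(builder: Any, configs: Mapping[str, str]) -> Any:
    """Apply each key/value pair to a SparkSession-style builder.

    Works with any object whose .config(key, value) returns a builder,
    such as the one returned by SedonaContext.builder().
    """
    for key, value in configs.items():
        builder = builder.config(key, value)
    return builder


# Usage inside a Wherobots notebook (not executed here):
# from sedona.spark import *
# existing_configs = {"spark.sql.shuffle.partitions": "400"}  # example value only
# config = apply_configs(SedonaContext.builder(), existing_configs).getOrCreate()
# sedona = SedonaContext.create(config)
```

Keeping the settings in one dictionary makes it easy to diff them against the configuration of the cluster you are migrating from.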
Workload migration can be daunting and disruptive. Fortunately, WherobotsDB is built on Apache Sedona and is 100% code-compatible with it, so you can migrate your workloads without rewriting them. You'll find all the familiar Sedona functions, joins, and features, with performance enhanced by WherobotsDB.
Follow the steps below to transfer your business logic to WherobotsDB:
This diagram illustrates how your Sedona workloads will be integrated within the Wherobots ecosystem:
Start by identifying a self-contained component of your spatial workflow to migrate to WherobotsDB, and make sure you have all the supporting elements it needs to run. Migrating one piece at a time streamlines your transition to the WherobotsDB ecosystem.
After identifying the business logic you intend to move, validate that it works as expected in the WherobotsDB environment. This confirms that your spatial operations, data transformations, and analytical processes produce the same accurate results you rely on.
Test your code in a WherobotsDB notebook. Start by selecting a notebook runtime that matches the demands of your workload, then copy your business logic into the notebook. Run your code and carefully validate the outputs, paying close attention to row counts and consistency with your expected results. This confirms that your logic works correctly within WherobotsDB.
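One lightweight way to do that comparison is to collect a small, deterministic sample of rows from both the original job and the WherobotsDB notebook and diff them. compare_outputs is a hypothetical helper that works on plain Python tuples, for example [tuple(r) for r in df.collect()] on a small result; for large outputs you would compare counts and aggregates in Spark instead of collecting rows to the driver.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def compare_outputs(expected: Iterable[Sequence[Hashable]],
                    actual: Iterable[Sequence[Hashable]]) -> dict:
    """Compare two result sets as multisets of rows (order-insensitive)."""
    expected_counts = Counter(tuple(row) for row in expected)
    actual_counts = Counter(tuple(row) for row in actual)
    return {
        "expected_rows": sum(expected_counts.values()),
        "actual_rows": sum(actual_counts.values()),
        # Rows in the original output that are missing (or under-counted) after migration
        "missing": expected_counts - actual_counts,
        # Rows that only appear (or appear more often) in the migrated output
        "unexpected": actual_counts - expected_counts,
        "match": expected_counts == actual_counts,
    }
```

Comparing geometries as WKT strings is an exact match, so rounding coordinates, or comparing areas and lengths within a tolerance, may be more appropriate if floating-point results differ slightly between environments.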
With your validated code ready, it's time to package it into a Python script: create a .py file and organize your code in it. This makes your logic portable and easy to run within the WherobotsDB environment.
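A minimal layout for such a script keeps the business logic in a function that receives the Sedona session, so the same code can be called from a notebook or from a job. The function names, the placeholder query, and the key=value argument format are assumptions for illustration; the format mirrors the "args": "test_run=True" value in the Airflow example later in this guide.

```python
from typing import Dict, List


def parse_kv_args(argv: List[str]) -> Dict[str, str]:
    """Turn arguments like ["test_run=True"] into {"test_run": "True"}."""
    parsed = {}
    for item in argv:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        parsed[key.strip()] = value.strip()
    return parsed


def run_analysis(sedona, test_run: bool) -> None:
    """Your migrated business logic goes here (placeholder query)."""
    df = sedona.sql("SELECT ST_Point(1.0, 2.0) AS geom")
    if test_run:
        df.show()


def main(argv: List[str]) -> None:
    # Import lazily so the helpers above can be tested without Sedona installed.
    from sedona.spark import SedonaContext

    options = parse_kv_args(argv)
    config = SedonaContext.builder().getOrCreate()
    sedona = SedonaContext.create(config)
    run_analysis(sedona, test_run=options.get("test_run", "False").lower() == "true")
```

At the end of the file you would typically add the usual if __name__ == "__main__": main(sys.argv[1:]) entry point (with import sys). How a Wherobots run actually delivers args to your script is not covered in this guide, so confirm the format in the Wherobots documentation before relying on key=value parsing.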
Now that your business logic is neatly packaged within a Python script, you need to make it accessible to Wherobots Airflow. To do this, you can upload it to an S3 bucket that’s integrated with your Wherobots environment.
Alternatively, you can upload it to Wherobots Managed Storage. This secure, integrated storage solution ensures your code is readily available for execution within the Wherobots ecosystem. See the Wherobots documentation for how to upload files to Managed Storage.
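If you upload to your own integrated S3 bucket rather than Managed Storage, the AWS SDK for Python (boto3) is one option. The sketch below assumes boto3 is installed and that your AWS credentials can write to the bucket; the bucket and key names are placeholders. It returns the s3:// URI you will need for the Airflow operator, plus a quick format check.

```python
from urllib.parse import urlparse


def to_s3_uri(bucket: str, key: str) -> str:
    """Build an s3://bucket/key URI, tolerating a leading slash on the key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def upload_script(local_path: str, bucket: str, key: str) -> str:
    """Upload a local .py file with boto3 and return its S3 URI."""
    import boto3  # imported lazily; requires AWS credentials with write access

    boto3.client("s3").upload_file(local_path, bucket, key)
    return to_s3_uri(bucket, key)


def is_s3_uri(uri: str) -> bool:
    """Sanity check: s3 scheme, a bucket name, and a non-empty object key."""
    parsed = urlparse(uri)
    return parsed.scheme == "s3" and bool(parsed.netloc) and len(parsed.path) > 1


# Example (not executed here; bucket and key are placeholders):
# uri = upload_script("analysis.py", "my-wherobots-bucket", "jobs/analysis.py")
```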
Wherobots provides an Apache Airflow operator, the WherobotsRunOperator, that simplifies integrating your code with the Job Runs API and lets you trigger Wherobots runs from within your Airflow workflows. Before running your script, you'll need to establish a connection to Wherobots in your Airflow server and retrieve the S3 URI of your uploaded Python file. The operator uses this URI to locate and execute your code.
Here’s an example of how to use the WherobotsRunOperator to execute your Sedona code on WherobotsDB:
import pendulum
from airflow import DAG
from airflow_providers_wherobots.operators.run import WherobotsRunOperator
from wherobots.db.runtime import Runtime

with DAG(
    dag_id="test_run_operator",
    schedule="@once",  # run once; use a cron expression for recurring runs
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
) as test_run_dag:
    operator = WherobotsRunOperator(
        task_id="analysis_task",
        name="airflow_run_operator",
        runtime=Runtime.TINY,  # Wherobots runtime size that executes the script
        run_python={
            "uri": "S3-URI-PATH-TO-YOUR-FILE",  # replace with your script's S3 URI
            "args": "test_run=True",  # arguments passed to your script
        },
        dag=test_run_dag,
        poll_logs=True,  # poll the run's logs so you can monitor them from Airflow
    )
In this example, the WherobotsRunOperator takes the S3 URI of your Python file and executes it on the specified runtime, Runtime.TINY. You can configure Airflow to run your code on a schedule, pass arguments to your code, and monitor the execution logs.
By utilizing the WherobotsRunOperator and the Job Runs API, you can seamlessly integrate your existing Sedona code into WherobotsDB and take advantage of its powerful geospatial capabilities. This approach ensures a smooth transition and allows you to focus on your spatial data analysis without worrying about infrastructure management or complex configurations.
To learn more about the WherobotsRunOperator and its capabilities, refer to the Wherobots documentation.
If you don't use Airflow, the Wherobots Job Runs API provides a convenient way to execute your code directly on WherobotsDB.
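Because it is an HTTP API, any HTTP client can call it. The standard-library sketch below only builds the request. The endpoint URL, the API-key header name, and the payload fields are all assumptions modeled loosely on the Airflow example above, not the documented Job Runs API schema, so substitute the values from the Wherobots API reference.

```python
import json
import urllib.request


def build_run_request(endpoint: str, api_key: str, payload: dict,
                      key_header: str = "X-API-Key") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a job run.

    endpoint, key_header and the payload schema are placeholders (assumptions):
    take the real values from the Wherobots Job Runs API reference.
    """
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", key_header: api_key},
        method="POST",
    )


# Hypothetical payload loosely shaped after the run_python argument of the Airflow example:
# payload = {"name": "sedona-migration-run",
#            "runtime": "tiny",
#            "python": {"uri": "s3://my-bucket/jobs/analysis.py", "args": ["test_run=True"]}}
# with urllib.request.urlopen(build_run_request(ENDPOINT, API_KEY, payload)) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```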
Migrating your spatial data workflows doesn't have to be complex. With this guide, you can transition from Apache Sedona on Spark, Amazon EMR, AWS Glue, or Databricks to WherobotsDB on Wherobots Cloud, a cloud-native data processing solution designed to make you more productive and to accelerate your spatial workloads.
Ready to simplify your spatial data analysis?