The Wherobots Python SDK is a typed Python client for submitting, monitoring, and managing Wherobots job runs. It ships on PyPI as wherobots-python-sdk. One install, one API key, and you’re running spatial jobs from any Python environment: CI/CD pipelines, notebooks, a local shell.
The SDK is built for three workflows: engineers wiring spatial jobs into production pipelines, data scientists iterating on a Wherobots script and streaming logs back to the terminal, and ops leads watching what’s running across the organization.
Wherobots customers run spatial workloads on a cadence: mapping platforms refresh OSM-derived road networks every morning before downstream pipelines fire, and ag-tech teams pull Sentinel-2 imagery, compute NDVI, and join it to millions of crop boundaries every five days.
Each of these is a Wherobots job that used to mean uploading scripts to shared storage, wiring up an Airflow DAG, and maintaining the operator that called the Runs REST API. The Wherobots Python SDK is the shorter path: one install, one API key, and three lines of Python submit a job from any environment that runs Python, whether that’s a CI/CD pipeline, AWS Step Functions, a notebook, or a local shell.
The SDK’s jobs module packages three workflows customers were already stitching together by hand.
Mapping and data-provider customers refresh Overture, OSM, or Sentinel-derived datasets on a fixed cadence. The output is a versioned Iceberg or GeoParquet table that downstream teams query directly. With the SDK, the entire refresh is a Python function that runs on cron, GitHub Actions, or AWS EventBridge. No Airflow cluster to host. No operator package to install.
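As one sketch of what such a scheduled refresh could look like (the dataset name and the naming scheme here are hypothetical, not an SDK convention), encoding the run date in the job name makes any past refresh easy to find later with a name pattern:

```python
from datetime import date, datetime, timezone
from typing import Optional

def refresh_job_name(dataset: str, run_day: Optional[date] = None) -> str:
    """Build a deterministic per-day job name, e.g. 'osm-roads-refresh-2024-06-01'.

    Encoding the date in the name lets a later list-runs query with a
    pattern like 'osm-roads-refresh-*' locate any day's run.
    """
    run_day = run_day or datetime.now(timezone.utc).date()
    return f"{dataset}-refresh-{run_day.isoformat()}"
```

On cron, GitHub Actions, or EventBridge, the daily entry point would pass this name as the name argument when constructing the WherobotsJob, then call submit().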
Several Wherobots customers run their broader ETL in AWS Step Functions or Prefect and treat Wherobots as one task in a longer chain. The SDK gives those orchestrators a clean Python interface to submit a Wherobots job, wait for completion, and pass the output URI to the next step. A single call replaces the boilerplate of presigned uploads, polling, and log retrieval.
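One way to package that pattern as a single orchestrator task is a small submit-wait-cancel wrapper. This is a sketch, not SDK code: the job argument is duck-typed (anything with submit, wait_for_completion, and cancel, such as a WherobotsJob), and the built-in TimeoutError stands in for the SDK’s timeout exception so the wrapper can be exercised offline.

```python
def run_as_task(job, max_wait_seconds: int = 3600):
    """Submit `job`, block until it finishes, and cancel it on timeout.

    Returns whatever wait_for_completion returns (e.g. a terminal status
    the next pipeline step can branch on).
    """
    job.submit()
    try:
        return job.wait_for_completion(max_wait_seconds=max_wait_seconds)
    except TimeoutError:
        # Don't leave a runaway job behind when the orchestrator gives up.
        job.cancel()
        raise
```

In Step Functions or Prefect, this function becomes the body of one task; the re-raised timeout lets the orchestrator apply its own retry policy.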
Engineers maintaining property climate risk pipelines, mobility joins, or agricultural monitoring jobs want their spatial code to ship through the same review and deploy path as the rest of the codebase. The SDK fits a standard pattern: commit a script to GitHub, run tests in CI, deploy by submitting a job from the runner with WherobotsJob.submit(). Logs stream back to the build output. Failed jobs fail the build.
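The “failed jobs fail the build” contract boils down to mapping the terminal status to a process exit code. A sketch, using a local stand-in enum whose names mirror the SDK’s terminal states (this is not an import from the SDK):

```python
from enum import Enum

class JobStatus(Enum):
    """Stand-in mirroring the SDK's terminal states."""
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"

def exit_code_for(status: JobStatus) -> int:
    """Map a terminal job status to a process exit code so CI fails the build."""
    return 0 if status is JobStatus.COMPLETED else 1
```

A CI runner would end the deploy step with sys.exit applied to this mapping of the status returned by wait_for_completion.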
The rest of this post walks through install, the WherobotsJob API, dependency management, and the security model.
```shell
pip install wherobots-python-sdk
export WHEROBOTS_API_KEY="your-api-key"
```
The only runtime dependency is requests. No AWS credentials, no bucket configuration.
The SDK exposes a single class today: WherobotsJob. Point it at a script, give the job a name, and call .submit().
```python
from wherobots import WherobotsJob

job = WherobotsJob(
    script="etl_pipeline.py",
    name="nightly-etl",
    runtime="large",
)
job.submit()

status = job.wait_for_completion(stream_logs=True)
print(f"Finished with status: {status.value}")
```
That’s the full lifecycle. The SDK uploads local scripts to Wherobots-managed storage via presigned URLs, polls for completion, and streams logs back to your terminal. When wait_for_completion returns, you get a JobStatus enum: COMPLETED, FAILED, or CANCELLED.
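Under the hood, waiting for completion is a polling loop. A simplified, hypothetical sketch of the pattern (the SDK’s real intervals, backoff, and status call differ; get_status and the injectable sleep are stand-ins so the loop can be tested without a network):

```python
import time

def poll_until_terminal(get_status, interval_seconds=5.0,
                        max_wait_seconds=None, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state.

    Terminal states mirror the JobStatus values above. Raises TimeoutError
    once max_wait_seconds of polling has elapsed without a terminal state.
    """
    terminal = {"COMPLETED", "FAILED", "CANCELLED"}
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if max_wait_seconds is not None and waited >= max_wait_seconds:
            raise TimeoutError(f"job not finished after {max_wait_seconds}s")
        sleep(interval_seconds)
        waited += interval_seconds
```

Injecting sleep as a parameter is what makes the loop unit-testable; in production it defaults to time.sleep.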
Real jobs need arguments, Spark configuration, and dependencies. Pass them through the constructor:
```python
job = WherobotsJob(
    script="spatial_join.py",
    name="q4-spatial-join",
    runtime="x-large-himem",
    timeout_seconds=7200,
    args=["--input", "s3://bucket/parcels/", "--output", "s3://bucket/results/"],
    spark_configs={
        "spark.sql.shuffle.partitions": "200",
        "spark.executor.memory": "8g",
    },
    dependencies=[
        WherobotsJob.add_pypi_dependency("geopandas", "0.14.0"),
        WherobotsJob.add_file_dependency("s3://bucket/libs/custom_udfs.whl"),
    ],
)
```
The SDK validates inputs at construction time. Bad runtime names, missing JAR main classes, negative disk sizes, and empty scripts all raise WherobotsValidationError before a single network call leaves the client.
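For intuition, construct-time validation might look like the sketch below. The runtime names are the ones used in this post, and the function is illustrative only; the SDK’s actual checks are more complete and its error type is WherobotsValidationError rather than ValueError.

```python
# Runtimes named in this post; the real catalog is larger.
VALID_RUNTIMES = {"tiny", "small", "large", "x-large-himem"}

def validate_job(script: str, runtime: str, timeout_seconds: int = 3600) -> None:
    """Illustrative construct-time checks: fail fast, before any network call."""
    if not script.strip():
        raise ValueError("script must be a non-empty path or S3 URI")
    if runtime not in VALID_RUNTIMES:
        raise ValueError(f"unknown runtime {runtime!r}")
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
```

Failing in the constructor rather than on submit means a typo in a nightly cron job surfaces in local testing, not at 3 a.m.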
You don’t need a WherobotsJob instance to query your organization’s job runs:
```python
from wherobots import WherobotsJob, JobStatus

page = WherobotsJob.list_runs(
    status=[JobStatus.FAILED],
    name_pattern="etl-*",
    size=10,
)
for run in page.items:
    print(f"{run.id}  {run.name}  {run.status}")
```
If a job runs past an acceptable window, cancel it from the same client:

```python
from wherobots import WherobotsJob, WherobotsTimeoutError

job = WherobotsJob(script="long_running.py", name="cancellable-job")
job.submit()

try:
    status = job.wait_for_completion(max_wait_seconds=600)
except WherobotsTimeoutError:
    job.cancel()
    print("Job cancelled after timeout")
```
The exception hierarchy is flat. WherobotsAPIError carries the HTTP status code and request ID for debugging. WherobotsValidationError catches bad inputs at construction time. WherobotsTimeoutError fires when max_wait_seconds is exceeded. All three inherit from WherobotsJobError, so a single except block catches everything when you need it to.
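The shape of that hierarchy can be sketched with stand-in classes. The class names mirror the SDK’s, but the bodies here are illustrative; in particular, the WherobotsAPIError constructor signature is an assumption based on the attributes described above.

```python
from typing import Optional

class WherobotsJobError(Exception):
    """Stand-in base class: one except clause catches every SDK error."""

class WherobotsValidationError(WherobotsJobError):
    """Bad inputs, raised at construction time."""

class WherobotsTimeoutError(WherobotsJobError):
    """Raised when max_wait_seconds is exceeded."""

class WherobotsAPIError(WherobotsJobError):
    """Carries the HTTP status code and request ID for debugging."""
    def __init__(self, message: str, status_code: Optional[int] = None,
                 request_id: Optional[str] = None):
        super().__init__(message)
        self.status_code = status_code
        self.request_id = request_id
```

Because the hierarchy is flat with a single base, callers choose their granularity: catch WherobotsTimeoutError to retry, or WherobotsJobError to log and move on.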
If your script already lives in a Wherobots S3 Storage Integration, reference it directly by S3 URI and skip the upload step:
```python
job = WherobotsJob(
    script="s3://my-integration-bucket/scripts/pipeline.py",
    name="pipeline-job-001",
    runtime="small",
    auto_upload=False,
)
```
Discover your integration paths programmatically:
```python
from wherobots.api.files import FilesAPI
from wherobots.config import WherobotsConfig

config = WherobotsConfig.from_env()
with FilesAPI.from_config(config) as files_api:
    for si in files_api.list_integrations():
        print(f"{si.name}: {si.path} ({si.region})")
```
The SDK is opinionated in four ways:

- Runtime names are validated up front instead of being passed through as free-form strings.
- The package ships a py.typed marker, so editors and type checkers see the full API.
- The codebase passes mypy, keeping the public surface fully typed.
- Every object has a readable repr(), so debugging output is useful.
```shell
pip install wherobots-python-sdk
export WHEROBOTS_API_KEY="your-api-key"

# Submit a job and watch it run
python -c "
from wherobots import WherobotsJob
job = WherobotsJob(script='my_script.py', name='first-job', runtime='tiny')
job.submit()
job.wait_for_completion(stream_logs=True)
"
```
Source: github.com/wherobots/wherobots-python-sdk.
Available now on PyPI: pip install wherobots-python-sdk.