
Introducing the Wherobots Python SDK



What is the Wherobots Python SDK?

The Wherobots Python SDK is a typed Python client for submitting, monitoring, and managing Wherobots job runs. It ships on PyPI as wherobots-python-sdk. One install, one API key, and you’re running spatial jobs from any Python environment: CI/CD pipelines, notebooks, a local shell.

The SDK is built for three audiences: engineers wiring spatial jobs into production pipelines, data scientists iterating on a Wherobots script and streaming logs back to the terminal, and ops leads watching what's running across the organization.

Wherobots Python SDK Use Cases

Wherobots customers run spatial workloads on a cadence. Mapping platforms refresh OSM-derived road networks every morning before downstream pipelines fire. Ag-tech teams pull Sentinel-2 imagery, compute NDVI, and join it to millions of crop boundaries every five days.

Each of these is a Wherobots job that once required uploading scripts to shared storage, wiring up an Airflow DAG, and maintaining an operator that called the Runs REST API. The Wherobots Python SDK is the shorter path: one install, one API key, and three lines of Python submit a job from any environment that runs Python: a CI/CD pipeline, AWS Step Functions, a notebook, a local shell.

The jobs module of the SDK packages three workflows that customers were already stitching together by hand.

Scheduled spatial data products

Mapping and data-provider customers refresh Overture, OSM, or Sentinel-derived datasets on a fixed cadence. The output is a versioned Iceberg or GeoParquet table that downstream teams query directly. With the SDK, the entire refresh is a Python function that runs on cron, GitHub Actions, or AWS EventBridge. No Airflow cluster to host. No operator package to install.
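Using the WherobotsJob API introduced later in this post, a nightly refresh reduces to a single function. This is a sketch, not a prescribed pattern; the script path, job name, and runtime tier below are illustrative:

```python
# Sketch of a scheduled refresh built on the WherobotsJob API.
# Script path, job name, and runtime tier are hypothetical examples.
def refresh_road_network(run_date: str) -> str:
    """Submit the nightly road-network refresh and return its final status."""
    from wherobots import WherobotsJob  # deferred so the sketch imports cleanly

    job = WherobotsJob(
        script="refresh_road_network.py",
        name=f"osm-refresh-{run_date}",
        runtime="large",
    )
    job.submit()
    status = job.wait_for_completion(stream_logs=True)
    return status.value
```

Point cron, GitHub Actions, or EventBridge at this function and the refresh runs without any orchestration infrastructure to host.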

Orchestrator-invoked jobs from Step Functions, Prefect, or Dagster

Several Wherobots customers run their broader ETL in AWS Step Functions or Prefect and treat Wherobots as one task in a longer chain. The SDK gives those orchestrators a clean Python interface to submit a Wherobots job, wait for completion, and pass the output URI to the next step. A single call replaces the boilerplate of presigned uploads, polling, and log retrieval.
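As a sketch of that pattern, a Step Functions or Prefect task wrapping the SDK is just a function that submits, waits, and hands the output URI to the next step. The --output argument convention and names here are assumptions of this example, not something the SDK prescribes:

```python
def run_wherobots_step(script: str, output_uri: str) -> str:
    """Run one Wherobots job as a task inside a larger orchestrated flow.

    Sketch only: passing the output location via an --output argument is an
    assumption of this example, not an SDK convention.
    """
    from wherobots import WherobotsJob, JobStatus  # deferred for the sketch

    job = WherobotsJob(
        script=script,
        name="etl-step",
        args=["--output", output_uri],
    )
    job.submit()
    status = job.wait_for_completion(stream_logs=True)
    if status != JobStatus.COMPLETED:
        raise RuntimeError(f"Wherobots step ended with status {status.value}")
    return output_uri  # handed to the next task in the chain
```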

CI/CD-driven analytics for production spatial pipelines

Engineers maintaining property climate risk pipelines, mobility joins, or agricultural monitoring jobs want their spatial code to ship through the same review and deploy path as the rest of the codebase. The SDK fits a standard pattern: commit a script to GitHub, run tests in CI, deploy by submitting a job from the runner with WherobotsJob.submit(). Logs stream back to the build output. Failed jobs fail the build.
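A minimal deploy step in that pattern might look like the following sketch; exiting non-zero on anything other than COMPLETED is what makes a failed job fail the build (the script and job names are illustrative):

```python
import sys


def deploy(script: str = "pipeline.py") -> None:
    """Submit the reviewed script from a CI runner; exit non-zero on failure."""
    from wherobots import WherobotsJob, JobStatus  # deferred for the sketch

    job = WherobotsJob(script=script, name="ci-deploy", runtime="small")
    job.submit()
    status = job.wait_for_completion(stream_logs=True)  # logs land in build output
    if status != JobStatus.COMPLETED:
        sys.exit(1)  # non-zero exit fails the CI build
```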

The rest of this post walks through install, the WherobotsJob API, dependency management, and the security model.


Install the Wherobots Python SDK

pip install wherobots-python-sdk
export WHEROBOTS_API_KEY="your-api-key"

The only runtime dependency is requests. No AWS credentials, no bucket configuration.


The WherobotsJob class

The SDK exposes a single class today: WherobotsJob. Point it at a script, give the job a name, and call .submit().

Submit a job and stream logs

from wherobots import WherobotsJob

job = WherobotsJob(
    script="etl_pipeline.py",
    name="nightly-etl",
    runtime="large",
)

job.submit()
status = job.wait_for_completion(stream_logs=True)
print(f"Finished with status: {status.value}")

That’s the full lifecycle. The SDK uploads local scripts to Wherobots-managed storage via presigned URLs, polls for completion, and streams logs back to your terminal. When wait_for_completion returns, you get a JobStatus enum: COMPLETED, FAILED, or CANCELLED.

Only an API key. No AWS credentials, no bucket setup.

Pass arguments and configuration

Real jobs need arguments, Spark configuration, and dependencies. Pass them through the constructor:

job = WherobotsJob(
    script="spatial_join.py",
    name="q4-spatial-join",
    runtime="x-large-himem",
    timeout_seconds=7200,
    args=["--input", "s3://bucket/parcels/", "--output", "s3://bucket/results/"],
    spark_configs={
        "spark.sql.shuffle.partitions": "200",
        "spark.executor.memory": "8g",
    },
    dependencies=[
        WherobotsJob.add_pypi_dependency("geopandas", "0.14.0"),
        WherobotsJob.add_file_dependency("s3://bucket/libs/custom_udfs.whl"),
    ],
)

The SDK validates inputs at construction time. Bad runtime names, missing JAR main classes, negative disk sizes, and empty scripts all raise WherobotsValidationError before a single network call leaves the client.
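That means a bad configuration can be caught in a plain try/except before anything is submitted. A sketch, assuming the invalid runtime name below trips validation as described:

```python
def config_is_valid() -> bool:
    """Return True if the job configuration passes client-side validation."""
    from wherobots import WherobotsJob, WherobotsValidationError  # deferred for the sketch

    try:
        WherobotsJob(
            script="spatial_join.py",
            name="q4-spatial-join",
            runtime="xx-gigantic",  # assumed-invalid runtime tier for illustration
        )
    except WherobotsValidationError:
        return False  # rejected before any network call leaves the client
    return True
```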

List and filter runs

A WherobotsJob instance isn’t required to query your organization’s job runs:

from wherobots import WherobotsJob, JobStatus

page = WherobotsJob.list_runs(
    status=[JobStatus.FAILED],
    name_pattern="etl-*",
    size=10,
)

for run in page.items:
    print(f"{run.id}  {run.name}  {run.status}")

Cancel jobs and handle errors

from wherobots import WherobotsJob, WherobotsTimeoutError

job = WherobotsJob(script="long_running.py", name="cancellable-job")
job.submit()

try:
    status = job.wait_for_completion(max_wait_seconds=600)
except WherobotsTimeoutError:
    job.cancel()
    print("Job cancelled after timeout")

The exception hierarchy is flat. WherobotsAPIError carries the HTTP status code and request ID for debugging. WherobotsValidationError catches bad inputs at construction time. WherobotsTimeoutError fires when max_wait_seconds is exceeded. All three inherit from WherobotsJobError, so a single except block catches everything when you need it to.
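Because everything inherits from WherobotsJobError, a conservative wrapper can funnel every failure mode through one handler. A sketch; the return convention and logging choice are this example's, not the SDK's:

```python
from typing import Optional


def submit_safely(script: str, name: str) -> Optional[str]:
    """Submit a job and return its final status value, or None on any SDK error."""
    from wherobots import WherobotsJob, WherobotsJobError  # deferred for the sketch

    try:
        job = WherobotsJob(script=script, name=name)
        job.submit()
        return job.wait_for_completion().value
    except WherobotsJobError as e:  # validation, API, and timeout errors alike
        print(f"Job failed: {e}")
        return None
```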


Run scripts from your S3 storage integrations

If your script already lives in a Wherobots S3 Storage Integration, reference it directly by S3 URI and skip the upload step:

job = WherobotsJob(
    script="s3://my-integration-bucket/scripts/pipeline.py",
    name="pipeline-job-001",
    runtime="small",
    auto_upload=False,
)

Discover your integration paths programmatically:

from wherobots.api.files import FilesAPI
from wherobots.config import WherobotsConfig

config = WherobotsConfig.from_env()
with FilesAPI.from_config(config) as files_api:
    for si in files_api.list_integrations():
        print(f"{si.name}: {si.path} ({si.region})")

Design Principles

The SDK is opinionated in four ways:

  • requests is the only runtime dependency. No Pydantic, no boto3, no heavy frameworks. Install footprint stays small.
  • Presigned uploads only. Local scripts go up through Wherobots API presigned URLs. No AWS credentials in user code, ever.
  • Type-safe throughout. Typed dataclass models for every API response, JobStatus and Runtime enums, and a PEP 561 py.typed marker for downstream mypy users.
  • Security-first defaults. HTTPS only (HTTP rejected at init). No redirect following, which prevents auth header leaks. API keys masked in repr(). Path traversal defense on uploads. POST requests are never retried.

Getting Started

pip install wherobots-python-sdk
export WHEROBOTS_API_KEY="your-api-key"

# Submit a job and watch it run
python -c "
from wherobots import WherobotsJob
job = WherobotsJob(script='my_script.py', name='first-job', runtime='tiny')
job.submit()
job.wait_for_completion(stream_logs=True)
"

Source: github.com/wherobots/wherobots-python-sdk.

Available now on PyPI: pip install wherobots-python-sdk.

Start Building with Wherobots