Introducing RasterFlow: a planetary scale inference engine for Earth Intelligence

Four aerial images of crop fields, numbered 1 through 4

We’re very excited to announce RasterFlow is now available to select customers in a private preview. If you are interested in learning more or would like to request access to the preview, contact us here!

RasterFlow is a serverless image preparation and inference engine that makes it significantly easier to generate Earth Intelligence from planetary-scale Earth Observation (EO) datasets. With it, customers and their AI agents can innovate more readily with EO data and integrate Earth insights into their data infrastructure.

How RasterFlow Powers Earth Intelligence at Scale

A few weeks ago, we announced our collaboration with the Taylor Geospatial Engine to help them evaluate their Fields of the World (FTW) machine learning model that segments agricultural field boundaries. Using an early release of RasterFlow, we were able to quickly and cost-effectively run this model at scale.

Here’s a breakdown of how this works in practice. RasterFlow ingests and assembles the source imagery (in this case Sentinel-2) into an inference-ready mosaic, generating representative features using the FTW model for planting and harvest seasons and removing cloud cover as needed (1, 2). The FTW model is then run against this mosaic using RasterFlow’s distributed inference engine to predict fields and field boundaries (3). Finally, the predictions are vectorized into geometries and made available as an Iceberg table (4) that can be used in WherobotsDB or other downstream applications and data systems for field-level crop insights. RasterFlow’s applicability is much wider than Sentinel-2 and FTW: it supports Zarr and COG imagery datasets and PyTorch computer vision models for inference.

Four aerial images of crop fields, numbered 1 through 4

RasterFlow at Scale
The images above represent sample outputs for a small area in Kansas, but RasterFlow is also well suited to much larger runs. In our collaboration with the Taylor Geospatial Engine, we executed larger scale runs including the Continental United States (CONUS), Japan, Mexico, South Africa, Switzerland and Rwanda. RasterFlow’s efficient parallel processing enabled each of these large scale workflows to complete in minutes to a few hours. RasterFlow autoscales compute resources based on the expected compute and inference load, which is a function of area and time range, dataset density, and model complexity.

Challenges using EO Data

Most data teams do not have the expertise or the budget to build and operate the unique infrastructure and software stack required to extract insights from EO datasets using computer vision models. These barriers have prevented innovative ideas from getting off the ground.

According to Gartner, only 1% of AI models today leverage physical world data, versus a projected 80% by 2029. Similarly, AI agents are projected to generate 10 times more data from physical environments than from all digital AI applications combined [1]. However, AI agents can’t economically make sense of this raw data because it first has to be prepared by the same costly, complex, and specialized infrastructure that data teams need, and neither has access to it.

Here’s an example that underscores these challenges: to analyze the current state and predicted spread of a wildfire and measure the risk to infrastructure, you (or the developers behind an AI agent) typically need to build dedicated pipelines that:

  • Ingest and prepare imagery for inference, minimizing noise such as cloud cover and edge effects
  • Deploy a machine learning model on prepared imagery, trained to segment and classify fires
  • Tune model inference for scale and efficiency, while minimizing edge and tiling effects from individual tasks
  • Measure change over time using models that take into account wind direction, speed, vegetation, buildings and other infrastructure in the probable path of the fire
  • Join model predictions with other important context including building footprints, land parcels and infrastructure such as powerlines and pipelines to calculate overall risk
  • Forecast the spread of the fire

In total, these steps require significant investments in infrastructure development, operations, and talent that most businesses are unable to justify, much less accomplish.

On-Demand Imagery Preparation and Inference for Earth Observation Workflows

The inspiration for RasterFlow was to make it easy for any company to use large scale sensor datasets and computer vision models to unblock innovation and AI applications for the physical world. RasterFlow does this by combining decades of expertise with a fully managed inference and mosaicking workflow and API designed for Earth Intelligence at any scale.

Here are a few key capabilities:

  • On-demand serverless operations for imagery ingestion, preparation (also known as mosaicking), and inference.
  • Built-in support for popular open datasets and open models so you can get started quickly.
  • Inference results that can be converted to vector geometries and integrated into a lakehouse architecture, stored in a customer’s cloud storage bucket as Parquet files in Apache Iceberg tables.
  • Ability to easily postprocess these results with WherobotsDB or other lakehouse engines with support for spatial operations, such as Databricks, Snowflake, or Google BigQuery.
  • Simple enough for any engineer, scientist, or analyst to use: just pick a model, an area of interest to deploy that model, and a time range. Advanced users can take advantage of lower-level APIs to customize their planetary-scale inference runs.

RasterFlow Operators: Core Functions for Preparing Imagery, Model Inference, and Vectorization

RasterFlow provides fully managed operations required for processing Earth Observation datasets, including:

  1. Imagery ingestion and preparation to remove cloud cover and edge effects and to build a high-quality, inference-ready mosaic
  2. Distributed inference for large scale runs of computer vision models, geospatial foundation models, and other PyTorch models
  3. Vectorization of model outputs into geometries, or delivery of those outputs as analytics-ready rasters

Ingesting and Preparing Satellite Imagery for Model Inference

Satellites and drones capture imagery along a particular flight path. It may take multiple drone flights, or days, weeks, or even months of passes by a satellite constellation, to capture clean imagery for a particular area of interest. Even then, clouds and weather events may still block what you are interested in. In these circumstances it’s important to understand the rate of coverage and define your time horizon accordingly when building a mosaic.

A mosaic is a composite image built by selecting high-quality (e.g., cloud-free) pixels over a time range and stitching them together for a particular area. Base satellite layers in your favorite map applications (Google Maps, Mapbox) are cloud-free mosaics composed from images over a wide time range. Many computer vision models are trained to find relatively durable things on Earth, like buildings, roads, and land cover. But when clouds, coverage gaps, imagery edge effects, or other types of “noise” exist in the input imagery, the quality of inference suffers. The purpose of the mosaic is to correct for this noise and make imagery inference-ready, so that model inference produces the results you want. RasterFlow takes care of this heavy lifting for you, creating an inference-ready mosaic that maximizes the usefulness of today’s Earth Observation models.
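To make the idea of compositing concrete, here is a minimal sketch of one common approach: a per-pixel median over the cloud-free observations in a time range. This is an illustration only, not RasterFlow's implementation. The function name composite, the nested-list image layout, and the boolean cloud masks are all assumptions made for the example; a real pipeline would work on georeferenced arrays.

```python
from statistics import median

def composite(scenes, cloud_masks):
    """Build a median composite from co-registered single-band scenes.

    scenes      : list of 2D lists (rows x cols) of pixel values, one per date
    cloud_masks : list of 2D lists of booleans, True where the pixel is cloudy
    Returns a 2D list; a pixel is None when no clear observation exists.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    mosaic = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Keep only the observations that are not flagged as cloud.
            clear = [scene[r][c]
                     for scene, mask in zip(scenes, cloud_masks)
                     if not mask[r][c]]
            if clear:
                mosaic[r][c] = median(clear)
    return mosaic

# Three hypothetical acquisitions of the same 2x2 area (99 marks a cloudy value).
scenes = [
    [[10, 20], [30, 40]],
    [[12, 99], [32, 99]],
    [[14, 22], [99, 99]],
]
cloud_masks = [
    [[False, False], [False, True]],
    [[False, True], [False, True]],
    [[False, False], [True, True]],
]
mosaic = composite(scenes, cloud_masks)
```

The bottom-right pixel is cloudy on every date, so it stays empty. That is the situation where widening the time range, as described above, gives the composite more clear observations to draw from.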

Distributed Geospatial Inference at Planetary Scale

We’ve moved past relying on human eyes to analyze imagery, and are now capable of letting machines do this work for us. With RasterFlow, today’s machine learning models can perform tasks such as object detection, segmentation, and classification on a very large area of interest, with orders of magnitude more efficiency and scale than an analyst’s eyes can offer. The RasterFlow inference engine is designed for small to very large scale runs. It parallelizes work over the input mosaic across a distributed, serverless inference architecture while minimizing the tiling effects typically produced when inference pipelines operate on individual tiles.
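One widely used way to reduce tiling effects is to give each tile a "halo" of surrounding pixels, run the model on the padded window, and keep only the predictions for the tile's core. The sketch below illustrates that general technique; it is not a description of how RasterFlow does it. The names tiled_inference and box_blur are hypothetical, and the "model" is a 3x3 mean filter standing in for any model whose output at a pixel depends on its neighbors.

```python
def box_blur(window):
    """Stand-in model: 3x3 mean filter, so each output depends on neighbors."""
    h, w = len(window), len(window[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [window[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

def tiled_inference(image, model, tile=4, halo=1):
    """Run model tile by tile, reading halo extra pixels around each tile."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            r1, c1 = min(r0 + tile, h), min(c0 + tile, w)
            # Padded window, clipped to the image extent.
            pr0, pc0 = max(0, r0 - halo), max(0, c0 - halo)
            pr1, pc1 = min(h, r1 + halo), min(w, c1 + halo)
            window = [row[pc0:pc1] for row in image[pr0:pr1]]
            pred = model(window)
            # Keep only the core of the tile; discard the halo predictions.
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = pred[r - pr0][c - pc0]
    return out

# Synthetic 8x8 image with no special structure.
image = [[(7 * r + 3 * c) % 11 for c in range(8)] for r in range(8)]
reference = box_blur(image)                             # whole image at once
with_halo = tiled_inference(image, box_blur, tile=4, halo=1)
without_halo = tiled_inference(image, box_blur, tile=4, halo=0)
```

Because the stand-in model only looks one pixel in each direction, a halo of one pixel is enough for the tiled result to match the whole-image result, while the run without a halo differs along the tile seams. Real models have larger receptive fields, so the halo has to be sized accordingly.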

Running Hosted or Custom Geospatial AI Models with RasterFlow

For convenience, RasterFlow currently hosts popular open source PyTorch geospatial computer vision models that are ready to use. These models currently include:

You can also import your own custom PyTorch model to your Wherobots Organization for private deployment.

RasterFlow + TorchGeo: Simplifying PyTorch-Based Geospatial AI

Wherobots actively supports the TorchGeo project, which helps machine learning experts work more easily with geospatial data within the PyTorch ecosystem. We will continue to build out RasterFlow integrations with TorchGeo, including onboarding additional TorchGeo models and further simplifying the model lifecycle for PyTorch models. While we are starting with support for PyTorch, focusing on TorchGeo models, we are open to adding support for other model frameworks.

Calling Geospatial Model Developers: Contribute to RasterFlow

We are continually adding new, open source geospatial computer vision models to the Wherobots Model Hub. And if you’re a model developer, we’re interested in speaking with you to onboard your model and distribute the value of your work to a wider audience using Wherobots RasterFlow.

Vectorizing Model Outputs: From Raster Predictions to Geospatial Geometries

Many computer vision models output rasters, where each pixel in the raster represents a predicted real-world value such as height of the tree canopy, or the confidence that the pixel represents a certain feature such as an agricultural field boundary or a sidewalk. RasterFlow provides built-in support for raster vectorization, turning pixel values into rich, concise geometries. These geometries represent features of interest that can be post-processed, conflated, and integrated into your workflows because they are yours, stored in open source file (Parquet) and table (Iceberg) formats in your S3 bucket.
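As a rough illustration of what vectorization involves, the sketch below thresholds a confidence raster, groups neighboring pixels into connected regions, and emits one WKT polygon per region. It is deliberately simplified: each region is reduced to its bounding box, whereas a production vectorizer traces the actual outline of each region. The function name vectorize, the threshold, and the north-up grid defined by a top-left origin and a square pixel size are assumptions for this example, not RasterFlow's API.

```python
from collections import deque

def vectorize(conf, threshold=0.5, origin=(0.0, 0.0), pixel=1.0):
    """Turn a 2D confidence raster into bounding-box polygons (WKT strings).

    origin is the map coordinate of the raster's top-left corner and
    pixel is the pixel size in map units (north-up grid assumed).
    """
    h, w = len(conf), len(conf[0])
    seen = [[False] * w for _ in range(h)]
    polygons = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or conf[r][c] < threshold:
                continue
            # Flood-fill one 4-connected region, tracking its pixel extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            rmin = rmax = r
            cmin = cmax = c
            while queue:
                y, x = queue.popleft()
                rmin, rmax = min(rmin, y), max(rmax, y)
                cmin, cmax = min(cmin, x), max(cmax, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and conf[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Convert the pixel extent to map coordinates (y decreases downward).
            x0 = origin[0] + cmin * pixel
            x1 = origin[0] + (cmax + 1) * pixel
            y1 = origin[1] - rmin * pixel
            y0 = origin[1] - (rmax + 1) * pixel
            polygons.append(
                f"POLYGON (({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, {x0} {y0}))"
            )
    return polygons

# Hypothetical field-confidence raster with two separate regions.
conf = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.2, 0.0],
    [0.1, 0.0, 0.0, 0.6],
]
polygons = vectorize(conf, threshold=0.5, origin=(0.0, 0.0), pixel=10.0)
```

Two regions come out as two geometries. Once predictions are in this form, they can be stored as rows in a table and joined with other vector data, which is what makes the vectorized output useful downstream.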

Using RasterFlow with Geospatial Foundation Models and Embeddings

Recent developments in Geospatial Foundation Models have generated tremendous interest in the research community, potentially accelerating Earth Observation applications the same way that Large Language Models (LLMs) and embeddings have transformed AI’s ability to generate language. RasterFlow can generate embeddings from the latest open Geospatial Foundation Models, including OlmoEarth from the Allen Institute for AI (Ai2) and Clay.  With RasterFlow’s ability to cost-effectively generate embeddings at scale, researchers and practitioners can easily generate embeddings for their area of interest and evaluate their suitability and power.
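One simple way to start evaluating embeddings is a similarity search: pick a tile you care about and see which other tiles the model places closest to it. The sketch below ranks tiles by cosine similarity. The tile names and the tiny two-dimensional vectors are made up for illustration, and the function names are hypothetical; embeddings from an actual foundation model have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, embeddings):
    """Return tile ids ordered from most to least similar to query."""
    scores = {tile: cosine_similarity(query, vec)
              for tile, vec in embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-tile embeddings.
embeddings = {
    "tile_a": [1.0, 0.0],
    "tile_b": [0.0, 1.0],
    "tile_c": [0.9, 0.1],
}
ranking = rank_by_similarity([1.0, 0.0], embeddings)
```

If the tiles ranked near the top look like the query tile to a human reviewer, that is an early sign the embeddings capture something useful for the task.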

Customers and Partners Using RasterFlow for Scalable Earth Intelligence

One highlight while developing RasterFlow has been our collaboration with customers and partners like SatSure, Taylor Geospatial Engine, and Spyrosoft. We’ve used feedback from these teams to ensure we are solving for customer needs. Before the Thanksgiving holiday we shared our recent learnings from working together with Taylor Geospatial Engine, who have been incredibly helpful in providing input on how their ecosystem of developers and ML engineers would want to interact with RasterFlow.

SatSure is an existing Wherobots customer and an early adopter of RasterFlow, and we are excited to see what they build next with it.

"RasterFlow meaningfully accelerates the work SatSure and Wherobots already do together. By automating mosaicking, preprocessing, and distributed inference into a single, on-demand workflow, it removes much of the engineering overhead required to operationalize our models at national and multi-season scale. This helps us move new geospatial AI models into production faster, iterate more quickly with customers, and deliver fresher, high-resolution insights across agriculture, banking and financial services, and infrastructure use cases."
Rashmit Singh, CTO and co-founder, SatSure

RasterFlow Availability and Multi-Cloud Architecture

Wherobots infrastructure runs natively on AWS, and customers pay for use through the AWS Marketplace. RasterFlow and WherobotsDB support hybrid architectures, where data is read from, and results are written to, other environments such as GCP, Azure, Oracle, or on-premises. This is particularly useful when processing open datasets or using open models, where the environment in which data is processed may not be a concern. On-demand pricing for RasterFlow will be announced at a later date, but can be discussed with customers participating in the private preview.

Next Steps: Try RasterFlow and Explore the Wherobots Spatial Data Platform

  1. Source – 27 August 2025, Gartner Innovation Insight: World Models Are Set to Empower AI Agents With Imagination