
Raster Inference – Bring Your Own Model

Wherobots Raster Inference supports running your own machine learning models on raster images to gather insights, using the Machine Learning Model Extension Specification (MLM). MLM is the standard for discovering, sharing, and running machine learning models for geospatial data.

Generally, bringing your own model involves the following steps:

  • Saving your model checkpoint using TorchScript (through either scripting or tracing).
  • Choosing an S3 bucket to store your model.
  • Uploading your TorchScript model to your S3 bucket.
  • Filling out two MLM Specification forms (the Asset form and the MLM form) for your model.
  • Uploading the MLM JSON file to your S3 bucket.
  • Executing raster inference.
  • Analyzing your model inference results.

Capabilities

WherobotsAI Raster Inference currently supports:

  • The following computer vision tasks:
    • Single-label scene classification
    • Object detection
    • Semantic segmentation
    • Segment Anything 2 (text prompt to polygons)
  • Workloads with single input tensor and single output tensor
  • NVIDIA GPU acceleration
  • PyTorch export formats: TorchScript models, ExportedPrograms, and AOTInductor models

Job Runs

You can complete raster inference with WherobotsAI within a Job Run or in a Wherobots Notebook.

This example discusses how to complete raster inference within a Wherobots Notebook. To complete this as a Job Run instead, place the code samples referenced in subsequent sections into a single Python file and execute it as a Job Run. For more information on creating Job Runs in Wherobots, see WherobotsRunOperator.

Before You Start

Before attempting to use your own machine learning model in WherobotsAI Raster Inference, ensure that you have the following:

Save and Upload Your Model

Save your model checkpoint using TorchScript. For more information, see Saving and Loading Models in the PyTorch documentation.

The following TorchScript model checkpoint saving methods are supported:

| Artifact Type | Description |
| --- | --- |
| torch.jit.script | A TorchScript model artifact. |
| torch.export.save | A PyTorch model archive containing an artifact of type AOTInductor or ExportedProgram. |

!!! note WherobotsAI Raster Inference currently only supports PyTorch models.
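As a sketch of the TorchScript option, a model can be saved through either scripting or tracing with `torch.jit`. The `TinySegmenter` module below is a stand-in for illustration, not a real segmentation network; substitute your own trained model.

```python
import torch

# Stand-in module for illustration; substitute your own trained model.
class TinySegmenter(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Toy "segmentation": per-pixel confidence in [0, 1].
        return torch.sigmoid(x)

model = TinySegmenter().eval()

# Scripting compiles the module's Python code into TorchScript.
scripted = torch.jit.script(model)
scripted.save("model_scripted.pt")

# Tracing instead records the operations performed on an example input;
# use it when your model's forward pass is not fully scriptable.
example = torch.rand(1, 3, 256, 256)
traced = torch.jit.trace(model, example)
traced.save("model_traced.pt")
```

Either saved `.pt` file can be uploaded to S3 in the next step; `torch.export.save` archives are handled analogously.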

  1. Store your model in an S3 bucket that is accessible to Wherobots Cloud. You can store your model in one of two ways:
    1. Store it directly in Wherobots Managed Storage. For more information, see Wherobots storage and notebook guidance.
    2. Integrate your existing Amazon S3 storage with Wherobots. For more information on integrating a public or private S3 bucket with Wherobots Cloud, see S3 storage integration.

In this example, we’ll store our model using Wherobots Managed Storage and create a data/customer-XXXX/bring-your-own-model directory.

!!! note This example uploads the model to Wherobots Managed Storage but you can also use your model through integrated storage. For more information, see S3 storage integration in the Wherobots documentation.


The URI to this model is used to create an MLM JSON in the subsequent step.

Create an MLM JSON for Your Model

MLM specification overview

The Machine Learning Model Extension Specification (MLM) is an extension of the SpatioTemporal Asset Catalog (STAC) specification. MLM defines a JSON format that specifies a model’s properties, its inputs and input-processing requirements, and its outputs and output-processing requirements.

MLM creates a standardized way to use your own models for inference. MLM accomplishes this by:

  • Enabling the building of searchable custom models and their associated STAC datasets.
  • Recording all necessary bands, parameters, model artifact locations, and high-level processing steps needed to deploy an inference service.

MLM specification forms

To create an MLM JSON for your model, first fill out the Model Asset Form in the Asset Form tab and then fill out the Model Metadata form in the MLM Form tab.

!!! info You must fill out the Asset Form before the MLM Form.

Fill out Asset Form

To fill out the Model Asset Form, do the following:

  1. Go to the Machine Learning Model Metadata Form site.
  2. Go to the Asset Form tab.
  3. Fill in the MLM Model Asset Form with your model information in accordance with the following chart. For compatibility with Raster Inference, you only need to specify the URI to the model artifact. For additional information and metadata fields you may want to document for your model, see Model Asset in Machine Learning Model Extension Specification.

    | Field Name | Type | Required or Optional | Description |
    | --- | --- | --- | --- |
    | Title | string | Optional | Name of the model asset. |
    | URI | string | Required | S3 URI to your saved TorchScript model. |

Fill out MLM Form

To create the MLM JSON for your model, do the following:

  1. Within the Machine Learning Model Metadata Form site, go to the MLM Form tab.

    • This form validates your input formats so that they conform to the MLM specification. For clarity, we’ve specified a few fields for reference below. For a full breakdown of the inputs and definitions, see Item Properties and Collection Fields in the Machine Learning Model Extension Specification.
    | MLM metadata form field | Expected Input | Example Input |
    | --- | --- | --- |
    | Is it pretrained? | true or false | true |
    | Categories | List of classes for your model | “Solar panels”, “Wind farms”, “Forests” |
  2. Click Download JSON to save the JSON file.

Here is a reference MLM for the landcover-eurostat-sentinel2 Wherobots hosted model.
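For orientation, a minimal MLM item for a TorchScript segmentation model might look like the sketch below, written here as a Python dict. The field names follow the MLM STAC extension, but all values (ID, architecture, URI, extension version) are hypothetical placeholders; treat the JSON downloaded from the form as authoritative.

```python
import json

# Minimal MLM item sketch. Field names follow the MLM STAC extension;
# every value below is a hypothetical placeholder.
mlm_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "stac_extensions": [
        "https://stac-extensions.github.io/mlm/v1.0.0/schema.json"
    ],
    "id": "my-segmentation-model",
    "properties": {
        "mlm:name": "my-segmentation-model",
        "mlm:architecture": "U-Net",
        "mlm:framework": "pytorch",
        "mlm:tasks": ["semantic-segmentation"],
        "mlm:pretrained": True,
    },
    "assets": {
        "model": {
            "href": "s3://your-bucket/bring-your-own-model/model_scripted.pt",
            "roles": ["mlm:model"],
        }
    },
}

# Write the item to disk; this is the file you upload alongside the model.
with open("my_model_mlm.json", "w") as f:
    json.dump(mlm_item, f, indent=2)
```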

Upload your model’s MLM JSON

We created an MLM JSON for the TorchScript model by following the steps in Create an MLM JSON for Your Model.

  1. Upload the JSON to the same S3 bucket as the TorchScript model in Wherobots.


The path to the MLM JSON will be the user_mlm_uri in the rest of the example.

Run Inference Using Your Model on Raster Data

Currently, WherobotsAI Raster Inference supports running model inference on the following tasks:

  • Single-label scene classification
  • Object Detection
  • Semantic Segmentation
  • Text to Bounding Boxes
  • Text to Instance Segments

The following chart details the WherobotsAI Raster Inference function calls to use for each Computer Vision task.

| Computer Vision Task | SQL API | Python API | Walk Through Example |
| --- | --- | --- | --- |
| Image Classification | RS_Classify() | rs_classify() | Run inference for classification |
| Object Detection | RS_Detect_BBoxes() | rs_detect() | Run inference for Object Detection |
| Semantic Segmentation | RS_Segment(), RS_Segment_to_Geoms() | rs_segment() | Run inference for Semantic Segmentation |
| Instance Segmentation | RS_Text_To_Segments() | rs_text_to_segments() | Run inference for Segment Anything 2 |

Semantic Segmentation example

In the following example, we’ll discuss how to use your own model for Raster Inference in Wherobots by performing Semantic Segmentation (also referred to as pixel classification) to identify solar farms in Arizona.

This example uses:

!!! note This example is also available to walk through in examples/Analyzing-Data/Bring_Your_Own_Model_Raster_Inference.ipynb once you launch a Wherobots Notebook instance.

To use your model for Semantic Segmentation, follow the steps in the subsequent sections to configure the MLM path, load the Torchscript model, and run Raster Inference on the new dataset.

Start a notebook

To start a notebook to run raster inference with WherobotsAI, do the following:

  1. Log in to Wherobots Cloud.
  2. Start a Wherobots instance. We recommend using the Tiny-GPU runtime. It can take several minutes for a runtime to load.
  3. Open a Python notebook.
    1. To interact with this example yourself, open examples/Analyzing-Data/Bring_Your_Own_Model_Raster_Inference.ipynb.
    2. If you are incorporating your own model, create a new notebook and use the code samples in this tutorial as a guide. !!! note If you add an S3 storage integration after starting the notebook, you must restart the notebook to access the newly added storage integration.

For more information on starting a notebook, see Notebook instance management and Jupyter Notebook Management.

Set Up The Sedona Context

The following code creates the SedonaContext:

import warnings
warnings.filterwarnings('ignore')
import os

from sedona.spark import *
from pyspark.sql.functions import expr

config = (
    SedonaContext.builder().appName('segmentation-batch-inference')
    .getOrCreate()
)

sedona = SedonaContext.create(config)

Create the URI variable

Next, we need to set the user_mlm_uri path to the S3 URI of the MLM JSON that we created in Upload your model’s MLM JSON.

WherobotsAI Raster Inference uses user_mlm_uri to get the necessary processing information for the model and know which model to use to run inference.

To get the S3 URI of the MLM JSON:

  1. Navigate to the MLM JSON in Wherobots Cloud.
  2. Copy the location of the file and assign it to user_mlm_uri.

user_mlm_uri = "[PATH-TO-YOUR-MLM-JSON]"

Load Satellite Imagery

Load the satellite imagery that we will run inference over. These GeoTIFF images are loaded as out-db rasters in WherobotsDB, where each row represents a different scene.

tif_folder_path = 's3a://wherobots-benchmark-prod/data/ml/satlas/'
df_raster_input = sedona.read.format("raster").load(f"{tif_folder_path}/*.tif").sample(.05)
df_raster_input.show(truncate=False)

Run Predictions And Visualize Results

Raster Inference SQL function RS_Segment

To run predictions, specify the MLM model metadata file we saved to user_mlm_uri.

Predictions can be run with the Raster Inference SQL function RS_Segment or with the Python API.

Here we generate 400 raster predictions using RS_Segment.

predictions_df = sedona.sql(f"""
SELECT
  rast,
  segment_result.*
FROM (
  SELECT
    rast,
    RS_SEGMENT('{user_mlm_uri}', rast) AS segment_result
  FROM
    df_raster_input
) AS segment_fields
""")

predictions_df.cache().count()
predictions_df.show()
predictions_df.createOrReplaceTempView("predictions")

Using the wherobots.inference Python API

For those who prefer working with Python, wherobots.inference provides a module to register SQL inference functions as Python functions.

To use this module, replace the code in Raster Inference SQL function RS_Segment with the following code sample:

from wherobots.inference.engine.register import create_semantic_segmentation_udfs
from pyspark.sql.functions import col

rs_segment = create_semantic_segmentation_udfs(batch_size=9, sedona=sedona)
df = df_raster_input.withColumn(
    "segment_result", rs_segment(user_mlm_uri, col("rast"))
).select(
    "rast",
    col("segment_result.confidence_array").alias("confidence_array"),
    col("segment_result.class_map").alias("class_map"),
)
df.show(3)

Extract insights

Initial results

Now that we’ve generated predictions using our model over our satellite imagery, we can use the RS_Segment_To_Geoms function to extract geometries from the classified imagery pixels.

These geometries delineate the boundaries of possible solar farms and contain the average model confidence scores of the pixels contained within them.

df_multipolys = sedona.sql("""
    WITH t AS (
        SELECT RS_SEGMENT_TO_GEOMS(rast, confidence_array, array(1), class_map, 0.65) result
        FROM predictions
    )
    SELECT result.* FROM t
""")

df_multipolys.cache().count()
df_multipolys.show()
df_multipolys.createOrReplaceTempView("multipolygon_predictions")

In the RS_SEGMENT_TO_GEOMS call, we specified the following:

  • rast: The raster column used to georeference our results.
  • confidence_array: The prediction result from the previous step.
  • array(1): The category label “1” returned by the model, representing solar farms.
  • class_map: The class map used to assign labels to the predictions.
  • 0.65: A confidence threshold between 0 and 1 used to filter classified pixels.
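Conceptually, the 0.65 threshold works like the NumPy sketch below (an illustration only, not the implementation of RS_SEGMENT_TO_GEOMS): pixels at or above the threshold keep their class label, and the surviving pixel regions are then vectorized into geometries.

```python
import numpy as np

# Toy 4x4 per-pixel confidence map for class 1 ("solar farm").
confidence = np.array([
    [0.10, 0.20, 0.70, 0.90],
    [0.05, 0.66, 0.80, 0.95],
    [0.02, 0.30, 0.64, 0.71],
    [0.01, 0.12, 0.40, 0.68],
])

# Pixels at or above the threshold are kept; the rest become background.
threshold = 0.65
mask = confidence >= threshold
print(int(mask.sum()))  # count of pixels classified as solar farm
```

Raising the threshold trades recall for precision: fewer pixels survive, so fewer but higher-confidence polygons are produced.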

Filtered results

Since we ran inference across the entire state of Arizona, many scenes don’t contain solar farms and as a result, don’t have positive detections.

Let’s filter out scenes without segmentation detections so that we only retain the positive results.

df_merged_predictions = sedona.sql("""
    SELECT
        element_at(class_name, 1) AS class_name,
        cast(element_at(average_pixel_confidence_score, 1) AS double) AS average_pixel_confidence_score,
        ST_Collect(geometry) AS merged_geom
    FROM
        multipolygon_predictions
""")

Filtering out empty geometries leaves us with a few predicted solar farm polygons for our 400 satellite image samples.

df_filtered_predictions = df_merged_predictions.filter("ST_IsEmpty(merged_geom) = False")
df_filtered_predictions.cache().count()
df_filtered_predictions.show()

Visualize with SedonaKepler

We’ll plot these filtered results with SedonaKepler. Compare the satellite basemap with the predictions and see if there’s a match!

!!! note This basemap is compiled from images taken at different times. This means the features shown on the basemap might not match the imagery we just used for our analysis.

from sedona.spark import *
config = {
    'version': 'v1',
    'config': {
        'mapStyle': {
            'styleType': 'dark-matter',
            'topLayerGroups': {},
            'visibleLayerGroups': {},
            'mapStyles': {}
        },
    }
}
map = SedonaKepler.create_map(config=config)

SedonaKepler.add_df(map, df=df_filtered_predictions, name="Solar Farm Detections")
map

  1. Bastani, Favyen, Wolters, Piper, Gupta, Ritwik, Ferdinando, Joe, and Kembhavi, Aniruddha. “SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding.” arXiv preprint arXiv:2211.15660 (2023). https://doi.org/10.48550/arXiv.2211.15660