
Take-aways from the 2026 Geospatial Embeddings Workshop at Clark University


There is a lot of excitement (and hype) around EO embeddings. My social media is filled with content about them, and GeoML communities like the TorchGeo Slack and GitHub are focused on onboarding “foundation models” and associated assets like training datasets. Indie hackers like Christopher Ren are building useful applications for exploring and comparing embeddings; see GeoVibes.

This work is all fantastic. At the same time, I’m still left with a sense that fundamental issues around access and enablement have not been addressed, and that the performance and fitness for use of EO embedding products has not been adequately communicated.

In early March, I got together with industry and academic experts at the Geospatial Embeddings Workshop at Clark University to start addressing this. We discussed standards for storage, documentation, and cataloging and implemented some of these ideas, which can be found at github.com/geo-embeddings.

Workshop Outcomes

We now have a site to collect best practices, document standards, and showcase tutorials for EO Embeddings. Check it out at geoembeddings.org!

Some selected recommendations:

  • Use collection formats for storing embeddings: Zarr v3 for regularly gridded data and GeoParquet for embeddings of sparsely collected data.
  • Use the new embedding STAC spec, which communicates how embeddings were produced, fitness for use, and search and discovery metadata. A sibling convention for storing this same info within a Zarr is in the works.
  • Take a look at an example Model Card that compiles metadata to document the model that produced the embeddings: geoembeddings.org/model-card.html
  • Check out this tutorial for inspecting AEF embeddings with Xarray! Feel free to make your own tutorials and contribute them here. We’d love to collate examples of workflows on top of embeddings for similarity search, change detection, and fine-tuning, as well as showcase other embedding models and products (a minimal similarity-search sketch follows this list).
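
As one concrete starting point, here is a minimal sketch of the similarity-search pattern over a gridded embedding array with Xarray. It builds a small synthetic embedding cube in memory so it runs anywhere; the (y, x, dim) layout and the 64-dimension size are illustrative assumptions, and with a real product you would swap the synthetic array for something like xr.open_zarr() pointed at the embedding store.

```python
# Minimal sketch: cosine-similarity search over a gridded embedding array.
# The synthetic array, its (y, x, dim) layout, and the 64-dim size are
# illustrative assumptions; in practice you would open a real embedding
# product, e.g. with xr.open_zarr(<path to embedding store>).
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
emb = xr.DataArray(
    rng.standard_normal((128, 128, 64)).astype("float32"),
    dims=("y", "x", "dim"),
    name="embedding",
)

# L2-normalize each pixel's vector so dot products become cosine similarity.
unit = emb / np.sqrt((emb**2).sum("dim"))

# Pick a query location and score every pixel against it.
query = unit.isel(y=64, x=64)
similarity = (unit * query).sum("dim")  # (y, x) map of cosine similarity

# The highest-scoring pixels are candidates for "places like the query".
flat = similarity.stack(pixel=("y", "x"))
print(flat.sortby(flat, ascending=False).isel(pixel=slice(0, 5)))
```

The same normalize, dot-product, rank pattern underlies most of the similarity-search workflows we’d like to collect on the site.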

These resources are a start, but they don’t solve everything. In the rest of this post, I want to highlight three barriers I think we still need to address to make geo embeddings truly impactful: the surprising cost of storing embeddings, unclear fitness for use, and the lack of publicized benchmarks.

Embedding storage can be expensive, so take advantage of compression!

The first barrier to using embeddings is practical: storing embeddings can cost more than you might expect. EO data can already be very large, and the embeddings generated from it can be even larger than the input. I didn’t fully appreciate this given my background in detection and segmentation, where the model output is typically smaller than the input.

In recent model releases, EO models are trending toward flexibly accepting, but not requiring, multimodal inputs. If you have only Sentinel-2 imagery, you can generate embeddings from just Sentinel-2. But this leaves some capacity of the embedding generation model unused.

For example, OlmoEarth Nano generates 128 embedding dimensions regardless of how many sensors or timesteps you feed in. That means adding more sensors or timesteps makes the embedding progressively smaller relative to the input. The table below compares embedding size to input size for a 512 × 512 block at 10 m resolution, in float32, with inputs of one or twelve monthly timesteps (the output columns show the ratio of embedding size to input size):

| Input stack | Bands | Input size | Nano output (128-dim) | Base output (768-dim) |
| --- | --- | --- | --- | --- |
| S2 only, 1 scene | 13 | 0.013 GiB | 9.6× larger | 57× larger |
| S2 only, 12 scenes | 13 | 0.152 GiB | 0.82× | 4.9× larger |
| S2 + S1 VV/VH, 12 scenes | 15 | 0.176 GiB | 0.71× | 4.3× larger |
| S2 + S1 VV/VH + Landsat, 12 scenes | 26 | 0.305 GiB | 0.41× | 2.5× larger |

The Nano embedding only becomes smaller than the input once you have a full 12-scene time series. The Base model (768-dim, 0.750 GiB) is larger than the raw input in every scenario. It is worth factoring this storage cost in and weighing it against the performance of the embeddings before scaling out and committing resources to a big run.
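
The arithmetic behind the table is simple: both the input stack and the embedding are float32 per pixel, so the ratio reduces to embedding_dims / (bands × timesteps). Here is a small sketch of that calculation (the last digit may differ slightly from the table, which appears to use the rounded GiB figures):

```python
# Sketch of the storage arithmetic behind the table above: a 512x512 block,
# float32 everywhere, per-pixel embeddings of 128 (Nano) or 768 (Base) dims.
SIDE, BYTES = 512, 4

def input_gib(bands: int, timesteps: int) -> float:
    return SIDE * SIDE * bands * timesteps * BYTES / 2**30

def embedding_gib(dims: int) -> float:
    return SIDE * SIDE * dims * BYTES / 2**30

scenarios = [
    ("S2 only, 1 scene", 13, 1),
    ("S2 only, 12 scenes", 13, 12),
    ("S2 + S1 VV/VH, 12 scenes", 15, 12),
    ("S2 + S1 VV/VH + Landsat, 12 scenes", 26, 12),
]

for label, bands, steps in scenarios:
    inp = input_gib(bands, steps)
    nano = embedding_gib(128) / inp   # equivalently 128 / (bands * steps)
    base = embedding_gib(768) / inp   # equivalently 768 / (bands * steps)
    print(f"{label:38s} input {inp:6.3f} GiB   Nano {nano:5.2f}x   Base {base:5.2f}x")
```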

Fitness for use is often unknown

Even if you can afford to store embeddings, do you know if they’ll work for your problem?

EO embeddings are often described as products of “foundation models”. Yet unlike LLM foundation models, which can carry out agentic tasks in thousands of different contexts at an expert level, EO embedding models are far more restricted in the domains in which they can be successfully applied.

For example, today most EO embeddings are derived from medium-resolution satellite imagery. Practitioners curious about embeddings are often disappointed to find that they cannot use them for detection tasks on high-resolution imagery. Or the embeddings may be limited to agricultural use cases, not applicable to some geographic regions, and so on.

Documenting and describing this fitness for use is often done in the research paper, but not clearly in the catalog where the embeddings are hosted. And papers often leave out details on the sampling regime and how the embeddings were tested. To address this, check out the resources linked above to better document your embedding model or an existing embedding product!

Lack of publicized benchmarks is a barrier to adoption

And we don’t yet have a shared way to answer questions about fitness for use. Model benchmarks, while imperfect and easily gamed, are essential for communicating what is worth the effort to try.

In the LLM space, benchmarks like MMLU and SWE-Bench have been useful signals for indicating which models are best. As they become saturated or too easy, new benchmarks arise to fill the gap.

I don’t see a similar pattern happening with EO embeddings or EO models in general. Benchmarks are typically run within a paper, and how comprehensive they are is governed by how much time, compute, and expertise the author had. I don’t think we, as a community of researchers, practitioners, and end-users, have much memory for these paper benchmarks.

I’d like to contribute to an effort toward a leaderboard for EO foundation models that can host multiple benchmarks and align the community on which models, and which benchmark datasets, are useful and important.

If you’d like to compare embedding models to each other at useful scale, get in touch so we can more effectively communicate the performance of these models for specific use cases!

What’s Wherobots doing with embeddings and foundation models?

At the core of it, I think experimenting with embeddings can often feel difficult, and we’re trying to change that at Wherobots.

Before the workshop, we’d been hearing from many people that they were curious to try foundation models on their specific problems: to improve the price/performance of their workloads, improve model accuracy, and solve more problems with simpler workflows and less expensive model fine-tuning.

To help here, we’ve onboarded a foundation model to our Model Hub: OlmoEarth Nano, for embedding multispectral optical and radar imagery.

We publish our models on Hugging Face in the PyTorch 2 archive format so that you can load and run them with only PyTorch as a dependency. This can be helpful for testing and comparing embeddings outside of a larger ML pipeline.
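
As a hedged sketch of what that looks like (the repo id, archive filename, and input shape below are placeholders, not the actual published artifacts), loading and running a PyTorch 2 archive is roughly:

```python
# Hedged sketch: download a .pt2 archive from Hugging Face and run it on a
# dummy input. The repo id, filename, and input shape are placeholders; the
# real model may expect a different input structure (multiple sensors,
# timesteps, normalization, etc.).
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-org/your-embedding-model",  # placeholder, not a real repo
    filename="model.pt2",                     # placeholder archive name
)

exported = torch.export.load(path)  # load the exported program
model = exported.module()           # callable module, PyTorch-only dependency

# Dummy Sentinel-2-like stack: batch x bands x height x width (assumed shape).
x = torch.randn(1, 13, 512, 512)
with torch.no_grad():
    embeddings = model(x)
print(embeddings.shape)
```

The point is simply that a single torch.export.load call is enough to get a callable module for quick, local embedding comparisons.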

If you need to scale out these models, check out RasterFlow. It’s a batch processing engine built for any scale of imagery you need to run inference on, from city-scale, high-resolution imagery to global, petabyte-scale collections. I think RasterFlow could be a key enabler for comparing embeddings at the large spatial scales at which they are typically generated.

If you’re experimenting with EO embeddings, contribute a tutorial to geoembeddings.org, contribute a Model Card for your cool new model, or just share what’s working and what’s not. We’re looking forward to lowering these barriers to EO-derived insights with the community!