
Take-aways from the 2026 Geospatial Embeddings Workshop at Clark University


There is a lot of excitement (and hype) around EO embeddings. My social media is filled with content about them, and GeoML communities like the TorchGeo Slack and GitHub are focused on onboarding “foundation models” and associated assets like training datasets. Indie hackers like Christopher Ren are building useful applications for exploring and comparing embeddings; see GeoVibes.

This work is all fantastic. At the same time, I’m still left with a sense that fundamental issues around access and enablement have not been addressed, and that the performance and fitness for use of EO embedding products has not been adequately communicated.

In early March, I got together with industry and academic experts at the Geospatial Embeddings Workshop at Clark University to start addressing this. We discussed standards for storage, documentation, and cataloging and implemented some of these ideas, which can be found at github.com/geo-embeddings.

Workshop Outcomes

We now have a site to collect best practices, document standards, and showcase tutorials for EO Embeddings. Check it out at geoembeddings.org!

Some selected recommendations:

  • Use collection formats for storing embeddings: Zarr v3 for regularly gridded data and GeoParquet for embeddings of sparsely collected data.
  • Use the new embedding STAC spec, which communicates how embeddings were produced, fitness for use, and search and discovery metadata. A sibling convention for storing this same info within a Zarr is in the works.
  • Take a look at an example Model Card that compiles metadata to document the model that produced the embeddings: geoembeddings.org/model-card.html
  • Check out this tutorial for inspecting AEF embeddings with Xarray! Feel free to make your own tutorials and contribute them here. We’d love to collate examples of workflows on top of embeddings for similarity search, change detection, and fine-tuning, as well as showcase other embedding models and products (a minimal similarity-search sketch follows this list).
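
As one concrete starting point, here is a minimal sketch of the similarity-search pattern over a gridded embedding array with Xarray. It builds a small synthetic embedding cube in memory so it runs anywhere; the (y, x, dim) layout and the 64-dimension size are illustrative assumptions, and with a real product you would swap the synthetic array for something like xr.open_zarr() pointed at the embedding store.

```python
# Minimal sketch: cosine-similarity search over a gridded embedding array.
# The synthetic array, its (y, x, dim) layout, and the 64-dim size are
# illustrative assumptions; in practice you would open a real embedding
# product, e.g. with xr.open_zarr(<path to embedding store>).
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
emb = xr.DataArray(
    rng.standard_normal((128, 128, 64)).astype("float32"),
    dims=("y", "x", "dim"),
    name="embedding",
)

# L2-normalize each pixel's vector so dot products become cosine similarity.
unit = emb / np.sqrt((emb**2).sum("dim"))

# Pick a query location and score every pixel against it.
query = unit.isel(y=64, x=64)
similarity = (unit * query).sum("dim")  # (y, x) map of cosine similarity

# The highest-scoring pixels are candidates for "places like the query".
flat = similarity.stack(pixel=("y", "x"))
print(flat.sortby(flat, ascending=False).isel(pixel=slice(0, 5)))
```

The same normalize, dot-product, rank pattern underlies most of the similarity-search workflows we’d like to collect on the site.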

These resources are a start, but they don’t solve everything. In the rest of this post, I want to highlight three barriers I think we still need to address to make geo embeddings truly impactful: the surprising cost of storing embeddings, unclear fitness for use, and the lack of publicized benchmarks.

Embedding storage can be expensive, so take advantage of compression!

The first barrier to using embeddings is practical: storing embeddings can cost more than you might expect. EO data can already be very large, and the embeddings generated from it can be even larger than the input. I didn’t fully appreciate this given my background in detection and segmentation, where the model output is typically smaller than the input.

In recent model releases, EO models are trending toward flexibly accepting, but not requiring, multimodal inputs. If you have only Sentinel-2 imagery, you can generate embeddings from just Sentinel-2. But this leaves some capacity of the embedding generation model unused.

For example, OlmoEarth Nano generates 128 embedding dimensions regardless of how many sensors or timesteps you feed in. That means adding more sensors or timesteps makes the embedding progressively smaller relative to the input. The table below compares embedding size to input size for a 512 × 512 block at 10 m resolution, in float32, with inputs of one or twelve monthly timesteps (the output columns show the ratio of embedding size to input size):

| Input stack | Bands | Input size | Nano output (128-dim) | Base output (768-dim) |
| --- | --- | --- | --- | --- |
| S2 only, 1 scene | 13 | 0.013 GiB | 9.6× larger | 57× larger |
| S2 only, 12 scenes | 13 | 0.152 GiB | 0.82× | 4.9× larger |
| S2 + S1 VV/VH, 12 scenes | 15 | 0.176 GiB | 0.71× | 4.3× larger |
| S2 + S1 VV/VH + Landsat, 12 scenes | 26 | 0.305 GiB | 0.41× | 2.5× larger |

The Nano embedding only becomes smaller than the input once you have a full 12-scene time series. The Base model (768-dim, 0.750 GiB) is larger than the raw input in every scenario. It is worth factoring this storage cost in and weighing it against the performance of the embeddings before scaling out and committing resources to a big run.
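
The arithmetic behind the table is simple: both the input stack and the embedding are float32 per pixel, so the ratio reduces to embedding_dims / (bands × timesteps). Here is a small sketch of that calculation (the last digit may differ slightly from the table, which appears to use the rounded GiB figures):

```python
# Sketch of the storage arithmetic behind the table above: a 512x512 block,
# float32 everywhere, per-pixel embeddings of 128 (Nano) or 768 (Base) dims.
SIDE, BYTES = 512, 4

def input_gib(bands: int, timesteps: int) -> float:
    return SIDE * SIDE * bands * timesteps * BYTES / 2**30

def embedding_gib(dims: int) -> float:
    return SIDE * SIDE * dims * BYTES / 2**30

scenarios = [
    ("S2 only, 1 scene", 13, 1),
    ("S2 only, 12 scenes", 13, 12),
    ("S2 + S1 VV/VH, 12 scenes", 15, 12),
    ("S2 + S1 VV/VH + Landsat, 12 scenes", 26, 12),
]

for label, bands, steps in scenarios:
    inp = input_gib(bands, steps)
    nano = embedding_gib(128) / inp   # equivalently 128 / (bands * steps)
    base = embedding_gib(768) / inp   # equivalently 768 / (bands * steps)
    print(f"{label:38s} input {inp:6.3f} GiB   Nano {nano:5.2f}x   Base {base:5.2f}x")
```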

Fitness for use is often unknown

Even if you can afford to store embeddings, do you know if they’ll work for your problem?

EO embeddings are often described as products of “foundation models”. Yet unlike LLM foundation models, which can carry out agentic tasks in thousands of different contexts at an expert level, EO embedding models are far more restricted in the domains in which they can be successfully applied.

For example, today most EO embeddings are derived from medium-resolution satellite imagery. Practitioners curious about embeddings are often disappointed to find that they cannot use them for detection tasks on high-resolution imagery. Or the embeddings may be limited to agricultural use cases, not applicable to some geographic regions, and so on.

Documenting and describing this fitness for use is often done in the research paper, but not clearly in the catalog where the embeddings are hosted. And papers often leave out details on the sampling regime and how the embeddings were tested. To address this, check out the resources linked above to better document your embedding model or an existing embedding product!

Lack of publicized benchmarks is a barrier to adoption

And we don’t yet have a shared way to answer questions about fitness for use. Model benchmarks, while imperfect and easily gamed, are essential for communicating what is worth the effort to try.

In the LLM space, benchmarks like MMLU and SWE-Bench have been useful signals for indicating which models are best. As they become saturated or too easy, new benchmarks arise to fill the gap.

I don’t see a similar pattern happening with EO embeddings or EO models in general. Benchmarks are typically run within a paper, and how comprehensive they are is governed by how much time, compute, and expertise the author had. I don’t think we, as a community of researchers, practitioners, and end-users, have much memory for these paper benchmarks.

I’d like to contribute to an effort toward a leaderboard for EO foundation models that can host multiple benchmarks and align the community on which models, and which benchmark datasets, are useful and important.

If you’d like to compare embedding models to each other at useful scale, get in touch so we can more effectively communicate the performance of these models for specific use cases!

What’s Wherobots doing with embeddings and foundation models?

At the core of it, I think experimenting with embeddings can often feel difficult, and we’re trying to change that at Wherobots.

Before the workshop, we’d been hearing from many people that they were curious to try foundation models on their specific problems: to improve the price/performance of their workloads, improve model accuracy, and solve more problems with simpler workflows and less expensive model fine-tuning.

To help here, we’ve onboarded a foundation model to our Model Hub: OlmoEarth Nano, for embedding multispectral optical and radar imagery.

We publish our models on Hugging Face in the PyTorch 2 archive format so that you can load and run them with only PyTorch as a dependency. This can be helpful for testing and comparing embeddings outside of a larger ML pipeline.
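
As a hedged sketch of what that looks like (the repo id, archive filename, and input shape below are placeholders, not the actual published artifacts), loading and running a PyTorch 2 archive is roughly:

```python
# Hedged sketch: download a .pt2 archive from Hugging Face and run it on a
# dummy input. The repo id, filename, and input shape are placeholders; the
# real model may expect a different input structure (multiple sensors,
# timesteps, normalization, etc.).
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-org/your-embedding-model",  # placeholder, not a real repo
    filename="model.pt2",                     # placeholder archive name
)

exported = torch.export.load(path)  # load the exported program
model = exported.module()           # callable module, PyTorch-only dependency

# Dummy Sentinel-2-like stack: batch x bands x height x width (assumed shape).
x = torch.randn(1, 13, 512, 512)
with torch.no_grad():
    embeddings = model(x)
print(embeddings.shape)
```

The point is simply that a single torch.export.load call is enough to get a callable module for quick, local embedding comparisons.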

If you need to scale out these models, check out RasterFlow. It’s a batch processing engine built for any scale of imagery you need to run inference on, from city-scale, high-resolution imagery to global, petabyte-scale collections. I think RasterFlow could be a key enabler for comparing embeddings at the large spatial scales at which they are typically generated.

If you’re experimenting with EO embeddings, contribute a tutorial to geoembeddings.org, contribute a Model Card for your cool new model, or just share what’s working and what’s not. We’re looking forward to lowering these barriers to EO-derived insights with the community!