There's a lot of excitement (and hype) around EO embeddings. My social media feed is filled with content about them, and GeoML communities like the TorchGeo Slack and GitHub are focused on onboarding "foundation models" and associated assets like training datasets. Indie hackers like Christopher Ren are building useful applications for exploring and comparing embeddings (see GeoVibes).
This work is all fantastic. At the same time, I’m still left with a sense that fundamental issues around access and enablement have not been addressed, and that the performance and fitness for use of EO embedding products has not been adequately communicated.
In early March, I got together with industry and academic experts at the Geospatial Embeddings Workshop at Clark University to start addressing this. We discussed standards for storage, documentation, and cataloging and implemented some of these ideas, which can be found at github.com/geo-embeddings.
We now have a site to collect best practices, document standards, and showcase tutorials for EO Embeddings. Check it out at geoembeddings.org!
Some selected recommendations:
Sources: Isaac Corley, "Technical Debt of Earth Embedding Products"; H. Falk, zarr-developers/zarr-illustrations-falk-2022, Zenodo, 2022.
These resources are a start, but they don’t solve everything. In the rest of this post, I want to highlight three barriers I think we still need to address to make geo embeddings truly impactful: the surprising cost of storing embeddings, unclear fitness for use, and the lack of publicized benchmarks.
The first barrier to using embeddings is practical: storing embeddings can cost more than you might expect. While EO inputs are already large, the embeddings generated from them can be larger still. Coming from detection and segmentation, where the model output is typically much smaller than the input, I didn't fully appreciate this simple fact.
In recent releases, EO models are trending toward flexibly accepting, but not requiring, multimodal inputs. If you have only Sentinel-2 imagery, you can generate embeddings for just Sentinel-2, but doing so leaves some of the embedding model's capacity unused.
For example, OlmoEarth Nano generates 128 embedding dimensions regardless of how many sensors or timesteps you feed in. That means adding more sensors makes the embedding progressively smaller relative to the input. The table below compares embedding size to input size for a 512 × 512 block at 10 m resolution with 12 monthly timesteps, in float32 (values show the ratio of embedding size to input size):
[Table: embedding-to-input size ratios for a 512 × 512 block across sensor and timestep combinations]
The Nano embedding only becomes smaller than the input once you have a full 12-scene time series. The Base model (768-dim, 0.750 GiB) is larger than the raw input in every scenario, so it's worth weighing this storage cost against the performance of the embeddings before scaling out and committing resources to a big run.
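To make that arithmetic concrete, here's a minimal sketch of the storage math behind those figures. It assumes per-pixel embeddings at full resolution and 13 Sentinel-2 bands per scene; both are my assumptions, not details from the model releases, so adjust for your own sensor stack:

```python
# Back-of-envelope storage math for embeddings vs. raw input,
# for a 512 x 512 block at 10 m resolution, stored as float32.
# Assumptions: per-pixel embeddings at full resolution, 13 bands/scene.

GIB = 2**30
H = W = 512     # block size in pixels
BYTES = 4       # float32

def size_gib(*per_pixel_dims):
    """GiB needed for H x W pixels with the given per-pixel dimensions
    (e.g. embedding dims, or bands x timesteps for raw input)."""
    n = BYTES
    for d in per_pixel_dims:
        n *= d
    return (H * W * n) / GIB

nano_gib = size_gib(128)        # OlmoEarth Nano: 128-dim embedding
base_gib = size_gib(768)        # Base: 768-dim embedding -> 0.750 GiB
input_gib = size_gib(13, 12)    # 13 bands x 12 monthly scenes

print(f"Nano embedding:   {nano_gib:.3f} GiB")   # 0.125 GiB
print(f"Base embedding:   {base_gib:.3f} GiB")   # 0.750 GiB
print(f"12-scene input:   {input_gib:.3f} GiB")
print(f"Base/input ratio: {base_gib / input_gib:.2f}x")
```

Under these assumptions the Nano embedding (0.125 GiB) only drops below the raw input once all 12 scenes are stacked, while the Base embedding stays several times larger than the input, matching the pattern described above.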
Even if you can afford to store embeddings, do you know if they’ll work for your problem?
EO embeddings are often described as products of "foundation models". Yet unlike LLM foundation models, which can carry out agentic tasks at an expert level across thousands of different contexts, EO embedding models are restricted to a much narrower set of domains in which they can be successfully applied.
For example, today most EO embeddings are derived from medium-resolution satellite imagery. Practitioners curious about embeddings are often disappointed to find that they can't use them with high-resolution imagery for detection tasks. Or the embeddings may be limited to agricultural use cases, inapplicable in some geographic regions, and so on.
Documenting and describing this fitness for use is often done in the research paper, but not clearly in the catalogue where the embeddings are hosted. And papers often leave out details on the sampling regime and how embeddings were tested. To address this, check out the resources linked above to better document your embedding model or an existing embedding product!
And we don’t yet have a shared way to answer questions about fitness for use. Model benchmarks, while imperfect and easily gamed, are essential for communicating what is worth the effort to try.
In the LLM space, benchmarks like MMLU and SWE-Bench have been useful signals for indicating which models are best. As they become saturated or too easy, new benchmarks arise to fill the gap.
I don't see a similar pattern with EO embeddings, or EO models in general. Benchmarking is typically done within a paper, and how comprehensive it is depends on how much time, compute, and expertise the authors had. I don't think we, as a community of researchers, practitioners, and end users, have much memory for these paper benchmarks.
I'd like to contribute to an effort toward a leaderboard for EO foundation models that can host multiple benchmarks, aligning the community on which models, and which benchmark datasets, are useful and important.
If you'd like to compare embedding models to each other at useful scale, get in touch so we can more effectively communicate the performance of these models for specific use cases!
At the core of it, I think experimenting with embeddings can often feel difficult, and we’re trying to change that at Wherobots.
Before the workshop, we'd been hearing from many practitioners curious to try foundation models on their specific problems: to improve the price/performance of their workloads, improve model accuracy, and solve more problems with simpler workflows and less expensive fine-tuning.
To help, we've onboarded a foundation model to our Model Hub: OlmoEarth Nano, for embedding multispectral optical and radar imagery.
We publish our models on Hugging Face in the PyTorch 2 archive format, so you can load and run them with only PyTorch as a dependency. This can be helpful for testing and comparing embeddings outside of a larger ML pipeline.
If you need to scale out these models, check out RasterFlow. It's a batch processing engine built for any scale of imagery you need to run inference on, from city-scale, high-resolution imagery to global, petabyte-scale collections. I think RasterFlow could be a key enabler for comparing embeddings at the large spatial scales they are typically generated at.
If you’re experimenting with EO embeddings, contribute a tutorial to geoembeddings.org, contribute a Model Card for your cool new model, or just share what’s working and what’s not. We’re looking forward to lowering these barriers to EO derived insights with the community!