Geospatial data touches nearly every industry, and until recently, the open lakehouse had no native way to handle it.
Snowflake recently announced Iceberg v3 support with native geometry and geography types. It’s the first major engine to ship the geospatial extensions to the Iceberg spec. These types are now part of the open standard, available to every engine in the ecosystem.
In their Iceberg v3 post, the Snowflake team called out where the geo work came from:
“A special mention to the entire Wherobots team, which implemented geospatial support on its own fork of Iceberg before offering its expertise to the Iceberg community, providing leadership and implementing the feature for the Iceberg project.”
The work started in 2022, the year Wherobots was founded. This post is the story behind it: how production experience became an open standard, and what it means for the teams and tools building on spatial data.
Until Iceberg v3, geospatial columns did not exist as a concept in the table format. Engineers stored geometry as opaque binary blobs: Well-Known Binary (WKB) bytes in a binary column. The Iceberg catalog had no way to know the column contained spatial data.
The practical consequences: engines could not prune files spatially, CRS information traveled out of band if it traveled at all, and every application had to agree on its own conventions for interpreting the bytes. It worked, but it was fragile. And it did not travel across engines.
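To make "opaque binary blobs" concrete, here is a minimal stdlib-only sketch of what lives inside such a column: a Well-Known Binary point is just a byte-order flag, a geometry-type tag, and two doubles. The function names are illustrative, not from any Iceberg or Sedona API:

```python
import struct

def encode_wkb_point(x, y):
    # WKB layout for a 2D point: 1 byte byte-order flag (1 = little-endian),
    # uint32 geometry type (1 = Point), then two float64 coordinates.
    return struct.pack("<BIdd", 1, 1, x, y)

def decode_wkb_point(buf):
    # Only the reader's convention says these 21 bytes are a point;
    # a pre-v3 Iceberg catalog sees nothing but a binary column.
    order, gtype, x, y = struct.unpack("<BIdd", buf)
    assert order == 1 and gtype == 1
    return x, y
```

Nothing in the table format distinguishes these bytes from any other binary payload, which is exactly the gap the v3 types close.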
Havasu was our answer to that problem: a spatial lakehouse extension built on an Iceberg fork. Not a proof of concept. A production system, running real customer workloads since 2022, where we could pressure-test the design decisions that would eventually become a standard.
That production experience shaped our GeoLake research, which formalized the core design principles: unambiguous CRS representation, efficient geometry encoding in columnar storage, and bounding-box statistics that let spatial predicates be evaluated before any geometry data is read.
Then we contributed the design, the implementation experience, and the lessons learned upstream to Apache Parquet and Apache Iceberg, so the broader ecosystem could build on it.
Many of the design decisions validated in Havasu are now part of the Iceberg v3 spec. Bounding-box statistics, CRS propagation, and geometry encoding all made the transition from production system to open standard.
To understand why this matters, consider the difference concretely.
Before (Iceberg v1/v2): A geometry column in the Iceberg schema looks like this:
{ "id": 5, "name": "geom", "type": "binary" }
The catalog sees raw bytes. No spatial semantics. An engine reading this table has no way to know this column contains geometries, what CRS they’re in, or how to prune files spatially, unless it relies on out-of-band conventions like GeoParquet file-level metadata.
After (Iceberg v3): The same column becomes:
{ "id": 5, "name": "geom", "type": "geometry(srid:4326)" }
Now the CRS is a property of the type. The Parquet files carry GEOMETRY logical type annotations that any conforming engine can recognize. Bounding-box statistics in the manifest use a compact coordinate encoding, so engines can skip entire file groups that fall outside a query’s spatial bounds, before reading a single geometry value.
Spatial pruning moves to the format level. This is the architecture we designed and validated in Havasu, now available to any engine that reads Iceberg v3.
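The pruning idea itself is simple to sketch. The manifest carries a bounding box per data file; an engine intersects the query's spatial bounds with each file's box and skips files that cannot match, before reading a single geometry value. A minimal illustration, with hypothetical names (real engines do this inside the Iceberg scan planner, not in user code):

```python
def bboxes_overlap(a, b):
    # a, b: (xmin, ymin, xmax, ymax). Boxes overlap unless one is
    # entirely left of, right of, above, or below the other.
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def prune_files(file_stats, query_bbox):
    # file_stats: data-file name -> bounding box from manifest statistics.
    # Returns only the files whose stats intersect the query bounds.
    return [f for f, bbox in file_stats.items() if bboxes_overlap(bbox, query_bbox)]
```

With the bounding boxes in the manifest rather than in application code, every conforming engine gets this skip for free.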
Making geospatial a first-class type in the open lakehouse required coordinated changes to two foundational Apache projects.
On the Parquet side, the PR to add GEOMETRY and GEOGRAPHY logical types to the format spec drew over 400 comments across months of design review: encoding formats, CRS semantics, edge-interpolation behavior, edge-case handling. Jia Yu (Wherobots Co-Founder and Apache Sedona PMC Chair) and Kristin Cowalcijk (Apache Sedona PMC member) drove core design decisions from the start.
On the Iceberg side, the geo type spec defined how Iceberg catalogs spatial columns, stores bounding-box metadata, and handles spatial partitioning, with another 240+ comments of cross-community design work. The core API and implementation, authored by Kristin, followed with bounding-box types, geospatial predicates, and Parquet geo read/write.
Over a year of coordinated work across both Apache communities. The result: geometry and geography as native primitive types, from the storage format through the table format – with the same level of spec support that timestamps, decimals, and every other type have always had.
Snowflake’s announcement is a milestone, not the finish line. It’s the first major engine to ship v3 geo types. It won’t be the last.
Iceberg v3 adoption is accelerating broadly. AWS Glue shipped v3 support at re:Invent 2025. Dremio followed with GA support in their cloud platform. As more engines adopt v3, spatial columns will interoperate the same way every other column type already does. The infrastructure barrier that kept geospatial data siloed from mainstream analytics is coming down.
The end-to-end Parquet geo read/write path for Iceberg is under active review in the Apache Iceberg project. It connects v3 schema types to properly annotated Parquet files with GEOMETRY logical types. Once merged, any Iceberg-compatible engine will produce and consume fully v3-native geo tables.
Apache Iceberg is increasingly how organizations query data across engines, and native geo types mean spatial data now travels the same way every other column type does.
A geo-typed table written from one engine is readable by another with correct CRS, bounding-box stats, and spatial pruning intact. No proprietary connectors. No out-of-band conventions.
Cross-engine portability also makes spatial data accessible to AI. Tools and agents working with physical-world data need consistent access across engines and catalogs. When the type contract lives in the format, access does not require per-system special-casing.
Wherobots Cloud is where teams run spatial ETL, large-scale analytics, and geospatial data engineering on Iceberg today, fully managed. As the upstream v3 pipeline completes, Wherobots Cloud will be the first to produce fully v3-native geo tables. The natural result of being the team that designed the spec and has the longest production track record on it.
Apache Sedona reads and writes Iceberg tables with geospatial columns today, using the Havasu encoding that preceded the v3 spec. This means Sedona users have had production-grade spatial lakehouse capabilities since before the standard was finalized, and the migration path to v3-native types is straightforward as upstream support matures.
WherobotsDB and Sedona also go beyond what the spec requires: distributed CRS-aware computation, automatic transformation across datasets in different projections, and support for CRS formats beyond SRID integers (WKT, PROJ strings, grid-based datum shifts). Most engines that adopt v3 will read the CRS tag. WherobotsDB and Sedona compute with it at scale.
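As a small illustration of what "computing with the CRS" involves beyond reading the tag, here is the standard spherical forward projection from longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857), in stdlib Python. This is a hand-rolled sketch of one well-known transformation, not Sedona's or WherobotsDB's implementation:

```python
import math

# Spherical radius used by the EPSG:3857 (Web Mercator) projection.
WEB_MERCATOR_RADIUS = 6378137.0

def lonlat_to_webmercator(lon, lat):
    # Forward spherical Mercator: x is proportional to longitude,
    # y stretches toward the poles (undefined at +/-90 degrees).
    x = math.radians(lon) * WEB_MERCATOR_RADIUS
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * WEB_MERCATOR_RADIUS
    return x, y
```

An engine that only stores the CRS tag leaves this reprojection to the user; a CRS-aware engine applies it automatically when joining datasets in different projections.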
The v3 geo types are the foundation. The next layer is making the full pipeline seamless: spatial partitioning strategies, advanced spatial indexing beyond bounding boxes, and tighter integration between spatial predicates and query planning across engines.
These are problems we’ve been working on in Havasu for years. Now that the standard exists, that work can happen in the open, and the entire ecosystem benefits.
The Spatial AI Coding Assistant, including the Wherobots MCP Server, VS Code Extension, and CLI, lets developers and AI agents work with spatial data through natural language. A developer describes an analysis problem, and the tools find relevant datasets, generate spatial queries, and execute them on Wherobots Cloud. Native geo types in Iceberg make this work cleanly: when the type, CRS, and spatial statistics live in the format, an agent queries physical-world data the same way it queries any other column, across any engine reading Iceberg.
If you’re building your spatial data architecture on Iceberg, we’d like to talk.