Generating map tiles can be challenging and expensive, especially when dealing with large datasets like those from the Overture Maps Foundation (Overture). Typical solutions are neither scalable nor performant, which forces you to build workarounds that waste time and money. Wherobots addresses these challenges with WherobotsDB, a purpose-built, planetary-scale compute engine with a native vector tile generator (VTiles) that produces tiles in an easy-to-use, cloud-native file format (PMTiles).
In this post, we’ll show you how Wherobots makes generating vector tiles from billions of features a breeze. We’ll demonstrate this by generating a tileset for all 11 Overture layers in New York City, and then repeat the process at a larger scale for all the transportation segments across the state of New York.
Vector tiles are small chunks of map data that allow for efficient rendering at varying zoom levels. Unlike raster tiles, which are pre-rendered images, vector tiles contain attributes and geometric data, so map features can be styled dynamically at render time. This offers more flexibility and interactivity.
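To make "small chunks of map data" concrete: tilesets are commonly addressed with the XYZ (Web Mercator) scheme, in which zoom level z splits the world into 2^z by 2^z tiles. The post does not state which scheme VTiles uses, so treat that as an assumption; `lonlat_to_tile` and `tiles_at_zoom` are hypothetical helper names used only for illustration. A minimal sketch:

```python
import math


def lonlat_to_tile(lon: float, lat: float, z: int) -> tuple[int, int]:
    """Return the (x, y) index of the XYZ tile containing a WGS84 point at zoom z."""
    n = 2 ** z  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 and latitudes beyond the Mercator limit stay in the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_at_zoom(z: int) -> int:
    """Total number of tiles covering the world at zoom z."""
    return 4 ** z
```

Because the tile count grows by a factor of four per zoom level, deep zooms over large areas are where generation cost concentrates.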
PMTiles is a cloud-native file format that is designed for holding an entire collection of tiles, in this case vector tiles. The PMTiles format allows individual tiles to be queried directly from cloud object storage like Amazon S3. By querying directly from cloud storage, you no longer need to set up and manage dedicated infrastructure, reducing your costs and time-to-tile-generation.
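Querying tiles directly from object storage works because a PMTiles archive is a single file that clients read with HTTP range requests, starting with a fixed-size header. The sketch below decodes a few header fields from the first 127 bytes (for example, fetched with `Range: bytes=0-126`). The byte offsets reflect my reading of the PMTiles v3 specification and should be checked against it; `parse_pmtiles_header` is a hypothetical helper, not part of Wherobots or any PMTiles library.

```python
import struct

PMTILES_HEADER_SIZE = 127  # fixed header length assumed from the PMTiles v3 spec


def parse_pmtiles_header(buf: bytes) -> dict:
    """Decode a handful of fields from the start of a PMTiles v3 archive."""
    if len(buf) < PMTILES_HEADER_SIZE:
        raise ValueError("need at least 127 bytes of header")
    if bytes(buf[0:7]) != b"PMTiles":
        raise ValueError("magic number mismatch: not a PMTiles archive")
    version = buf[7]
    if version != 3:
        raise ValueError(f"unsupported PMTiles version: {version}")
    # Little-endian uint64 pair: where the root tile directory lives in the file
    root_offset, root_length = struct.unpack_from("<QQ", buf, 8)
    # Single-byte fields: tile type (1 is MVT vector tiles), then min and max zoom
    tile_type, min_zoom, max_zoom = struct.unpack_from("<BBB", buf, 99)
    return {
        "version": version,
        "root_dir_offset": root_offset,
        "root_dir_length": root_length,
        "tile_type": tile_type,
        "min_zoom": min_zoom,
        "max_zoom": max_zoom,
    }
```

A client would follow up with a second range request for the root directory, then a third for the tile bytes themselves, so no tile server is needed in between.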
Generating tiles from worldwide map datasets has always been a challenge. You had to process billions of geometric features using solutions that are not purpose-built for this scale, resulting in the slow, costly workarounds described above.
Wherobots VTiles, our new native vector tile generator, incorporates innovative algorithms for distributed tile generation on WherobotsDB, a high-performance spatial compute engine. VTiles is designed to generate vector tiles from small to planetary-scale datasets quickly and cost-efficiently. Wherobots handles the heavy lifting and infrastructure management, so the tile generation process is performant, scalable, and easy. The following demos show this in practice.
We’ll use Wherobots VTiles to generate PMTiles for all Overture layers in New York City. Then we will scale the demo up by generating PMTiles for all transportation segments in New York State, using feature filter optimizations.
First, we need to load the administrative, places, transportation, base, and buildings layers (yes, all of them!) from the Overture data for New York City. The latest Overture dataset is included in the Wherobots Spatial Catalog out-of-the-box, which makes it easy to load all of these layers in a single statement.
    # `sedona` is the active SedonaContext session
    _aoi = "POLYGON ((-73.957901 40.885486, …, -73.957901 40.885486))"

    df_transportation_segment = sedona.sql(f"""
        SELECT
            -- Clip each geometry to the area of interest
            ST_Intersection(ST_GeomFromWKT("{_aoi}"), geometry) AS geometry,
            "transportation_segment" AS layer
        FROM wherobots_open_data.overture_2024_02_15.transportation_segment t1
        -- Keep only features that intersect the area of interest
        WHERE ST_Intersects(ST_GeomFromText("{_aoi}"), geometry)
    """)
Here we apply a spatial filter (WHERE ST_INTERSECTS()) and a “clipping” function (ST_INTERSECTION()) to reduce our data to our area of interest, New York City. We also add a layer attribute containing the layer name. You can add or bring in additional attributes if needed. This “load-intersect” statement is executed for each of the 11 Overture layers in Wherobots, and the resulting DataFrames are added to a list for the next step in data prep.
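The difference between filtering and clipping is easy to see on a toy case. The sketch below is a plain-Python stand-in, not how Sedona implements ST_Intersects or ST_Intersection: it assumes a rectangular area of interest and straight two-point segments, and uses Liang-Barsky clipping. `clip_segment` and `filter_and_clip` are hypothetical names.

```python
from typing import Optional

Point = tuple[float, float]
Segment = tuple[Point, Point]
Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def clip_segment(seg: Segment, box: Box) -> Optional[Segment]:
    """Return the part of seg inside box, or None if the segment misses the box."""
    (x0, y0), (x1, y1) = seg
    xmin, ymin, xmax, ymax = box
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # parametric range of the segment still inside the box
    edges = ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0))
    for p, q in edges:
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:  # segment is entering the box across this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:  # segment is leaving the box across this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    return ((x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy))


def filter_and_clip(segments: list[Segment], box: Box) -> list[Segment]:
    """Drop segments that miss the box (the filter) and trim the rest (the clip)."""
    clipped = (clip_segment(seg, box) for seg in segments)
    return [seg for seg in clipped if seg is not None]
```

The filter alone would keep a road that crosses the boundary at its full length; the clip is what trims it to the boundary so the tileset stops at the edge of the area of interest.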
We want to cut a single PMTiles file for all the layers in one go, and to accomplish this all the layers need to be unioned into one DataFrame. Here we iterate through the list of layers and perform the union with the DataFrame API.
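    from pyspark.sql.functions import lit
    from sedona.sql.st_constructors import ST_GeomFromText
    from sedona.sql.st_functions import ST_Intersection
    from sedona.sql.st_predicates import ST_Intersects

    # Start from the first layer, then union each remaining layer onto it
    nyc_unioned_data = df_admins_administrativeBoundary
    for t in tables_4_tiles[1:]:
        nyc_unioned_data = nyc_unioned_data.union(
            t.where(ST_Intersects(t.geometry, ST_GeomFromText(lit(_aoi))))
            .select("layer", ST_Intersection(ST_GeomFromText(lit(_aoi)), t.geometry).alias("geometry"))
        )

    # Drop features whose clipped geometry is empty
    nyc_unioned_data = nyc_unioned_data.where("ST_IsEmpty(geometry) = False")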
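The loop above is a left fold over the list of layers, which can also be written with `functools.reduce`. One thing to keep in mind: Spark's `DataFrame.union` matches columns by position, not by name, so every DataFrame in the list should expose the same columns in the same order (`unionByName` matches by name instead). The helper below is a sketch; `union_all` is a hypothetical name, and it works on any objects that provide a `.union()` method.

```python
from functools import reduce


def union_all(frames):
    """Fold a non-empty sequence of DataFrame-like objects into one via .union()."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one frame to union")
    # reduce applies union pairwise from left to right: ((a ∪ b) ∪ c) ∪ ...
    return reduce(lambda left, right: left.union(right), frames)
```

With a list of already clipped layers (here a hypothetical `clipped_layers`), the whole union step becomes `nyc_unioned_data = union_all(clipped_layers)`.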
With our data prepared, we are ready to kick off the tile generation process, which looks like this:
    import os
    from wherobots import vtiles

    # Generate the tiles
    nyc_tiles_df = vtiles.generate(nyc_unioned_data)

    # Define the storage endpoint
    nyc_full_tiles_path = os.getenv("USER_S3_PATH") + "nyc_tiles.pmtiles"

    # Define a PMTiles config builder, passing the data for automatic schema discovery
    builder = vtiles.PMTilesConfigBuilder().from_features_data_frame(nyc_unioned_data)

    # Define the order in which to layer the tiles
    nyc_ordered_layers = [buildings, places ..., base]
    builder.layers = nyc_ordered_layers

    # Build the PMTiles configuration
    ordered_config = builder.build()

    # Write the tiles to storage
    vtiles.write_pmtiles(nyc_tiles_df, nyc_full_tiles_path, ordered_config)
This process took 79 seconds and generated a 74.7 MiB file. You can visualize the results in Wherobots by calling vtiles.show_pmtiles(nyc_full_tiles_path).
Wherobots VTiles breaks the dataset down into manageable chunks, processing each layer efficiently. The result is a single PMTiles file that can be easily visualized, styled, and served directly from S3.
Next, we scale the process up to cover all transportation segments in New York State. This involves a larger dataset, but WherobotsDB handles it seamlessly. The process is identical to the one above, except that this time we use zoom-based feature filtering in vtiles.GenerationConfig() (do we really need to see driveways when viewing at the state scale?).
    from pyspark.sql.functions import col, when

    gen_config = vtiles.GenerationConfig(
        # Minimum zoom level for generation
        min_zoom=4,
        # Maximum zoom level for generation
        max_zoom=16,
        feature_filter=(
            when(
                # Only add motorway and trunk features to tiles level 8 and above
                (col("class").isin(["motorway", "trunk"])) & (col("tile.z") < 8),
                False,
            )
            .when(…) …
            .otherwise(True)  # Default to rendering a feature
        ),
    )
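A chained `when(...).when(...).otherwise(...)` expression is evaluated top to bottom, and the first matching condition decides the result. To make that concrete, here is a plain-Python reading of the one rule shown above plus the default. It is a sketch for reasoning about the filter, not part of the VTiles API; `keep_feature` is a hypothetical name, and the rules the post elides are deliberately left out rather than guessed.

```python
def keep_feature(road_class: str, tile_z: int) -> bool:
    """Decide whether a feature is rendered into a tile at zoom level tile_z."""
    # Rule 1, as written in the post: motorway and trunk features are
    # excluded from tiles below zoom 8
    if road_class in ("motorway", "trunk") and tile_z < 8:
        return False
    # Further .when(...) rules are elided in the post and not reproduced here
    # .otherwise(True): any feature no rule rejected is rendered
    return True
```

Evaluating a few (class, zoom) pairs by hand like this is a cheap way to confirm a filter does what you intend before launching a large generation job.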
We pass our configuration into VTiles with df_transportation_segment_tiles = vtiles.generate(df_transportation_segment, gen_config) and use the same vtiles.write_pmtiles() function as above. PMTiles generation for the New York State transportation segments completed in about 2.5 minutes.
Previously, generating vector tiles from large datasets, like those from the Overture Maps Foundation, was challenging. You now have a solution to these challenges. Wherobots makes tile generation easy, reliable, and cost-effective at any scale. We simplify the process by unlocking distributed tile generation, and we make it possible to visualize and share large-scale geospatial data efficiently. Whether you’re focusing on a single city like New York City, an entire state, or the entire planet, Wherobots provides a tile generation capability that’s purpose-built for any scale.
Ready to dive into vector tile generation with Wherobots? Start by signing up for a free Wherobots account, walk through this tutorial, connect Wherobots to your data in S3, and enjoy the benefits of efficient and flexible map visualization. Check out the WherobotsDB VTiles tutorial and reference documentation for more information.