Generating map tiles can be challenging and expensive, especially when dealing with large datasets like those from the Overture Maps Foundation (Overture). Typical solutions are neither scalable nor performant, which forces you to build workarounds that waste time and money. Wherobots addresses these challenges with WherobotsDB, a purpose-built, planetary-scale compute engine with a native vector tile generator (VTiles) that produces tiles in an easy-to-use, cloud-native file format (PMTiles).
In this post, we’ll show you how Wherobots makes generating vector tiles from billions of features a breeze. We’ll demonstrate this by generating a tileset for all 11 Overture layers in New York City, and then repeat the process at a larger scale for all the transportation segments across the state of New York.
Vector tiles are small chunks of map data that allow for efficient rendering at varying zoom levels. Unlike raster tiles which are pre-rendered images, vector tiles contain attributes and geometric data that facilitate dynamic styling of map features on the fly, offering more flexibility and interactivity.
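To make the idea of "chunks at varying zoom levels" concrete, here is a minimal sketch of the standard Web Mercator (XYZ) tile addressing that most web maps use. It is not part of Wherobots VTiles; the function name `lonlat_to_tile` is our own, and we assume the usual scheme in which zoom level z splits the world into a 2^z by 2^z grid.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_at_zoom(zoom: int) -> int:
    """Total number of tiles covering the world at a zoom level."""
    return 4 ** zoom


# A point in New York City (taken from the area of interest used later in this post)
nyc_tile = lonlat_to_tile(-73.957901, 40.885486, 10)
```

Each extra zoom level quadruples the number of tiles, which is why tile generation for large areas gets expensive quickly.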
PMTiles is a cloud-native file format that is designed for holding an entire collection of tiles, in this case vector tiles. The PMTiles format allows individual tiles to be queried directly from cloud object storage like Amazon S3. By querying directly from cloud storage, you no longer need to set up and manage dedicated infrastructure, reducing your costs and time-to-tile-generation.
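Querying tiles "directly from cloud object storage" works because a client can fetch just the bytes it needs with HTTP range requests. The sketch below shows the two building blocks under stated assumptions: the helper names are ours, the offsets are made up for illustration, and we rely on the PMTiles v3 specification, which describes a fixed 127-byte header that starts with the ASCII magic `PMTiles`.

```python
PMTILES_HEADER_LENGTH = 127  # fixed header size per the PMTiles v3 specification


def range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def has_pmtiles_magic(first_bytes: bytes) -> bool:
    """Check whether a byte string starts with the PMTiles magic number."""
    return first_bytes[:7] == b"PMTiles"


# Headers a client would send to read only the archive header
header_request = range_header(0, PMTILES_HEADER_LENGTH)
```

A real client would read the header and directories first, then issue one more range request per tile, so no tile server is needed.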
Generating tiles from worldwide map datasets was always a challenge. You had to process billions of geometric features using solutions that are not purpose-built for this scale, resulting in
Wherobots VTiles, our new native vector tile generator, incorporates innovative algorithms for distributed tile generation on WherobotsDB, a high-performance spatial compute engine. VTiles is designed to generate vector tiles from small to planetary-scale datasets quickly and cost-efficiently. Wherobots handles the heavy lifting and infrastructure management, ensuring the tile generation process is performant, scalable, and easy. We will prove this in the following demos.
We’ll use Wherobots VTiles to generate PMTiles for all Overture layers in New York City. Then we will scale the demo up by generating PMTiles for all transportation segments in New York State, using feature filter optimizations.
First, we need to load the administrative, places, transportation, base, and buildings layers (yes, all of them!) from the Overture data for New York City. The latest Overture dataset is included in the Wherobots Spatial Catalog out-of-the-box, which makes it easy to load all of these layers in a single statement.
_aoi = "POLYGON ((-73.957901 40.885486, …, -73.957901 40.885486))"

df_transportation_segment = sedona.sql(f"""
    SELECT
        ST_Intersection(ST_GeomFromWKT('{_aoi}'), geometry) AS geometry,
        'transportation_segment' AS layer
    FROM wherobots_open_data.overture_2024_02_15.transportation_segment t1
    WHERE ST_Intersects(ST_GeomFromText('{_aoi}'), geometry)
""")
Here we apply a spatial filter (WHERE ST_INTERSECTS()) and a “clipping” function (ST_INTERSECTION()) to reduce our data to our area of interest, New York City. We also add a layer attribute containing our layer name. You can add or bring in additional attributes if needed. This “load-intersect” statement is executed for each of the 11 Overture Maps Foundation layers in Wherobots and the resulting DataFrames are added to a list for the next step in data prep.
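If the difference between filtering and clipping is unclear, the following plain-Python sketch may help. It is a deliberate simplification and not how Sedona implements `ST_Intersects` or `ST_Intersection`: we assume a rectangular area of interest and single line segments, and use the Liang-Barsky algorithm. The function name `clip_segment` is our own.

```python
def clip_segment(p0, p1, bbox):
    """Clip segment p0-p1 to bbox = (xmin, ymin, xmax, ymax).

    Returns the clipped segment, or None when the segment misses the box
    (the "filter" case).
    """
    (x0, y0), (x1, y1) = p0, p1
    xmin, ymin, xmax, ymax = bbox
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # parametric range of the segment that is still inside
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:  # segment enters across this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:  # segment leaves across this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)


aoi = (0.0, 0.0, 1.0, 1.0)
crossing = clip_segment((-1.0, 0.5), (3.0, 0.5), aoi)  # trimmed to the box
outside = clip_segment((2.0, 2.0), (3.0, 3.0), aoi)    # filtered out
```

The filter discards features that never touch the area of interest, while the clip trims the survivors so that no geometry extends past its boundary.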
We want to cut a single PMTiles file for all the layers in one go, and to accomplish this, all the layers need to be unioned together. Here we iterate through the list of layers and perform the union with the DataFrame API.
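from pyspark.sql.functions import lit
from sedona.sql.st_constructors import ST_GeomFromText
from sedona.sql.st_functions import ST_Intersection
from sedona.sql.st_predicates import ST_Intersects

# Start from the first layer, then union each remaining layer clipped to the AOI
nyc_unioned_data = df_admins_administrativeBoundary
for t in tables_4_tiles[1:]:
    nyc_unioned_data = nyc_unioned_data.union(
        t.where(ST_Intersects(t.geometry, ST_GeomFromText(lit(_aoi))))
        .select(
            "layer",
            ST_Intersection(ST_GeomFromText(lit(_aoi)), t.geometry).alias("geometry"),
        )
    )

# Drop features whose clipped geometry is empty
nyc_unioned_data = nyc_unioned_data.where("ST_IsEmpty(geometry) = False")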
With our data prepared, we are ready to kick off the tile generation process, which looks like this:
import os

# Generate the tiles
nyc_tiles_df = vtiles.generate(nyc_unioned_data)

# Define the storage endpoint
nyc_full_tiles_path = os.getenv("USER_S3_PATH") + "nyc_tiles.pmtiles"

# Define a PMTiles config builder, passing the data for automatic schema discovery
builder = vtiles.PMTilesConfigBuilder().from_features_data_frame(nyc_unioned_data)

# Define the order in which to layer the tiles
nyc_ordered_layers = [ buildings, places ..., base]
builder.layers = nyc_ordered_layers

# Build the PMTiles configuration
ordered_config = builder.build()

# Write the tiles to storage
vtiles.write_pmtiles(nyc_tiles_df, nyc_full_tiles_path, ordered_config)
This process took 79 seconds and generated a 74.7 MiB file. You can visualize the results in Wherobots by calling vtiles.show_pmtiles(nyc_full_tiles_path).
Wherobots VTiles breaks the dataset down into manageable chunks, processing each layer efficiently. The result is a single PMTiles dataset that can be easily visualized, styled, and served directly from S3.
Next, we scale the process up to cover all transportation segments in New York State. This involves a larger dataset, but WherobotsDB handles it seamlessly. The process is the same as above, but this time we want to use zoom-based feature filtering (do we really need to see driveways when viewing at the state scale?) in the vtiles.GenerationConfig().
gen_config = vtiles.GenerationConfig(
    # Minimum zoom level for generation
    min_zoom=4,
    # Maximum zoom level for generation
    max_zoom=16,
    feature_filter=(
        when(
            # Only add motorway and trunk features to tiles at zoom level 8 and above
            (col("class").isin(["motorway", "trunk"])) & (col("tile.z") < 8),
            False,
        )
        .when(…)
        …
        .otherwise(True)  # Default to rendering a feature
    ),
)
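To reason about what a filter like this keeps, it can help to mirror the when/otherwise chain in plain Python. This is only a sketch of the decision logic, not the VTiles API: the function `keep_feature` and the `driveway` rule are hypothetical, and only the motorway/trunk threshold of 8 comes from the configuration above.

```python
# Each rule hides the listed classes on tiles below the given zoom level.
HIDE_BELOW_ZOOM = [
    ({"motorway", "trunk"}, 8),  # taken from the configuration above
    ({"driveway"}, 14),          # hypothetical rule, for illustration only
]


def keep_feature(road_class: str, zoom: int) -> bool:
    """Return True if a feature of this class should be written to a tile."""
    for classes, min_zoom in HIDE_BELOW_ZOOM:
        if road_class in classes and zoom < min_zoom:
            return False  # same effect as a matching when(..., False) branch
    return True  # same effect as .otherwise(True)


# Which of these classes would appear on a zoom 7 tile?
visible_at_7 = [c for c in ("motorway", "residential", "driveway") if keep_feature(c, 7)]
```

Evaluating the rules per zoom level like this is a quick way to sanity check a filter before running a large tile generation job.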
We pass our configuration into VTiles with df_transportation_segment_tiles = vtiles.generate(df_transportation_segment, gen_config) and use the same vtiles.write_pmtiles() function as above. PMTiles generation for the New York State transportation segments completed in ~2.5 minutes.
Previously, generating vector tiles from large datasets, like those from the Overture Maps Foundation, was challenging. You now have a solution to these challenges. Wherobots makes tile generation at any scale easy, reliable, and cost-effective. We simplify the process by unlocking distributed tile generation, and we make it possible to visualize and share large-scale geospatial data efficiently. Whether you’re focusing on a single city like New York City, an entire state, or the entire planet, Wherobots provides tile generation capability that’s purpose-built for any scale.
Ready to dive into vector tile generation with Wherobots? Start by signing up for a free Wherobots account, walk through this tutorial, connect Wherobots to your data in S3, and enjoy the benefits of efficient and flexible map visualization. Check out the WherobotsDB VTiles tutorial and reference documentation for more information.