WherobotsDB VTiles generates global PMTiles from Overture Maps data in 26 minutes by processing billions of features with distributed computation. The system handles three datasets: Places (50M features, 2m 23s), Buildings (2.3B features, 26 minutes), and Division Areas (1M features, optimized to 10 minutes with feature filters).
This article demonstrates how to use WherobotsDB VTiles to generate vector tiles for three planetary-scale Overture Maps Foundation datasets in PMTiles format: Places, Buildings, and Division Areas, with feature filtering and distributed computation across billions of features.
Vector tiles and PMTiles together make it possible to store, render, and serve large-scale maps efficiently in the cloud. Vector tiles are small chunks of map data that allow for efficient and customizable map rendering at varying zoom levels. They contain geometric and attribute data, for example roads and their names, that facilitate dynamic styling of map features on the fly, offering more flexibility and interactivity.
PMTiles is a cloud-native file format that is designed for holding an entire collection of tiles, in this case vector tiles. The PMTiles format enables individual tiles to be queried directly from cloud object storage like Amazon S3. By querying directly from cloud storage, you no longer need to set up and manage dedicated infrastructure, reducing your costs, effort, and time-to-tile-generation.
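To make that access pattern concrete, here is a minimal sketch of the idea behind PMTiles readers: a plain HTTP range request against object storage, with no tile server in between. The bucket, key, and byte range are hypothetical placeholders; a real client (such as the pmtiles library) first parses the archive header and tile directories to locate each tile.

```python
# Minimal sketch of cloud-native tile access: fetch a byte range of the
# archive directly from S3. Bucket, key, and offsets are placeholders; a
# real PMTiles reader parses the header and directories to find tile offsets.
import boto3

s3 = boto3.client("s3")
resp = s3.get_object(
    Bucket="myBucket",
    Key="my/prefix/to/tiles.pmtiles",
    Range="bytes=0-16383",  # first 16 KiB: header and root directory region
)
data = resp["Body"].read()
print(f"fetched {len(data)} bytes without downloading the whole archive")
```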
If you’re sharing, inspecting, or debugging tiles, you’ll need to visualize them. To make these processes easier, Wherobots created a tile viewer site, available at tile-viewer.wherobots.com. This tool comes from the PMTiles GitHub repository.
This viewer takes a URL for a tileset. If your tiles are stored in a private S3 bucket, you will need to generate a signed URL. Wherobots Cloud provides a function for converting your S3 URI into a signed URL:
```python
from wherobots.tools.utility.s3_utils import get_signed_url

get_signed_url(my_s3_path, expiration_in_seconds)
```
my_s3_path is an S3 URI, like s3://myBucket/my/prefix/to/tiles.pmtiles, and expiration_in_seconds is an int representing the number of seconds the signed URL will remain valid.
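A quick usage sketch, with a placeholder path and a one-hour expiration:

```python
from wherobots.tools.utility.s3_utils import get_signed_url

# Placeholder path; 3600 seconds keeps the link valid for one hour.
signed_url = get_signed_url("s3://myBucket/my/prefix/to/tiles.pmtiles", 3600)
print(signed_url)  # paste this URL into tile-viewer.wherobots.com
```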
The tile viewer will be used to explore the tiles we generate in our examples.
The following examples show tile generation using VTiles for three Overture layers at a planetary scale. Because we are working with planetary-scale datasets and want quick results, we will use the large runtimes available in the professional tier of Wherobots Cloud.
Tile generation time is provided in each example, and includes time to load the input data, transform it, generate tiles, and save the PMTiles file in an S3 bucket. It does not include the time to start the cluster.
To run the examples below, just make sure your Sedona session is started:
```python
from sedona.spark import SedonaContext

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
```
We start by creating PMTiles for the places dataset. With VTiles, this is a straightforward case:
```python
import pyspark.sql.functions as f
import os

from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    sedona.table("wherobots_open_data.overture_2024_05_16.places_place").select(
        "geometry",
        f.col("names.primary").alias("name"),
        f.col("categories.main").alias("category"),
        f.lit('places').alias('layer'),
    ),
    os.getenv("USER_S3_PATH") + "tile_blog/places.pmtiles",
    GenerationConfig(6, 15)
)
```
This example generates a PMTiles file for zooms 6 through 15. Since the places dataset contains features that are not relevant at a global level, we selected a minimum zoom of 6, about the size of a large European country. The max zoom of 15 is selected because the precision provided should be sufficient and overzooming means that our places will still render at higher zooms. The OpenStreetMap wiki has a helpful page about how large a tile is at each zoom level. The name and category of each place will be included in the tiles.
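If you’d rather compute tile sizes than look them up, the standard Web Mercator math is enough for a rough check (this helper is our own illustration, not part of VTiles):

```python
import math

def tile_width_meters(zoom: int, latitude_deg: float = 0.0) -> float:
    """Approximate ground width of one Web Mercator tile at a given zoom."""
    equator_m = 40_075_016.686  # Earth's circumference at the equator, meters
    return equator_m * math.cos(math.radians(latitude_deg)) / 2 ** zoom

print(f"zoom 6:  {tile_width_meters(6) / 1000:,.0f} km across")  # ~626 km
print(f"zoom 15: {tile_width_meters(15):,.0f} m across")         # ~1,223 m
```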
Performance Results: tiles for all 50 million places were generated in 2 minutes and 23 seconds.
This example generates tiles for all buildings in the Overture buildings dataset, about 2.3 billion features. The roughly uniform size of the features, and the small size of buildings relative to even the highest-zoom tiles, mean that the number of (feature, tile) combinations is close to |features| * |zooms|. Because of this homogeneity, we can expect a quick execution without the use of a feature filter. This example represents a typical use case: a very large number of features, each smaller than the extent of a tile at maximum zoom.
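As a back-of-the-envelope check of that estimate (our own arithmetic, using the figures from this example):

```python
# Each building is smaller than a max-zoom tile, so it lands in roughly one
# tile per zoom level: about |features| * |zooms| (feature, tile) pairs.
features = 2_300_000_000   # ~2.3 billion Overture buildings
zooms = 15 - 10 + 1        # zoom levels 10 through 15, as configured below
print(f"{features * zooms:,} pairs")  # 13,800,000,000
```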
```python
import pyspark.sql.functions as f
import os

from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    sedona.table("wherobots_open_data.overture_2024_05_16.buildings_building").select(
        "geometry",
        f.lit('buildings').alias('layer'),
    ),
    os.getenv("USER_S3_PATH") + "tile_blog/buildings.pmtiles",
    GenerationConfig(10, 15)
)
```
This example generates a PMTiles file for zooms 10 through 15. The minimum zoom of 10 was selected because buildings aren’t useful at lower zooms for most use cases. The max zoom of 15 was selected because the precision provided should be sufficient, and overzooming means that our buildings will still render at higher zooms. The properties of a very large percentage of the Overture buildings are null, so we haven’t included any here.
The third example creates tiles for all polygons and multipolygons in the Overture division areas dataset. This dataset contains just under one million records. Despite its small size, it can be challenging to process: it contains polygons and multipolygons representing areas, from large, highly detailed countries to small neighborhoods with minimal detail. The appropriate min/max zoom for countries and neighborhoods is very different.
Recall from the places example that the amount of work the system must do is strongly related to the number of (feature, tile) pairs. A country outline like Canada might cover an entire tile at zoom 5. It will be in roughly 2 * 4^(max_zoom - 5) tiles across all zooms; if max_zoom is 15, that’s over 2 million tiles. You can quickly wind up with an unexpectedly large execution time and tiles archive if you do not take this into account. Most use cases will benefit from setting different min and max zooms for different features, which you can do in VTiles via a feature filter.
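A quick sketch of that arithmetic (our own illustration): each zoom level splits every tile into four children, so the counts compound geometrically.

```python
# Tiles touched across zooms by a feature that fills one whole tile at zoom 5.
def tiles_covered(base_zoom: int, max_zoom: int) -> int:
    return sum(4 ** (z - base_zoom) for z in range(base_zoom, max_zoom + 1))

# 1,398,101 from a single base tile; a country like Canada spans several
# tiles at zoom 5, which is how you get past 2 million.
print(f"{tiles_covered(5, 15):,}")
```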
Let’s first profile the base case with no feature filter.
```python
import pyspark.sql.functions as f
import os

from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    sedona.table("wherobots_open_data.overture_2024_05_16.divisions_division_area").select(
        "geometry",
        f.col("names.primary").alias("name"),
        f.col("subtype").alias('layer'),
    ),
    os.getenv("USER_S3_PATH") + "tile_blog/division_area.pmtiles",
    GenerationConfig(3, 15)
)
```
Performance Results (Without Optimization):
We can significantly accelerate the execution time of this example using the VTiles feature filters. These feature filters are most commonly used to determine what features should be in a tile on the basis of a category and the zoom level. In this case we will only show countries at lower zooms and neighborhoods at the highest zoom levels. The visual impact of a feature that is much larger than the tile is minimal in typical use cases. The visual impact of a neighborhood is null when it’s smaller than the tile can resolve; it is literally invisible, or perhaps a single pixel. By excluding these features that add no visual information, we save processing time and storage costs, as well as increase the performance of serving the now-smaller tiles.
Here is an example of using feature filters to improve performance of this division area generation task:
```python
import pyspark.sql.functions as f
import os

from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    sedona.table("wherobots_open_data.overture_2024_05_16.divisions_division_area").select(
        "geometry",
        f.col("names.primary").alias("name"),
        f.col("subtype").alias('layer'),
    ),
    os.getenv("USER_S3_PATH") + "tile_blog/division_area_filtered.pmtiles",
    GenerationConfig(
        min_zoom=2,
        max_zoom=15,
        feature_filter=(
            ((f.col("subType") == f.lit("country")) & (f.col("tile.z") < f.lit(7)))
            | ((f.col("subType") == f.lit("region")) & (f.lit(3) < f.col("tile.z")) & (f.col("tile.z") < f.lit(10)))
            | ((f.col("subType") == f.lit("county")) & (f.lit(9) < f.col("tile.z")) & (f.col("tile.z") < f.lit(12)))
            | ((f.col("subType") == f.lit("locality")) & (f.lit(10) < f.col("tile.z")) & (f.col("tile.z") < f.lit(14)))
            | ((f.col("subType") == f.lit("localadmin")) & (f.lit(13) < f.col("tile.z")))
            | ((f.col("subType") == f.lit("neighborhood")) & (f.lit(13) < f.col("tile.z")))
        ),
    ),
)
```
Performance Results (With Feature Filters): as noted in the summary above, the filtered run completes in about 10 minutes.

We know that there are use cases with large geometries where it might be difficult to write an effective feature filter, or where filtering is undesirable. For those use cases, Wherobots 1.3.1 introduces an option to improve tile generation performance: a GenerationConfig parameter called repartition_frequency. When features are repeatedly split as the algorithm zooms in, the child features wind up in the same partition. This can cause a well-partitioned input dataset to become skewed by even a single large record. Setting a repartition frequency of 2 or 4 helps keep cluster utilization high by keeping partitions roughly uniform in size.
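A hypothetical usage sketch (the parameter name comes from the text above, but the exact call shape is our assumption; df stands in for a prepared DataFrame like those in the earlier examples):

```python
# Hypothetical sketch: rebalance partitions every 2 zoom levels so a few huge
# geometries don't concentrate all of their child features in one partition.
# Assumes the imports and the `df` DataFrame mirror the earlier examples.
generate_pmtiles(
    df,
    os.getenv("USER_S3_PATH") + "tile_blog/large_geometries.pmtiles",
    GenerationConfig(min_zoom=2, max_zoom=15, repartition_frequency=2),
)
```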
WherobotsDB VTiles is a fast and cost-effective solution for generating tiles from global datasets. The Apache Spark-based runtime, powered by Apache Sedona, makes data loading and transformation straightforward and delivers proven performance at planetary scale.

Performance metrics:
- Places: about 50 million features, tiles generated in 2 minutes and 23 seconds
- Buildings: about 2.3 billion features, tiles generated in 26 minutes
- Division Areas: about 1 million features, tiles generated in 10 minutes with feature filters

Best practices for optimization:
- Choose minimum and maximum zooms deliberately; the number of tiles grows roughly 4x per zoom level.
- Use feature filters to restrict features to the zoom range where they add visual information, cutting processing time, storage costs, and tile size.
- For large geometries that are hard to filter, set repartition_frequency on the GenerationConfig to keep partitions roughly uniform and cluster utilization high.
Feature filters let you curate tile contents to match your specific use cases and performance goals. Try VTiles with your own data on Wherobots Cloud.