If you’ve ever worked with spatial data, you have probably needed to define a geographic boundary within which to conduct your analysis. Most of the time, these are administrative boundaries such as cities, states, provinces, or countries. For instance, to scope your analysis to New York City, you would need to find an authoritative source for the admin boundary online, then either download and hardcode the data into your code or build a pipeline that reads directly from the source’s API (if one exists). In New York City’s case, doing this gives you a string roughly 14.5K characters long, consisting mostly of unreadable coordinates.
nyc = 'MULTIPOLYGON (((-74.046135 40.691125, -74.046176 40.691092, -74.047041 40.691041, -74.047149 40.690985, -74.047207 40.690893, -74.047196 40.690794, -74.047146 40.690714, -74.047026 40.690589, -74.047041 40.690482, -74.047183 40.690411, -74.046248 40.689319, [... 149 lines later!] -74.0400963 40.6989342, -74.0401502 40.6989014)))'
With the wkls library, the same boundary is a single, readable line:

nyc = wkls.us.nyc.cityofnewyork.wkt() # New York City
These boundaries are well-known and well-defined, but most geospatial tools do not include them natively. This is because getting geopolitically precise administrative boundaries is challenging and often results in very large datasets (e.g., 10K-1M points per boundary).
As a result, data practitioners are often forced to find Shapefiles for these boundaries on the internet, write code to download them from the source, and include them in their projects. Alternatively, developers sometimes hardcode these strings into their projects, or substitute inaccurate bounding boxes for the actual administrative boundary. At best, this is boilerplate code that must be written and maintained over and over again; at worst, it is a source of inconsistencies between projects.
We heard this feedback from our customers repeatedly and it lined up perfectly with our mission to make geospatial easy to work with. That is why we are very excited to introduce the Well Known Locations (wkls) library. The wkls library (pronounced “Whickles”) includes ~625K global administrative boundaries — from countries to cities — which can be referred to by name using clean, chainable Python syntax. The library reads directly from Overture Maps Foundation GeoParquet data hosted on the AWS Open Data Registry. The supported formats are WKT, WKB, HexWKB, GeoJSON, and SVG. The library is included in Wherobots core libraries and there is zero installation or configuration required to take advantage of it.
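To make the format list above concrete, here is a hedged sketch (plain Python, not the wkls API itself) showing the same toy boundary ring serialized as WKT and as GeoJSON, two of the formats the library can return:

```python
import json

# A tiny square boundary; a real administrative boundary from wkls would
# have thousands of vertices. The ring is closed (first point == last point).
ring = [(-74.05, 40.68), (-73.90, 40.68), (-73.90, 40.88),
        (-74.05, 40.88), (-74.05, 40.68)]

# WKT: a human-readable text serialization of the polygon.
wkt = "POLYGON ((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"

# GeoJSON: the same geometry as a JSON document.
geojson = json.dumps({
    "type": "Polygon",
    "coordinates": [[list(p) for p in ring]],
})

print(wkt)
print(geojson)
```

WKB and HexWKB are binary (and hex-encoded binary) equivalents of the same geometry, more compact but not human-readable.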
Start by importing the library into your code and reference locations using Python’s attribute (dot) notation:
import wkls

wkls.us.wkt()                            # country: United States
wkls.us.ny.wkt()                         # state: New York
wkls.us.nyc.cityofnewyork.wkt()          # city: New York
wkls["us"]["ny"]["cityofnewyork"].wkt()  # dictionary-style access
wkls supports up to three chained attributes. For instance, the chained expression wkls.us.ca.sanfrancisco returns a data frame object containing all matches for the San Francisco administrative boundary. In most cases, the call resolves to a single admin boundary object (i.e., one row). If there are name collisions (e.g., two representations of the city of San Francisco, one with only the land border and one that also includes shorelines), multiple rows may be returned.
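When multiple rows come back, you need a rule for picking one. As a hedged, stdlib-only sketch (the names and the heuristic here are illustrative, not part of the wkls API), one simple approach is to keep the candidate covering the largest area, computed with the shoelace formula:

```python
def ring_area(coords):
    """Unsigned shoelace area of a closed (lon, lat) ring, in squared degrees."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Two hypothetical representations of the same city boundary.
land_only = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]       # area 12.0
with_shoreline = [(0, 0), (5, 0), (5, 4), (0, 4), (0, 0)]  # area 20.0

candidates = {"land_only": land_only, "with_shoreline": with_shoreline}

# Keep the candidate that covers the most area.
largest = max(candidates, key=lambda name: ring_area(candidates[name]))
print(largest)  # -> with_shoreline
```

The same idea applies to the returned data frame: sort or filter the candidate rows by whatever property disambiguates them for your use case.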
Once you have the administrative boundary object, it can be used like any other geometry within Wherobots. For instance, you can calculate intersections, reference the boundary in any raster function, etc.
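As a minimal, dependency-free sketch of "using the boundary like any other geometry" (the WKT string below is a toy stand-in for a wkls result; in practice you would hand the string to shapely or to spatial SQL functions rather than parse it by hand):

```python
# Toy boundary standing in for the output of a wkls .wkt() call.
wkt = ("POLYGON ((-74.05 40.68, -73.90 40.68, "
       "-73.90 40.88, -74.05 40.88, -74.05 40.68))")

def wkt_polygon_coords(wkt):
    """Extract (lon, lat) pairs from a single-ring POLYGON WKT string."""
    inner = wkt[wkt.index("((") + 2 : wkt.index("))")]
    return [tuple(map(float, pair.split())) for pair in inner.split(",")]

def bbox_contains(coords, lon, lat):
    """Cheap pre-filter: is the point inside the boundary's bounding box?"""
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return min(lons) <= lon <= max(lons) and min(lats) <= lat <= max(lats)

coords = wkt_polygon_coords(wkt)
print(bbox_contains(coords, -73.97, 40.78))   # near Times Square -> True
print(bbox_contains(coords, -118.24, 34.05))  # Los Angeles -> False
```

A bounding-box test is only a coarse pre-filter; for exact point-in-polygon or intersection results you would use a real geometry library or the platform's spatial functions.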
For more information please read the wkls documentation.
Want to contribute to the library? You can open issues, submit pull requests, improve documentation and more by following the instructions on this open source repository.
Making administrative boundaries more accessible is not the only way we are making geospatial developers’ lives easier. Our platform runs spatial queries 5-20X faster and up to 60% more cost-efficiently than other industry-leading solutions. We also offer rich functionality that lets you run vector and raster functions in the same query, along with industry-leading Spatial AI capabilities. Finally, our Community tier is free to try!