If you’ve ever worked with spatial data, you have probably needed to define a geographic boundary within which to conduct your analysis. Most of the time, these are administrative boundaries such as cities, states, provinces, and countries. For instance, if you want to scope your analysis to New York City, you’d need to look for the admin boundary online, find an authoritative source, and then either download the data and hardcode it into your code or build a pipeline that reads directly from the source’s APIs (if it supports them). In the case of New York’s CSV file, this gives you a 14.5K-character string consisting mostly of unreadable coordinates, a common challenge for developers before tools like the wkls Python library.
nyc = 'MULTIPOLYGON (((-74.046135 40.691125, -74.046176 40.691092, \
-74.047041 40.691041, -74.047149 40.690985, -74.047207 40.690893, \
-74.047196 40.690794, -74.047146 40.690714, -74.047026 40.690589, \
-74.047041 40.690482, -74.047183 40.690411, -74.046248 40.689319, \
[... 149 lines later!]
-74.0400963 40.6989342, -74.0401502 40.6989014)))'
nyc = wkls.us.ny.cityofnewyork.wkt()  # New York City
These boundaries are well-known and well-defined, but most geospatial tools do not include them natively. This is because getting geopolitically precise administrative boundaries is challenging and often results in very large datasets (e.g., 10K-1M points per boundary).
As stated above, oftentimes data practitioners are forced to find Shapefiles for these boundaries on the internet, write code to download them from the source and include them in their projects. Alternatively, developers sometimes hardcode these strings into their projects or use inaccurate bounding boxes instead of the actual administrative boundary. This is, at best, boilerplate code that needs to be written and maintained over and over again and a possible source of inconsistencies between projects.
We heard this feedback from our customers repeatedly and it lined up perfectly with our mission to make geospatial easy to work with. That is why we are very excited to introduce the Well Known Locations (wkls) library. The wkls library (pronounced “Whickles”) includes ~625K global administrative boundaries — from countries to cities — which can be referred to by name using clean, chainable Python syntax. The library reads directly from Overture Maps Foundation GeoParquet data hosted on the AWS Open Data Registry. The supported formats are WKT, WKB, HexWKB, GeoJSON, and SVG. The library is included in Wherobots core libraries and there is zero installation or configuration required to take advantage of it.
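To make the list of formats a little more concrete, here is a minimal sketch of how one tiny geometry looks in WKT, WKB, HexWKB, and GeoJSON. It uses only the Python standard library and a made-up point (not a real boundary, and not wkls itself), so treat it as an illustration of the encodings rather than of the library's output.

```python
import json
import struct

# A single made-up point (x, y), standing in for a real boundary.
x, y = 1.0, 2.0

# WKT: human-readable text.
wkt = f"POINT ({x:g} {y:g})"

# WKB: 1 byte for byte order (1 = little-endian), a 4-byte geometry
# type (1 = Point), then the coordinates as two 8-byte doubles.
wkb = struct.pack("<BIdd", 1, 1, x, y)

# HexWKB: the same bytes written out as a hexadecimal string.
hex_wkb = wkb.hex().upper()

# GeoJSON: a JSON object with a type and a coordinates array.
geojson = json.dumps({"type": "Point", "coordinates": [x, y]})

print(wkt)       # POINT (1 2)
print(len(wkb))  # 21
print(hex_wkb)   # 0101000000000000000000F03F0000000000000040
print(geojson)   # {"type": "Point", "coordinates": [1.0, 2.0]}
```

WKB and HexWKB carry the same bytes; the hex form is simply easier to paste into SQL or logs. SVG is left out here because it describes a drawing rather than a data encoding.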
Start by importing the library into your code and reference locations using Python’s object (dot) notation:
import wkls

wkls.us.wkt()                            # country: United States
wkls.us.ny.wkt()                         # state: New York
wkls.us.ny.cityofnewyork.wkt()           # city: New York
wkls["us"]["ny"]["cityofnewyork"].wkt()  # dictionary-style access
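wkls supports up to 3 chained attributes: a country code (e.g., us), then a state or region code within that country (e.g., ny), then a city or other place name (e.g., cityofnewyork).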
For instance, the chained expression wkls.us.ca.sanfrancisco returns a data frame object containing every match for the San Francisco administrative boundary. In most cases, the call resolves to a single admin boundary object (i.e., one row). If there are name collisions (e.g., two representations of the city of San Francisco, one with just the land border and the other also including shorelines), multiple rows may be returned.
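If you are curious how chained lookups like this can work in Python, here is a toy sketch. Everything in it is hypothetical (the `Chain` class, the `ROWS` table, and the row names); it is not the wkls implementation or its data. It only illustrates the idea of up to three chained levels, dictionary-style access, and a name collision returning more than one row.

```python
# Toy illustration only: a hypothetical in-memory table and resolver.
ROWS = [
    {"country": "us", "region": None, "place": None,
     "name": "United States"},
    {"country": "us", "region": "ca", "place": None,
     "name": "California"},
    {"country": "us", "region": "ca", "place": "sanfrancisco",
     "name": "San Francisco (land only)"},
    {"country": "us", "region": "ca", "place": "sanfrancisco",
     "name": "San Francisco (with shoreline)"},
]
LEVELS = ("country", "region", "place")


class Chain:
    """Collects up to three keys through attribute or item access."""

    def __init__(self, path=()):
        self.path = path

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, e.g. chain.us
        if key.startswith("_") or len(self.path) >= len(LEVELS):
            raise AttributeError(key)
        return Chain(self.path + (key.lower(),))

    def __getitem__(self, key):
        # chain["us"] behaves the same as chain.us
        return getattr(self, key)

    def rows(self):
        # Unused levels must be empty, so chain.us matches only the country.
        wanted = self.path + (None,) * (len(LEVELS) - len(self.path))
        return [r for r in ROWS if tuple(r[k] for k in LEVELS) == wanted]


toy = Chain()
print(len(toy.us.rows()))                  # 1
print(toy["us"]["ca"].rows()[0]["name"])   # California
print(len(toy.us.ca.sanfrancisco.rows()))  # 2
```

The last call returns two rows because the toy table deliberately contains two entries for the same place, mirroring the collision case described above.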
Once you have the administrative boundary object, it can be used like any other geometry within Wherobots. For instance, you can calculate intersections, reference the boundary in any raster function, etc.
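As a rough illustration of why the actual boundary matters more than a bounding box, the sketch below tests a point against both. The L-shaped polygon, the `parse_polygon` helper, and the other function names are made up for this example; in practice the WKT string would be a real boundary, and you would use your platform's spatial functions rather than hand-rolled geometry code.

```python
import re

# Hypothetical L-shaped stand-in; a real boundary WKT string
# would be used here instead.
boundary_wkt = "POLYGON ((0 0, 4 0, 4 2, 2 2, 2 4, 0 4, 0 0))"


def parse_polygon(wkt):
    """Read the coordinates of a simple single-ring POLYGON WKT string."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    return [(float(x), float(y)) for x, y in pairs]


def in_bbox(point, ring):
    """True if the point lies inside the ring's bounding box."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)


def in_polygon(point, ring):
    """Ray casting: toggle on each edge crossed by a ray going right."""
    px, py = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


ring = parse_polygon(boundary_wkt)
point = (3.0, 3.0)  # sits in the notch of the L

print(in_bbox(point, ring))     # True
print(in_polygon(point, ring))  # False
```

The point passes the bounding-box check but is outside the polygon, which is the kind of false positive you get when a box stands in for an administrative boundary.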
For more information please read the wkls documentation.
Want to contribute to the library? You can open issues, submit pull requests, improve documentation and more by following the instructions on this open source repository.
Making administrative boundaries more accessible is not the only way we are making geospatial developers’ lives easier. Our platform runs spatial queries 5-20X faster and is up to 60% more cost-efficient than other industry-leading solutions. We also offer rich functionality that lets you run vector and raster functions in the same query, along with industry-leading Spatial AI capabilities. Finally, our Community tier is free to try!