How long is too long to wait for a dataset to process? In the fast-paced world of data as a service, efficiency isn’t just a nice-to-have; it’s essential. Yet before adopting Wherobots, updates to GeoPostcodes’ global population movement datasets took about 39 days, running on a headless QGIS instance combined with PostGIS to join population rasters with postal code boundaries.
At Wherobots, we’ve made it our mission to push the boundaries of what’s possible in distributed spatial computing, not just to process data faster, but to increase productivity. When GeoPostcodes, a leader in geospatial data products, sought to enhance their administrative-boundary enrichment process with population estimates derived from the Global Human Settlement (GHS) population grid, they knew they needed a new solution. Together, we transformed a process that once took nearly a month and a half into a task completed in hours.
Let’s take a closer look at how this collaboration reshaped GeoPostcodes’ ability to deliver fresh data in orders of magnitude less time.
TL;DR: The collaboration between Wherobots and GeoPostcodes yielded a dramatic improvement in processing time for GeoPostcodes’ new data products. This progress has powerful implications for development and applications in fields that rely on timely and accurate data processing. Read on for the details, or check out the live stream recording.
So, why is this performance gain so important? The ability to process complex geospatial data in a fraction of the time has profound implications for a wide range of industries, and the benefits go beyond reduced analysis time and system complexity. Running analyses quickly lets data engineers and analysts iterate faster, test new product and data ideas, and decrease time to market.
For urban planners, home builders, and retailers, having access to up-to-date population estimates can inform better decisions about infrastructure development, operations, and services. For logistics companies, understanding population distributions can enhance route planning and resource allocation. And for market analysts, demographic insights are invaluable for crafting targeted strategies and predicting consumer behavior.
By reducing processing times from weeks to hours, Wherobots and GeoPostcodes are empowering these industries to make faster, more informed decisions. The days of waiting weeks for analysis results are over—now, critical data can be at your fingertips in just a matter of hours.
Wherobots is at the forefront of distributed spatial computing, offering hosted, highly optimized Apache Sedona (our founders are the original creators of the project). We specialize in providing a platform that empowers data teams to build and deliver data products efficiently and at scale. Through a cloud-native service built for developers and data engineers, Wherobots enables data teams to iterate and produce faster by running complex spatial computations at unprecedented scale and speed. The platform scales to meet any data workload, offering the flexibility needed to handle even the most demanding spatial data processing tasks. Whether it’s raster or vector analysis, or even machine learning, it delivers the computational power required to get the job done faster and more effectively, while maintaining 100% code compatibility with open source Apache Sedona.
GeoPostcodes has established itself as a trusted provider of high-quality geospatial data. Their datasets are vital to industries such as logistics, insurance, e-commerce, and market analysis, providing zip codes, cities, and administrative areas for 247 countries, available as 15.9M points (geocoding) and 880k boundaries (polygons). With their newest feature, population estimates derived from the GHS population grid, GeoPostcodes adds a valuable layer of demographic information to their datasets, giving users deeper insights and more powerful tools for decision-making.
Mapping Units
GeoPostcodes provides high-resolution administrative units at various scales (e.g. county, state, national) as well as zip code boundaries. The smallest administrative units and the zip code boundaries served as the mapping units: we enriched the smallest administrative units with population estimates from the GHS data, which allowed us to roll the estimates up into the higher-level administrative units without additional processing.
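The roll-up described above can be sketched in a few lines of Python. This is a simplified illustration, not GeoPostcodes’ production code: the unit IDs and populations are made up, and we assume each unit’s ID is prefixed by its parent’s ID (a common hierarchical coding scheme), so aggregating to a higher level is just truncating the ID and summing.

```python
from collections import defaultdict

# Hypothetical smallest-level admin units, already enriched with
# population estimates. The ID prefix encodes the hierarchy
# (country -> state -> county); values are made up for illustration.
smallest_units = {
    "US-CA-037": 9_800_000,
    "US-CA-073": 3_300_000,
    "US-NY-061": 1_600_000,
}

def roll_up(units: dict, level: int) -> dict:
    """Aggregate populations into higher-level units by truncating
    each hierarchical ID to `level` components and summing."""
    totals = defaultdict(int)
    for unit_id, pop in units.items():
        parent = "-".join(unit_id.split("-")[:level])
        totals[parent] += pop
    return dict(totals)

states = roll_up(smallest_units, level=2)     # {'US-CA': ..., 'US-NY': ...}
countries = roll_up(smallest_units, level=1)  # {'US': ...}
```

Because the sums are exact, enriching only the smallest units once is enough: every coarser level is derived by addition, with no further raster processing.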
Global Human Settlement data (GHS)
The Global Human Settlement layer, developed by the European Commission’s Joint Research Centre, provides global population estimates at high spatial resolution. It is based on a combination of satellite imagery, census data, and other sources. The estimates are available in five-year increments from 1975 through 2030 and are provided in a gridded format at various resolutions; for this analysis we chose the 100 m resolution data.
Before the collaboration with Wherobots, GeoPostcodes had developed a method for deriving population estimates for their administrative boundaries dataset. This process used their PostGIS data warehouse and a dockerized headless instance of QGIS.
In this setup, the GHS population grid data was copied to the Docker container running QGIS. This dataset, rich with global population distribution information, was then subjected to a series of complex spatial operations, including the spatial joins and aggregations needed to map population estimates to the corresponding administrative boundaries.
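Conceptually, that spatial join assigns each population cell to the boundary that contains it, then sums per boundary (a zonal sum). Here is a minimal, stdlib-only sketch of the idea, using axis-aligned rectangles in place of real polygons and made-up cell values; in the real pipeline, PostGIS performs true polygon containment over millions of cells.

```python
# Toy zonal sum: raster cells are (x, y, population) points; boundaries
# are axis-aligned rectangles (xmin, ymin, xmax, ymax). A real polygon
# containment test (e.g. in PostGIS) replaces the box check below.
cells = [(0.5, 0.5, 120), (1.5, 0.5, 80), (2.5, 2.5, 200)]
boundaries = {
    "zone_a": (0.0, 0.0, 2.0, 1.0),
    "zone_b": (2.0, 2.0, 3.0, 3.0),
}

def zonal_sum(cells, boundaries):
    totals = {name: 0 for name in boundaries}
    for x, y, pop in cells:
        for name, (xmin, ymin, xmax, ymax) in boundaries.items():
            if xmin <= x < xmax and ymin <= y < ymax:
                totals[name] += pop
                break  # each cell falls in at most one zone here
    return totals

print(zonal_sum(cells, boundaries))
```

The cost of this join grows with both the number of cells and the number of boundaries, which is why a single-machine, sequential version struggles at global scale.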
Despite the robustness of this method, the entire process took 39 days to complete. The sheer size of the datasets, coupled with the complexity of the spatial operations, meant that the database was constantly working at full capacity. This prolonged processing time posed a significant bottleneck, delaying critical processes for GeoPostcodes’ data operations teams.
This is where Wherobots came in. Leveraging the Wherobots distributed computing platform, we were able to reduce the processing time from weeks to just hours—a drastic improvement that has far-reaching implications for geospatial data analysis.
The process began with ingesting the administrative units and the GHS population grid into WherobotsDB. We leveraged Wherobots Out-DB rasters, the RS_TileExplode() function, and repartitioning, which enables more performant reading and workload distribution because only the pixels required for the analysis are read and joined to the mapping units. From there, we partitioned the administrative units for even distribution of the workload. This partitioning allowed us to perform spatial joins and aggregations in parallel, significantly speeding up the process. While a single machine working through these operations sequentially would take days, our distributed approach meant that tasks were handled simultaneously across multiple nodes, drastically reducing the overall processing time.
Looking at a subset of the results and comparing them to the runtime of the legacy system, we can clearly see that Apache Sedona on Wherobots delivers increased performance and a dramatically reduced run time.
This collaboration between Wherobots and GeoPostcodes is just the beginning. As we continue to push the boundaries of what’s possible with distributed computing and geospatial analysis, we’re excited about the future possibilities.
On the horizon are even more advanced features and capabilities. Real-time data processing and machine learning integration are among the innovations we’re exploring, promising to make geospatial data analysis not just faster, but also smarter and more adaptive to changing conditions.
Together, Wherobots and GeoPostcodes are setting a new standard for efficiency and precision in the world of geospatial data. Whether you’re in urban planning, insurance, logistics, or any other industry that relies on this data, the future’s looking brighter, and faster, than ever before.