Accelerating Vector and Raster Analysis at GeoPostcodes

How long is too long to wait for a dataset to process? In the fast-paced world of data as a service, efficiency isn’t just a nice-to-have; it’s essential. For GeoPostcodes, before Wherobots, each update of the global population movement datasets took about 39 days. That workflow ran on a headless QGIS instance together with PostGIS, combining population rasters with postal code boundaries.

At Wherobots, we’ve made it our mission to push the boundaries of distributed spatial computing, not just to process data faster but to make data teams more productive. When GeoPostcodes, a leader in geospatial data products, set out to enrich their administrative boundaries with population estimates derived from the Global Human Settlement (GHS) population grid, they knew they needed a new solution. Together, we transformed a process that once took more than five weeks into a task completed in hours.

Let’s take a closer look at how this collaboration reshaped GeoPostcodes’ capabilities, letting them deliver fresh data in orders of magnitude less time.

TL;DR: The collaboration between Wherobots and GeoPostcodes cut the processing time for GeoPostcodes’ new population data products from 39 days to hours. That speedup matters for any field that relies on timely, accurate geospatial data. Read on for the details, or check out the live stream recording.

Why This Matters

So, why is this performance gain so important? Processing complex geospatial data in a fraction of the time has implications for a wide range of industries. The benefits are not limited to shorter analysis time and lower system complexity: running analyses quickly lets data engineers and analysts iterate faster, test new product and data ideas, and shorten time to market.

For urban planners, home builders, and retailers, having access to up-to-date population estimates can inform better decisions about infrastructure development, operations, and services. For logistics companies, understanding population distributions can enhance route planning and resource allocation. And for market analysts, demographic insights are invaluable for crafting targeted strategies and predicting consumer behavior.

By reducing processing times from weeks to hours, Wherobots and GeoPostcodes are helping these industries make faster, better-informed decisions. Instead of waiting weeks for analysis results, teams can have critical data in hand within hours.

About Wherobots and GeoPostcodes

Wherobots is at the forefront of distributed spatial computing, offering a hosted, highly optimized version of Apache Sedona (our founders are the original creators of the project). Our cloud-native platform, built for developers and data engineers, lets data teams build and deliver data products efficiently and iterate faster by running complex spatial computations at scale. Wherobots scales to demanding spatial workloads, whether raster analysis, vector analysis, or machine learning, while maintaining 100% code compatibility with open source Apache Sedona.

GeoPostcodes has established itself as a trusted provider of high-quality geospatial data. Their datasets are vital to industries such as logistics, insurance, e-commerce, and market analysis, covering zip codes, cities, and administrative areas for 247 countries, available as 15.9M points (geocoding) and 880k boundaries (polygons). Their newest feature, population estimates derived from the GHS population grid, adds a layer of demographic information to these datasets and gives users deeper insight for decision-making.

A bit about the data

Mapping Units:

GeoPostcodes provides high-resolution administrative units at various scales (e.g., county, state, and national) as well as zip code boundaries. The smallest administrative units and the zip code boundaries served as the mapping units. We enriched the administrative units with population estimates from the GHS data. Using the smallest administrative units let us roll the population estimates up into the higher-level administrative units, reducing the need for additional processing.
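The roll-up idea can be illustrated with a minimal plain-Python sketch. It assumes each smallest administrative unit records its parent unit; the `roll_up` function, the unit IDs, and the population figures are hypothetical illustration values, not GeoPostcodes code or data.

```python
from collections import defaultdict


def roll_up(populations, parent):
    """Aggregate leaf-unit populations into every ancestor level.

    populations: {leaf_unit_id: population} for the smallest units only.
    parent: {unit_id: parent_unit_id}; top-level units have no entry.
    Returns {unit_id: total_population} for the leaves and all ancestors.
    """
    totals = defaultdict(float)
    for unit, pop in populations.items():
        node = unit
        # Walk from the leaf up to the root, adding the leaf's population
        # to every unit on the way.
        while node is not None:
            totals[node] += pop
            node = parent.get(node)
    return dict(totals)


# Hypothetical hierarchy: two municipalities in one county, in one state.
parent = {"muni_a": "county_1", "muni_b": "county_1", "county_1": "state_x"}
populations = {"muni_a": 1200.0, "muni_b": 800.0}
print(roll_up(populations, parent))
```

Because the raster only has to be joined to the leaf units once, each higher level becomes a cheap sum over its children rather than another raster-to-polygon join.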

Global Human Settlement data (GHS)

Figure: Global population

The Global Human Settlement Layer, developed by the European Commission’s Joint Research Centre, provides global population estimates at high spatial resolution, based on a combination of satellite imagery, census data, and other sources. The estimates are available in five-year increments from 1975 through 2030 and are provided as gridded rasters at several resolutions; for this analysis we chose the 100 m resolution data.
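To make the gridded format concrete: each cell of a raster such as the 100 m GHS grid maps to ground coordinates through an affine geotransform. The sketch below uses the six-coefficient convention popularized by GDAL; the `pixel_center` helper and the grid origin are hypothetical, and real GHS files carry their own georeferencing in their metadata.

```python
def pixel_center(geotransform, col, row):
    """Return the (x, y) ground coordinates of a pixel's center.

    geotransform follows the GDAL convention:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height),
    where pixel_height is negative for north-up rasters.
    """
    gt0, gt1, gt2, gt3, gt4, gt5 = geotransform
    # Shift by half a pixel to move from the cell's upper-left corner to its center.
    c, r = col + 0.5, row + 0.5
    x = gt0 + c * gt1 + r * gt2
    y = gt3 + c * gt4 + r * gt5
    return x, y


# Hypothetical north-up grid with 100 m cells, upper-left corner at (0, 1000)
# in a projected coordinate system measured in meters.
gt = (0.0, 100.0, 0.0, 1000.0, 0.0, -100.0)
print(pixel_center(gt, 0, 0))  # (50.0, 950.0)
print(pixel_center(gt, 3, 2))  # (350.0, 750.0)
```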

The Old Way: Over a month of processing in PostgreSQL

Before the collaboration with Wherobots, GeoPostcodes had developed a method for deriving population estimates for their administrative boundaries dataset. The process used their PostGIS data warehouse and a dockerized, headless instance of QGIS.

In this setup, the GHS population grid data was copied into the Docker container running QGIS. This dataset, rich with global population distribution information, was then subjected to a series of complex spatial operations, including the spatial joins and aggregations needed to map population estimates to the corresponding administrative boundaries.

Despite the robustness of this method, the entire process took thirty-nine (39) days to complete. The sheer size of the datasets, coupled with the complexity of the spatial operations, kept the database working at full capacity throughout. This prolonged processing time was a significant bottleneck that delayed critical processes for GeoPostcodes’ data operations teams.
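The core spatial join and aggregation, assigning population cells to the boundary that contains them and summing them, can be sketched in plain Python. This sketch assigns a cell to a polygon when the cell's center falls inside it, using a ray-casting test; that assignment rule, the `zonal_sum` and `point_in_polygon` helpers, and the toy grid are assumptions for illustration, since the text does not say which rule GeoPostcodes used. Repeated over a global 100 m grid and hundreds of thousands of boundaries, this per-cell work adds up quickly on a single machine.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the simple polygon (list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle on every edge crossed by a horizontal ray cast to the right.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_sum(grid, origin_x, origin_y, cell_size, polygon):
    """Sum population cells whose centers fall inside the polygon.

    grid: list of rows ordered north to south; (origin_x, origin_y) is the
    upper-left corner of the grid.
    """
    total = 0.0
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            cx = origin_x + (col + 0.5) * cell_size
            cy = origin_y - (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                total += value
    return total


# Hypothetical 3x3 population grid with 100 m cells, upper-left corner at (0, 300).
grid = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
# Hypothetical boundary covering the two western columns (x from 0 to 200).
unit = [(0, 0), (200, 0), (200, 300), (0, 300)]
print(zonal_sum(grid, 0, 300, 100, unit))
```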

The Wherobots Solution: Hours Instead of Weeks

This is where Wherobots came in. Using the Wherobots distributed computing platform, we reduced the processing time from weeks to just hours, a drastic improvement with far-reaching implications for geospatial data analysis.

Here’s how we did it.

The process began with ingesting the administrative units and the GHS population grid into WherobotsDB. We used Wherobots Out-DB rasters, the RS_TileExplode() function, and repartitioning to make reads more efficient and distribute the workload, so that only the pixels required for the analysis are read and joined to the mapping units. We then partitioned the administrative units to spread the workload evenly. This partitioning let us perform the spatial joins and aggregations in parallel: where a single machine working through these operations sequentially would take days, our distributed approach processed the tasks simultaneously across multiple nodes, drastically reducing the overall processing time.
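The two ideas in this step, exploding a raster into tiles so that only tiles overlapping a mapping unit are read, and treating each unit as an independent task, can be sketched in plain Python. This is not Wherobots or Sedona code: the `explode_tiles` and `unit_population` helpers are hypothetical, units are simplified to axis-aligned rectangles, and a local thread pool stands in for the distributed executors that WherobotsDB would use across nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def explode_tiles(grid, tile_size):
    """Split a 2D grid (list of rows) into tiles of at most tile_size x tile_size.

    Yields (row_offset, col_offset, tile) so each tile remembers where it sits.
    """
    for r0 in range(0, len(grid), tile_size):
        for c0 in range(0, len(grid[0]), tile_size):
            tile = [row[c0:c0 + tile_size] for row in grid[r0:r0 + tile_size]]
            yield r0, c0, tile


def tile_bbox(r0, c0, tile, origin_x, origin_y, cell):
    """Ground bounding box (xmin, ymin, xmax, ymax) of a tile; rows run north to south."""
    xmin = origin_x + c0 * cell
    xmax = xmin + len(tile[0]) * cell
    ymax = origin_y - r0 * cell
    ymin = ymax - len(tile) * cell
    return xmin, ymin, xmax, ymax


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def unit_population(unit_bbox, tiles, origin_x, origin_y, cell):
    """Sum cells whose centers fall in a rectangular unit, visiting only overlapping tiles."""
    total = 0.0
    for r0, c0, tile in tiles:
        if not intersects(unit_bbox, tile_bbox(r0, c0, tile, origin_x, origin_y, cell)):
            continue  # Pruned: this tile's pixels are never examined.
        for r, values in enumerate(tile):
            for c, v in enumerate(values):
                cx = origin_x + (c0 + c + 0.5) * cell
                cy = origin_y - (r0 + r + 0.5) * cell
                if unit_bbox[0] <= cx < unit_bbox[2] and unit_bbox[1] <= cy < unit_bbox[3]:
                    total += v
    return total


# Hypothetical 4x4 grid of 100 m cells, upper-left corner at (0, 400).
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
tiles = list(explode_tiles(grid, 2))
units = {"west": (0, 0, 200, 400), "east": (200, 0, 400, 400)}

# Each unit is an independent task; threads stand in for cluster executors.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {name: pool.submit(unit_population, bbox, tiles, 0, 400, 100)
               for name, bbox in units.items()}
    results = {name: f.result() for name, f in futures.items()}
print(results)
```

Because no unit depends on another, adding workers (or nodes) divides the work, and the tile-level bounding-box check means each unit only touches the pixels near it.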

Results

Comparing a subset of the results with the run times of the legacy system, we can see that Apache Sedona on Wherobots cut run times by roughly 9x to 135x per country.

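| Country | Wherobots run time (minutes) | Legacy system run time (minutes) | Zip code count | Speed ratio |
|---|---|---|---|---|
| Japan | 3 | 62 | 113,291 | 20x |
| Russia | 66 | 1,170 | 42,984 | 18x |
| Saudi Arabia | 3 | 125 | 37,282 | 44x |
| United States | 6 | 780 | 30,144 | 135x |
| Great Britain | 2 | 25 | 10,059 | 10x |
| France | 3 | 23 | 6,035 | 9x |
| Costa Rica | 1 | 14 | 470 | 9x |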

Looking Ahead: The Future of Geospatial Data

This collaboration between Wherobots and GeoPostcodes is just the beginning. As we continue to push the boundaries of what’s possible with distributed computing and geospatial analysis, we’re excited about the future possibilities.

On the horizon are even more advanced features and capabilities. Real-time data processing and machine learning integration are among the innovations we’re exploring, promising to make geospatial data analysis not just faster, but also smarter and more adaptive to changing conditions.

Together, Wherobots and GeoPostcodes are setting a new standard for efficiency and precision in geospatial data. Whether you work in urban planning, insurance, logistics, or any other industry that relies on this data, the future looks brighter and faster than ever.
