TL;DR
Wherobots (powered by Apache Sedona) and PostgreSQL + PostGIS are complementary tools, not competitors. Use Wherobots for large-scale spatial ETL, processing speed, and distributed analytics. Use PostGIS for data persistence, transactional integrity, and long-term storage. Together, they form a complete spatial data pipeline. At scale, WherobotsDB is up to 317x faster than PostGIS on large spatial join queries, but PostGIS holds the edge on smaller datasets. The right approach is to process with Wherobots, then persist with PostGIS.
Spatial data has never been more abundant or more valued in decision making (everything happens somewhere!). The tools we use to process location-based data can significantly affect the outcomes of our projects. In the geospatial ecosystem, Apache Sedona-powered Wherobots and PostgreSQL with the PostGIS extension each offer robust capabilities. They share some functionality, but they are more powerful together than in isolation. This post explores how combining Wherobots and PostgreSQL + PostGIS draws on the strengths of each tool to improve spatial data processing, analysis, and decision making.
PostgreSQL is a powerful, open-source object-relational database system known for its robustness and reliability. When combined with the PostGIS extension, PostgreSQL transforms into a spatial database capable of executing location-based queries and spatial functions. This combination is ideally suited for projects where data integrity and detailed transaction management (ACID transactions) are crucial.
Key Features:
Architecture:
PostgreSQL is a traditional RDBMS with a process-based database server architecture. It is designed for detailed, transactional data management, where data integrity and complex query capabilities are paramount.
Characteristics:
Wherobots, founded by the original creators of Apache Sedona, represents a significant advancement in cloud-native spatial data processing. Apache Sedona is an open-source cluster computing system designed for processing large-scale spatial data. Wherobots, which continuously optimizes Sedona for performance, pairs this technology with serverless deployment principles to analyze data distributed across clusters. This approach allows Wherobots to perform complex spatial queries and analytics efficiently, handling massive datasets that are beyond the scope of traditional geospatial data processing tools.
Apache Sedona, the backbone of Wherobots, is designed for cloud-native, distributed computing. Wherobots orchestrates and manages the cluster’s distributed compute nodes, enabling large-scale spatial data processing by bringing compute power directly to the data. This architecture is purpose-built for big data analytics and planetary-scale spatial data processing.
Sedona in Wherobots and PostgreSQL + PostGIS can both perform spatial queries, but their real power lies in how they complement each other. For example:
Let’s illustrate how these two systems can be integrated in a complementary manner.
We’ll assume the persona of a fictitious company named “Synergistic GeoFusion Technologies (SGT) holdings”.
SGT holdings handles a vast array of spatial data from diverse sources such as sensors, satellites, and user inputs. The volume and velocity of the data collection require a sophisticated approach to maximize efficiency. Wherobots steps in as the initial processing powerhouse, applying its Apache Sedona-based cluster computing to perform heavy-duty ETL (Extract, Transform, Load) tasks. This process involves cleansing, integrating, and transforming the raw data into a more structured format. Wherobots’ capability to handle massive datasets efficiently complements PostGIS’s robust data storage and querying capabilities by preparing the data for detailed spatial analysis and storage.
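As a rough illustration of the cleansing step described above, the following minimal sketch builds a Sedona SQL statement that parses WKT strings, drops rows that fail to parse, and repairs invalid shapes. The view name `raw_readings` and the columns `id` and `wkt` are hypothetical, and `session` is assumed to be any object with a Spark-style `.sql()` method (such as a Sedona session); this is not SGT’s or Wherobots’ actual pipeline.

```python
def build_cleaning_sql(raw_view: str) -> str:
    """Return Sedona SQL that parses WKT, drops unparseable rows, and repairs geometries.

    Assumes the (hypothetical) view has a string column `wkt` and a key column `id`.
    """
    return f"""
        SELECT id, ST_MakeValid(geom) AS geometry
        FROM (
            SELECT id, ST_GeomFromWKT(wkt) AS geom
            FROM {raw_view}
            WHERE wkt IS NOT NULL
        ) AS parsed
        WHERE geom IS NOT NULL
    """


def clean(session, raw_view: str = "raw_readings"):
    """Run the cleaning query on a Spark-style session and return the resulting DataFrame."""
    return session.sql(build_cleaning_sql(raw_view))
```

Keeping the SQL in a small builder function makes it easy to inspect or log the statement before handing it to the cluster.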
Once Wherobots processes the data, it can be seamlessly transferred to PostGIS, which can serve as the system of record. This PostgreSQL extension is well suited for ongoing data management, thanks to its sophisticated spatial functions and ability to handle complex queries. PostGIS’s strong querying capabilities complement Wherobots’ data processing strength, providing a stable platform for further data manipulation and refinement.
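One common way to make this handoff is a Spark JDBC write followed by a type conversion inside PostgreSQL. The sketch below is a hedged example under several assumptions: the DataFrame’s geometry column has already been converted to WKB (for example with Sedona’s `ST_AsBinary`), the PostgreSQL JDBC driver is available on the cluster, and the host, database, table, and credential values are placeholders. The helper names are invented for illustration.

```python
from typing import Dict, List


def jdbc_options(host: str, database: str, table: str,
                 user: str, password: str, port: int = 5432) -> Dict[str, str]:
    """Build the option map for Spark's built-in JDBC data source."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def geometry_fixup_sql(table: str, column: str = "geometry",
                       srid: int = 4326) -> List[str]:
    """SQL to run in PostgreSQL after the load.

    The WKB bytes arrive as BYTEA, so the column is converted to a PostGIS
    geometry and then indexed. Assumes `table` is an unqualified table name.
    """
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} "
        f"TYPE geometry USING ST_GeomFromWKB({column}, {srid})",
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_gix "
        f"ON {table} USING GIST ({column})",
    ]


def write_to_postgis(df, options: Dict[str, str], mode: str = "append") -> None:
    """Write a Spark DataFrame to PostgreSQL through JDBC."""
    df.write.format("jdbc").options(**options).mode(mode).save()
```

The fix-up statements would be executed with any PostgreSQL client once the write finishes; after that, the table behaves like any other PostGIS table.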
PostGIS is not only a storage layer but also a platform for ongoing data edits and updates, ensuring data integrity and relevance. When complex, resource-intensive spatial analyses are required, Wherobots is re-engaged. This dual engagement allows SGT holdings to handle routine data management in PostGIS while delegating computationally demanding tasks back to Wherobots, thus utilizing each system’s strengths to full effect.
For visualization, SGT holdings again leverages Wherobots to generate vector tiles from the analyzed data. These tiles are crucial for dynamic, scalable visual representations in the company’s internal dashboards and tools. Wherobots’ ability to generate these tiles efficiently complements PostGIS’s role by providing a means to visualize the data stored and managed within PostGIS. This not only enhances user interactions with the data but also provides a seamless bridge from data analysis to actionable insights through visual exploration.
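For readers unfamiliar with how map tiles are addressed, here is a small sketch of the standard Web Mercator XYZ tiling scheme that most tile-based maps use. It only shows how a longitude/latitude pair maps to a tile column and row at a given zoom level; it is not Wherobots’ tile generation API, and the function name is hypothetical.

```python
import math

# Web Mercator cannot represent the poles; latitudes are clamped to this limit.
MAX_LAT = 85.05112878


def lonlat_to_tile(lon: float, lat: float, zoom: int):
    """Return the (x, y) tile indices containing a point at the given zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or lat = -MAX_LAT still falls in the last tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

Each zoom level quadruples the number of tiles, which is why pre-generating tiles for large datasets benefits from distributed compute.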
Using Wherobots and PostGIS in this complementary manner, SGT has established a highly efficient workflow that leverages the distinct capabilities of each system. They can now ingest all their data, manage it effectively, and run their post hoc analysis tasks to serve internal and external clients efficiently and cost-effectively.
When comparing WherobotsDB to PostgreSQL with PostGIS for spatial queries, WherobotsDB’s performance benefits become evident as data size increases. Initially, PostGIS’ precomputed GiST indexes give it an edge on smaller datasets due to faster query execution times. However, as datasets grow larger, the dynamic, on-the-fly indexing of WherobotsDB surpasses PostGIS. The overhead associated with WherobotsDB’s distributed system is outweighed by its ability to efficiently handle complex, large-scale queries, ultimately making it significantly faster and more scalable in high-demand scenarios.
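To make the role of a spatial index concrete, the toy sketch below counts point-in-box matches twice: once by comparing every pair, and once by first bucketing points into a uniform grid so each box only inspects nearby cells. This is a deliberately simplified stand-in: a PostGIS GiST index is a tree of bounding boxes rather than a grid, and nothing here reflects how WherobotsDB builds or partitions its indexes. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Point = Tuple[float, float]
Grid = Dict[Tuple[int, int], List[Point]]


def contains(box: Box, pt: Point) -> bool:
    """True if the point lies inside the box or on its boundary."""
    min_x, min_y, max_x, max_y = box
    return min_x <= pt[0] <= max_x and min_y <= pt[1] <= max_y


def join_nested_loop(boxes: List[Box], points: List[Point]) -> int:
    """Count matches by testing every box against every point."""
    return sum(1 for b in boxes for p in points if contains(b, p))


def build_grid(points: List[Point], cell: float) -> Grid:
    """Bucket each point into exactly one square cell of side `cell`."""
    grid: Grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // cell), int(y // cell))].append((x, y))
    return grid


def join_with_grid(boxes: List[Box], grid: Grid, cell: float) -> int:
    """Count matches, testing only points in cells that overlap each box."""
    count = 0
    for box in boxes:
        min_x, min_y, max_x, max_y = box
        for cx in range(int(min_x // cell), int(max_x // cell) + 1):
            for cy in range(int(min_y // cell), int(max_y // cell) + 1):
                for pt in grid.get((cx, cy), ()):
                    if contains(box, pt):
                        count += 1
    return count
```

Both functions return the same count; the difference is that building the grid costs time up front, which pays off only when there are enough geometries for the pruning to matter. A similar trade-off sits behind the crossover described above.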
In essence, WherobotsDB may start off slower with smaller datasets, but its performance dramatically improves with larger datasets, far exceeding PostGIS’ capabilities. This makes WherobotsDB the preferred choice when working with extensive spatial data that demands efficient, scalable processing.
See the tables below for a detailed comparison of performance metrics.
WITH t AS (
    SELECT *
    FROM overture_buildings_test AS buildings
    JOIN osm_nodes_test AS nodes
      ON ST_Intersects(nodes.geometry, buildings.geometry)
)
SELECT COUNT(*) FROM t;
Individually, Wherobots and PostgreSQL + PostGIS are powerful tools. But when combined, they unlock new possibilities for spatial analysis, offering a balanced approach to handling both large-scale data processing and detailed, precise database management. By understanding and implementing their complementary capabilities, organizations can achieve more refined insights and greater operational efficiency in their spatial data projects.
By utilizing both tools strategically, companies can not only enhance their data analysis capabilities but also ensure that they are prepared for a variety of spatial data challenges, now and in the future.
To learn more about Wherobots, reach out for a technical session or get started with a free account.