Introduction
Spatial data has never been more abundant and valued in decision making (everything happens somewhere!). The tools we use to process location based data can significantly impact the outcomes of our projects. In the geospatial ecosystem, Apache Sedona-powered Wherobots and PostgreSQL with the PostGIS extension each offer robust capabilities. They share some functionalities, but they are more powerful when used together, rather than in isolation. This post explores how combining Wherobots and PostgreSQL + PostGIS can enhance spatial data processing, offering a synergy that leverages the unique strengths of each tool to enhance data analysis and decision-making processes.
Exploring PostgreSQL + PostGIS
PostgreSQL is a powerful, open-source object-relational database system known for its robustness and reliability. When combined with the PostGIS extension, PostgreSQL transforms into a spatial database capable of executing location-based queries and spatial functions. This combination is ideally suited for projects where data integrity and detailed transaction management (ACID transactions) are crucial.
Key Features:
- Data Integrity: Ensures high levels of data accuracy and consistency with ACID transaction compliance.
- Complex Queries: Supports a wide array of SQL and spatial queries for detailed data management.
- Integration: Seamlessly integrates with existing IT infrastructures and data systems.
Architecture:
PostgreSQL’s architecture, a traditional RDBMS, uses a process-based database server architecture. It is designed for detailed, transactional data management, where data integrity and complex query capabilities are paramount.
Characteristics:
- Transactional Integrity: PostgreSQL excels in maintaining data consistency and integrity through ACID (Atomicity, Consistency, Isolation, Durability) compliance, making it ideal for applications requiring reliable data storage.
- Structured Data Management: Data must be written into PostgreSQL tables, allowing for sophisticated querying and indexing capabilities provided by PostGIS. This ensures precise spatial data management.
- Integration with Legacy Systems: PostgreSQL + PostGIS can easily integrate with existing IT infrastructure, providing a stable environment for long-term data storage and complex spatial queries.
Understanding Wherobots & Apache Sedona
Wherobots, founded by the original creators of Apache Sedona, represents a significant advancement in modern cloud native spatial data processing. Apache Sedona is an open-source cluster computing system designed for the processing of large-scale spatial data, and Wherobots (continuously optimizing Sedona for performance) employs this technology with serverless deployment principles to analyze data distributed across clusters. This approach allows Wherobots to perform complex spatial queries and analytics efficiently, handling massive datasets that are beyond the scope of traditional geospatial data processing tools.
Key Features:
- Scalability: Easily scales horizontally to handle petabytes of data across through a cluster computing architecture.
- Speed: Utilizes memory-centric architecture to speed up data processing tasks.
- Complex Algorithms: Supports numerous spatial operations and algorithms essential for advanced geographic data analysis.
- Spatial SQL API: Over 300 (and growing) spatial type SQL functions covering both raster and vector analysis.
- Pythonic Dataframe API: Extends the Spark Dataframe API for spatial data processing.
- Python DB API: Python module conforming to the Python DB API 2.0 specification with familiar access patterns.
Architecture:
The backbone of Wherobots, Apache Sedona, is designed with a modern cloud native and distributed computing framework in mind. Wherobots orchestrates and manages the cluster distributed compute nodes to enable large-scale spatial data processing by bringing compute power directly to the data. This architecture is purpose built for handling big data analytics and planetary-scale spatial data processing.
Characteristics:
- Distributed Processing: Wherobots and Apache Sedona utilize a cluster of nodes, enabling parallel processing of massive datasets. This makes it ideal for tasks requiring high scalability and speed.
- In-Memory Computation: By leveraging in-memory processing, Sedona significantly reduces the time needed for complex spatial operations, allowing for rapid data exploration and transformation.
- Cloud Integration: Wherobots architecture is naturally aligned with cloud-based solutions, enabling seamless integration with cloud services like Amazon Web Services (AWS). This facilitates scalable and flexible spatial data analytics.
Complementarity in Action
Sedona in Wherobots and PostgreSQL + PostGIS can both perform spatial queries, but their real power lies in how they complement each other. For example:
- Data Handling and Performance: Apache Sedona in Wherobots Cloud is second to none at handling and analyzing large volumes of data, making them ideal for initial data processing, exploration, transformation, and analysis. In contrast, PostgreSQL + PostGIS excels in managing more detailed transactions and operations requiring high precision and integrity.
- Use Wherobots for: Scalable Data Operations, Wherobots efficiently scales processing across multiple nodes, ideal for rapid analysis of spatial data at local to global levels.
- Use PostgreSQL & PostGIS for: PERSISTENCE, PostgreSQL + PostGIS provides the tools necessary for precise data management crucial for applications like urban planning and asset management.
- Use Case Flexibility: Wherobots can quickly process vast quantities of spatial data, making them suitable for environments and applications that require immediate insights on large volumes of data. PostgreSQL + PostGIS, being more transaction oriented, is perfect for applications where long-term data storage and management are needed.
- Use Wherobots for: Processing Speed, when your scenario requires immediate response.
- Use PostgreSQL & PostGIS for: Data Management, in scenarios requiring meticulous data integrity, such as environmental monitoring over years or detailed urban planning projects.
- Analytical Depth: When initial processing (extractions, transformations, enrichment, etc.) on large scale data is done with Wherobots, ETL run times can be greatly reduced. Once processed, the data can be permanently stored and hosted in PostgreSQL + PostGIS. This allows users to perform deep, granular analyses at scale and then persist and serve those insights.
- Use Wherobots for: Transforming and Enriching Data: Extract, transform, enrich, refine, and analyze data in Wherobots Cloud at scale and with unparalleled speed.
- Use PostgreSQL & PostGIS for: Robust Data Ecosystem. Combining these tools ensures a robust data and analysis and management ecosystem capable of supporting complex, multi-dimensional spatial queries.
Integration Illustration
Let’s illustrate how these two systems can be integrated in a complimentary manner.
We’ll assume the persona of a fictitious company named “Synergistic GeoFusion Technologies (SGT) holdings”.
SGT holdings handles a vast array of spatial data from diverse sources such as sensors, satellites, and user inputs. The volume and velocity of the data collection requires a sophisticated approach to maximize efficiency. Wherobots steps in as the initial processing powerhouse, applying its Apache Sedona-based cluster computing to perform heavy-duty ETL (Extract, Transform, Load) tasks. This process involves cleansing, integrating, and transforming the raw data into a more structured format. Wherobots’ capability to handle massive datasets efficiently complements PostGIS’s robust data storage and querying capabilities by preparing the data for detailed spatial analysis and storage.
Once Wherobots processes the data, it can be seamlessly transferred to PostGIS, which cab serve as the system of record. This PostgreSQL extension is well-suited for ongoing data management, thanks to its sophisticated spatial functions and ability to handle complex queries. PostGIS’s strong querying capabilities complement Wherobots’ data processing strength, providing a stable platform for further data manipulation and refinement.
PostGIS is not only a storage layer but also a platform for ongoing data edits and updates, ensuring data integrity and relevance. When complex, resource-intensive spatial analyses are required, Wherobots is re-engaged. This dual engagement allows SGT holdings to handle routine data management in PostGIS while delegating computationally demanding tasks back to Wherobots, thus utilizing each system’s strengths to full effect.
For visualization, SGT holdings again leverages Wherobots to generate vector tiles from the analyzed data. These tiles are crucial for dynamic, scalable visual representations in the company’s internal dashboards and tools. Wherobots’ ability to generate these tiles efficiently complements PostGIS’s role by providing a means to visualize the data stored and managed within PostGIS. This not only enhances user interactions with the data but also provides a seamless bridge from data analysis to actionable insights through visual exploration.
Using Wherobots and PostGIS in this complimentary manner, SGT has established a highly efficient workflow that leverages the distinct capabilities of each system. They now have the capability to ingest all their data, manage it effectively, and run their post hoc analysis tasks to server internal and external clients efficiently and in a cost effective manner.
Performance Benchmark: Wherobots vs. PostGIS
When comparing WherobotsDB to PostgreSQL with PostGIS for spatial queries, WherobotsDB’s performance benefits become evident as data size increases. Initially, PostGIS’ precomputed GIST indexes give it an edge on smaller datasets due to faster query execution times. However, as datasets grow larger, the dynamic, on-the-fly indexing of WherobotsDB surpasses PostGIS. The overhead associated with WherobotsDB’s distributed system is outweighed by its ability to efficiently handle complex, large-scale queries, ultimately making it significantly faster and more scalable in high-demand scenarios.
In essence, WherobotsDB may start off slower with smaller datasets, but its performance dramatically improves with larger datasets, far exceeding PostGIS’ capabilities. This makes WherobotsDB the preferred choice when working with extensive spatial data that demands efficient, scalable processing.
See the tables below for a detailed comparison of performance metrics.
Datasets | Size | Number of Rows | Types of Geometries |
---|---|---|---|
OSM Nodes | 256GB | 7,456,990,919 | Point |
Overture Buildings | 103GB | 706,967,095 | Polygon/ Multipolygon |
OSM Postal Codes | 0.86GB | 154,452 | Polygon |
WITH t AS
(
SELECT
*
FROM
overture_buildings_test AS buildings
JOIN
osm_nodes_test AS nodes
ON ST_Intersects(
nodes.geometry,
buildings.geometry)
)
SELECT
COUNT(*)
FROM
t;
Dataset sizes | PostGIS | WherobotsDB (Cairo) | Relative Speed |
---|---|---|---|
1M Buildings x 1M Nodes | 0.70s | 2.24s | 0.31 x |
1M Buildings x 10M Nodes | 32.30s | 33.44s | 0.97 x |
10M Buildings x 1M Nodes | 28.30s | 21.72s | 1.30 x |
10M Buildings x 10M Nodes | 49.30s | 3m 36.29s | 0.23 x |
All Buildings x 1M Nodes | 8m 44s | 36.59s | 14.32 x |
All Buildings x 10M Nodes | 38m 30s | 49.77s | 46.42 x |
10K Buildings x All Nodes | 2h 29m 19s | 46.42s | 193.00 x |
100K Buildings x All Nodes | 4h 22m 36s | 51.75s | 304.57 x |
1M Buildings x All Nodes | 3h 29m 56s | 1m 19.63s | 158.17 x |
10M Buildings x All Nodes | 8h 48m 13s | 2m 13.65s | 237.17 x |
All Buildings x All Nodes | +24h (Aborted) | 4m 31.84s | + 317.83 x |
Conclusion: Better Together for Spatial Brilliance
Individually, Wherobots and PostgreSQL + PostGIS are powerful tools. But when combined, they unlock new possibilities for spatial analysis, offering a balanced approach to handling both large-scale data processing and detailed, precise database management. By understanding and implementing their complementary capabilities, organizations can achieve more refined insights and greater operational efficiency in their spatial data projects.
By utilizing both tools strategically, companies can not only enhance their data analysis capabilities but also ensure that they are prepared for a variety of spatial data challenges, now and in the future.
To learn more about Wherobots, reach out or give it a try with a trial account