Wherobots Cloud: The Cloud-native Spatial Analytics Data Platform
Contributors
-
Jia Yu
Jia Yu is a co-founder and the Chief Architect of Wherobots Inc. He is the PMC Chair of Apache Sedona
-
Mo Sarwat
-
Maxime Petazzoni
Head of Engineering @ Wherobots. Engineering leader building great teams and products at Wherobots. Previously leading observability product and platform teams at Splunk/SignalFx.
According to Gartner, 97% of data collected at the enterprise sits on the shelves without being put into use. That is a shockingly big number, especially given that the data industry got its hopes up a few years back when the Economist published its article “The most valuable resource is no longer oil, it’s data”. It is also quite surprising given the hundreds of billions of dollars invested in database and analytics platforms over the past two decades.
One main reason is that data professionals often struggle to connect data to use cases. A natural way for them to do so is to link their data and insights to data about the physical world, a.k.a. "Spatial Data", and then ask physical-world questions of that data, a.k.a. "Spatial Analytics". This spatial approach can be an indispensable asset for businesses worldwide. Use cases range from determining optimal delivery routes, to making informed decisions about property investments, to climate and agricultural technology. For instance, commercial real estate data makes more sense when connected to spatial data about nearby objects (e.g., buildings, POIs), man-made events (e.g., crimes, traffic), and natural events such as wildfires and floods. The importance of understanding the ‘where’ cannot be overstated, especially when it influences operational efficiency, customer satisfaction, and strategic planning.
The significance of spatial analytics underscores the pressing need for its efficient management within the enterprise data stack. Incumbent data platforms, often not built to handle the intricacies and scale of spatial analytics, fall short in meeting these demands. Recognizing this gap, we introduce Wherobots Cloud, a novel spatial analytics database platform. Here is a summary of features supported:
Wherobots Key Features
Linking Enterprise Data to the Spatial world
Wherobots seamlessly incorporates spatial analytics in the enterprise data stack to bring data many steps closer to use cases. Using a scalable spatial join technology, Wherobots can link customer data stored anywhere to tens of terabytes of spatial data such as maps, roads, buildings, natural events, and man-made events in a few minutes. Users can then apply spatial data processing, analytics, and AI tasks using SQL and Python on their data with unparalleled efficiency and adaptability.
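To make the idea of a spatial join concrete, here is a minimal pure-Python sketch of what such a join computes: each customer location is paired with every zone polygon that contains it. It is a naive nested-loop illustration, not how Wherobots executes joins; the names `point_in_polygon`, `spatial_join`, and the sample zones and customers are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # ring of (x, y) vertices; the last edge closes back to the first vertex


def point_in_polygon(pt: Point, ring: Polygon) -> bool:
    """Ray-casting test: count the edges crossed by a horizontal ray starting at pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(
    customers: Dict[str, Point], zones: Dict[str, Polygon]
) -> List[Tuple[str, str]]:
    """Naive nested-loop join: pair each customer with every zone that contains it."""
    return [
        (cust_id, zone_id)
        for cust_id, pt in customers.items()
        for zone_id, ring in zones.items()
        if point_in_polygon(pt, ring)
    ]


# Hypothetical sample data: two overlapping square zones and three customers
zones = {
    "flood_zone": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "fire_zone": [(3, 3), (8, 3), (8, 8), (3, 8)],
}
customers = {"c1": (1, 1), "c2": (3.5, 3.5), "c3": (9, 9)}

pairs = spatial_join(customers, zones)
print(pairs)
```

The nested loop costs one containment test per customer and zone pair, which is why scalable engines rely on spatial partitioning and indexing instead of this brute-force form.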
To get started with Wherobots, please visit the Wherobots website.
Scalability
With its scalable, distributed architecture, Wherobots is redefining the way businesses handle geometry and raster spatial data processing and analytics in the cloud. Wherobots achieves that in two main ways:
- Separating Compute/Storage: Wherobots uniquely separates the spatial processing and analytics layer from the data storage layer. This approach allows for optimal performance and scalability.
- Distributed System Architecture: By employing a distributed system architecture, Wherobots ensures scalable out-of-core spatial computation, catering to massive datasets without compromising speed or accuracy.
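One common ingredient of distributed spatial processing is spatial partitioning: nearby objects are grouped so that each worker can process its own region independently. The article does not describe the partitioner Wherobots uses, so the following is only a generic uniform-grid sketch; `grid_cell`, `partition_points`, and `cell_size` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def grid_cell(pt: Point, cell_size: float) -> Cell:
    """Map a point to the (column, row) of the uniform grid cell that contains it."""
    return (int(pt[0] // cell_size), int(pt[1] // cell_size))


def partition_points(points: List[Point], cell_size: float) -> Dict[Cell, List[Point]]:
    """Group points by grid cell; each group could be handed to a separate worker."""
    partitions: Dict[Cell, List[Point]] = defaultdict(list)
    for pt in points:
        partitions[grid_cell(pt, cell_size)].append(pt)
    return dict(partitions)


# Hypothetical sample points
points = [(0.5, 0.5), (1.5, 0.2), (0.9, 0.1), (2.5, 2.5)]
partitions = partition_points(points, cell_size=1.0)
print(partitions)
```

A uniform grid is the simplest choice; skewed data usually calls for adaptive schemes such as quadtrees or KD-trees so that partitions hold similar amounts of work.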
Openness
Wherobots builds upon and amplifies the capabilities seen in the open-source Apache Sedona (OSS Sedona). While OSS Sedona provides foundational spatial analytics functions using spatial SQL and Python, Wherobots takes it to the next level with its faster query processing, lakehouse architecture, and its self-service yet fully-managed provisioning on Wherobots Cloud. This makes Wherobots a more comprehensive and streamlined solution for businesses. Based on our benchmarks, Wherobots is up to 10x faster than OSS Sedona for geometry data processing, and up to 20x faster than OSS Sedona for raster data processing.
Spatial Lakehouse Solution
One of the standout features of Wherobots is its support for an Apache Iceberg-compatible spatial table format, dubbed "Havasu." This feature facilitates efficient querying and updating of geometry and raster columns on Parquet files in cloud object stores such as AWS S3. It enables spatial analytics on the sheer volume of data dumped on cloud object stores that, until today, is seldom put to use. Details about the Havasu spatial data lake format are available here.
Self-service
Wherobots is provisioned as a fully-managed service within the Wherobots Cloud, ensuring that users don’t have to delve into the intricacies of managing cloud or compute resources.
By delegating resource management to Wherobots, businesses can concentrate on their core spatial analytics tasks and achieve their objectives faster, more efficiently, and more cost-effectively.
To use Wherobots, you first need to create an account on Wherobots Cloud. To get started, please visit the Wherobots website.
Connectivity
Wherobots comes equipped with connectors for major data storage platforms and databases. These include cloud object stores, data warehouses like Snowflake and Redshift, lakehouses such as Databricks, and OLTP databases including Postgres / PostGIS.
Wherobots example usage
Using Wherobots, users can perform a plethora of spatial queries and analytics operations on their data. Here are some common operations users can invoke in Wherobots. For more details on these examples, please refer to the Wherobots documentation.
Insert geometry data
sedona.sql("""
INSERT INTO wherobots.test_db.test_table
VALUES (1, 'a', ST_GeomFromText('POINT (1 2)')), (2, 'b', ST_Point(2, 3))
""")
Insert external raster data
(
    sedona
    .sql("SELECT RS_FromPath('s3a://XXX.tif') as rast")
    .writeTo("wherobots.test_db.test_table")
    .append()
)
Create a spatial index
sedona.sql("CREATE SPATIAL INDEX FOR wherobots.db.test_table USING hilbert(geom, 10)")
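The statement above builds an index based on a Hilbert curve. As background, a Hilbert curve maps two-dimensional grid cells to a one-dimensional order in which consecutive positions are always neighboring cells, so sorting rows by that order keeps spatially close records close together in storage. The sketch below is the standard textbook conversion, not Wherobots' implementation; `hilbert_index` and its `order` parameter are names chosen for this example and are not meant to mirror the arguments of `hilbert(geom, 10)`.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Return the position of grid cell (x, y) along a Hilbert curve.

    The grid has 2**order cells per side; x and y must lie in [0, 2**order).
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant at this level contributes s*s cells to the running position
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the expected orientation
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Order all cells of a 4x4 grid along the curve
cells = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(cells, key=lambda c: hilbert_index(2, c[0], c[1]))
print(ordered)
```

In practice, real coordinates would first be quantized to integer grid cells before computing the curve position.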
Read data from PostGIS
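from pyspark.sql import functions as f
# Note: Spark's JDBC source also needs connection options such as "url"; they are omitted here
(
    sedona.read
    .format("jdbc")
    .option("query", "SELECT id, ST_AsBinary(geom) as geom FROM my_table")
    .load()
    .withColumn("geom", f.expr("ST_GeomFromWKB(geom)"))
)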
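The PostGIS example above moves geometries across the JDBC boundary as Well-Known Binary (WKB): `ST_AsBinary` serializes on the PostGIS side and `ST_GeomFromWKB` parses on the Sedona side. To show what travels over the wire, here is a small sketch that encodes and decodes a 2D point in WKB using only the standard library. The helper names `point_to_wkb` and `point_from_wkb` are hypothetical, and the sketch handles plain 2D points only.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB.

    Layout: 1 byte order flag (1 = little-endian), uint32 geometry type
    (1 = Point), then x and y as 64-bit floats.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def point_from_wkb(wkb: bytes) -> tuple:
    """Decode a 2D WKB point, honoring the byte order flag in the first byte."""
    endian = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    return (x, y)


wkb = point_to_wkb(1.0, 2.0)
print(wkb.hex())
print(point_from_wkb(wkb))
```

Other geometry types (lines, polygons, multi-geometries) use the same header followed by counts and coordinate sequences, which is why a binary column is enough to carry any geometry between the two systems.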
Read data from CSV on AWS S3
(
    sedona.read
    .format("csv")
    .load("s3a://data.csv")
)
Read data from a Havasu table
# DataFrame API
sedona.table("wherobots.test_db.test_table").filter("ST_Contains(city, location) = true")
# Equivalent SQL
sedona.sql("SELECT * FROM wherobots.test_db.test_table WHERE ST_Contains(city, location) = true")
Spatial join query to find zones prone to wild fires
fire_zone = sedona.sql(
    """
    SELECT
        z.geometry as zipgeom,
        z.ZCTA5CE10 as zipcode,
        f.FIRE_NAME
    FROM
        wherobots_open_data.us_census.zipcode z,
        wherobots_open_data.weather.wild_fires f
    WHERE
        ST_Intersects(z.geometry, f.geometry)
    """
)
Visualize geometry data in a notebook
SedonaKepler.create_map(geometryDf)
Visualize raster data in a notebook
SedonaUtils.display_image(rasterDF.selectExpr("RS_AsImage(raster)"))
Performance Benchmark
To better showcase Wherobots’s performance, we conducted a comprehensive performance benchmark on some commonly seen spatial data processing tasks. Please download the full report of our Wherobots Performance Benchmark.