SedonaDB: The Cloud-native Spatial Analytics Database Platform
According to Gartner, 97% of data collected at the enterprise sits on the shelf without being put to use. That is a shockingly big number, especially given that the data industry got its hopes up a few years back when The Economist published its article “The most valuable resource is no longer oil, it’s data”. It is also quite surprising given the hundreds of billions of dollars invested in database and analytics platforms over the past two decades.
One main reason is that data professionals often struggle to connect data to use cases. A natural way to make that connection is to link data and insights to data about the physical world, a.k.a. “Spatial Data”, and then ask physical-world questions of that data, a.k.a. “Spatial Analytics”. This spatial approach can be an indispensable asset for businesses worldwide. Use cases range from determining optimal delivery routes to making informed decisions about property investments, to climate and agricultural technology. For instance, commercial real estate data makes more sense when connected to spatial data about nearby objects (e.g., buildings, POIs), man-made events (e.g., crimes, traffic), and natural events such as wildfires and floods. The importance of understanding the ‘where’ cannot be overstated, especially when it influences operational efficiency, customer satisfaction, and strategic planning.
The significance of spatial analytics underscores the pressing need for its efficient management within the enterprise data stack. Incumbent data platforms, often not built to handle the intricacies and scale of spatial analytics, fall short in meeting these demands. Recognizing this gap, we introduce SedonaDB, a novel spatial analytics database platform. Here is a summary of features supported by SedonaDB:
SedonaDB Key Features
Linking Enterprise Data to the Spatial world
SedonaDB seamlessly incorporates spatial analytics in the enterprise data stack to bring data many steps closer to use cases. Using a scalable spatial join technology, SedonaDB can link customer data stored anywhere to tens of terabytes of spatial data such as maps, roads, buildings, natural events, and man-made events in a few minutes. Users can then apply spatial data processing, analytics, and AI tasks using SQL and Python on their data with unparalleled efficiency and adaptability.
To get started with SedonaDB, please visit the Wherobots website.
Scalability
With its scalable, distributed architecture, SedonaDB is redefining the way businesses handle geometry and raster spatial data processing and analytics in the cloud. SedonaDB achieves that in two main ways:
- Separating Compute/Storage: SedonaDB uniquely separates the spatial processing and analytics layer from the data storage layer. This approach allows for optimal performance and scalability.
- Distributed System Architecture: By employing a distributed system architecture, SedonaDB ensures scalable out-of-core spatial computation, catering to massive datasets without compromising speed or accuracy.
Openness
SedonaDB builds upon and amplifies the capabilities seen in the open-source Apache Sedona (OSS Sedona). While OSS Sedona provides foundational spatial analytics functions using spatial SQL and Python, SedonaDB takes it to the next level with its faster query processing, lakehouse architecture, and its self-service yet fully-managed provisioning on Wherobots Cloud. This makes SedonaDB a more comprehensive and streamlined solution for businesses. Based on our benchmarks, SedonaDB is up to 10x faster than OSS Sedona for geometry data processing, and up to 20x faster than OSS Sedona for raster data processing.
Spatial Lakehouse Solution
One of the standout features of SedonaDB is its support for an Apache Iceberg-compatible spatial table format, dubbed "Havasu." This format supports efficient querying and updating of geometry and raster columns stored in Parquet files in cloud object stores such as AWS S3. It enables spatial analytics on the large volume of data that has been dumped into cloud object stores and, to this day, is seldom put to use. Details about the Havasu spatial data lake format are available here.
Self-service
SedonaDB is provisioned as a fully-managed service within the Wherobots Cloud, ensuring that users don’t have to delve into the intricacies of managing cloud or compute resources.
By delegating resource management to Wherobots, businesses can concentrate on their core spatial analytics tasks and achieve their objectives faster, more efficiently, and more cost-effectively.
To use SedonaDB, you first need to create an account on Wherobots Cloud. To get started, please visit the Wherobots website.
Connectivity
SedonaDB comes equipped with connectors for major data storage platforms and databases. These include cloud object stores, data warehouses like Snowflake and Redshift, lakehouses such as Databricks, and OLTP databases including Postgres / PostGIS.
SedonaDB example usage
Using SedonaDB, users can perform a plethora of spatial queries and analytics operations on their data. Here are some common operations users can invoke in SedonaDB. For more details on these examples, please refer to the Wherobots documentation.
Insert geometry data
sedona.sql("""
INSERT INTO wherobots.test_db.test_table
VALUES (1, 'a', ST_GeomFromText('POINT (1 2)')), (2, 'b', ST_Point(2, 3))
""")
Insert external raster data
# Parentheses let the method chain span several lines in Python.
(
    sedona
    .sql("SELECT RS_FromPath('s3a://XXX.tif') as rast")
    .writeTo("wherobots.test_db.test_table")
    .append()
)
Create a spatial index
sedona.sql("CREATE SPATIAL INDEX FOR wherobots.db.test_table USING hilbert(geom, 10)")
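To give a feel for what a Hilbert-curve ordering does, here is a minimal pure-Python sketch of the textbook Hilbert index computation. It is not SedonaDB's implementation. The function name `hilbert_index` and its `order` parameter are hypothetical, and the sketch makes no claim about what the second argument of `hilbert(geom, 10)` means. The idea is that cells close to each other on the grid tend to receive nearby index values, so sorting rows by that value keeps spatial neighbors close together in storage.

```python
def hilbert_index(order, x, y):
    """Position of integer cell (x, y) along a Hilbert curve that fills
    a 2**order by 2**order grid (textbook algorithm, illustrative only)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Hypothetical grid cells, sorted by their position along the curve.
points = [(3, 0), (0, 0), (2, 2), (0, 1), (3, 3), (1, 1)]
ordered = sorted(points, key=lambda p: hilbert_index(2, *p))
print(ordered)
```

In a real engine the geometries would first be mapped to grid cells (for example via their centroids); that step is left out here.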
Read data from PostGIS
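from pyspark.sql import functions as f

(
    sedona.read
    .format("jdbc")
    # Spark's JDBC source also needs a "url" option (and usually credentials); omitted here.
    .option("query", "SELECT id, ST_AsBinary(geom) as geom FROM my_table")
    .load()
    .withColumn("geom", f.expr("ST_GeomFromWKB(geom)"))  # WKB bytes -> geometry
)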
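The PostGIS query above ships geometries as Well-Known Binary (WKB), which is then parsed back into a geometry on the SedonaDB side. As a rough illustration of what travels over the wire, the following sketch encodes and decodes a 2D point in standard WKB using only Python's `struct` module. The helper names `point_to_wkb` and `wkb_to_point` are hypothetical and handle points only; real parsers cover every geometry type.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB: byte-order flag, type code, x, y."""
    # 1 = little-endian byte order, 1 = geometry type "Point".
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb):
    """Decode a WKB 2D point back into an (x, y) tuple."""
    endian = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles WKB points")
    return struct.unpack_from(endian + "dd", wkb, 5)


wkb = point_to_wkb(1.0, 2.0)
print(wkb.hex())
print(wkb_to_point(wkb))
```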
Read data from CSV on AWS S3
(
    sedona.read
    .format("csv")
    .load("s3a://data.csv")
)
Read data from a Havasu table
sedona.table("wherobots.test_db.test_table").filter("ST_Contains(city, location) = true")
sedona.sql("SELECT * FROM wherobots.test_db.test_table WHERE ST_Contains(city, location) = true")
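For readers unfamiliar with containment predicates, the sketch below shows the classic ray-casting test for whether a point lies inside a simple polygon, which is the kind of question the `ST_Contains(city, location)` filter asks of every row. It is a simplified stand-in, not how SedonaDB evaluates the predicate: the `contains` helper and the sample rows are hypothetical, and points exactly on the boundary are not treated with the care a real geometry library applies.

```python
def contains(polygon, point):
    """Ray casting: a point is inside if a horizontal ray starting at it
    crosses the polygon's edges an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rows: each has a city polygon and a location point.
city = [(0, 0), (4, 0), (4, 4), (0, 4)]
rows = [
    {"id": 1, "city": city, "location": (2, 2)},
    {"id": 2, "city": city, "location": (5, 2)},
]
matches = [r["id"] for r in rows if contains(r["city"], r["location"])]
print(matches)
```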
Spatial join query to find zones prone to wild fires
fire_zone = sedona.sql(
"""
SELECT
z.geometry as zipgeom,
z.ZCTA5CE10 as zipcode,
f.FIRE_NAME
FROM
wherobots_open_data.us_census.zipcode z,
wherobots_open_data.weather.wild_fires f
WHERE
ST_Intersects(z.geometry, f.geometry)
"""
)
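Conceptually, the query above pairs every zip code with every wildfire whose geometry intersects it. The sketch below shows that join logic in plain Python, under strong simplifying assumptions: geometries are reduced to axis-aligned bounding boxes, the data is made up, and a nested loop is used. A distributed engine would instead partition the data spatially and test exact geometries; the `Box` type and `intersects` helper here are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    """True if two axis-aligned boxes overlap or touch."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


# Made-up zip code areas and wildfire footprints.
zones = [Box("zip_A", 0, 0, 2, 2), Box("zip_B", 3, 0, 5, 2), Box("zip_C", 0, 3, 2, 5)]
fires = [Box("fire_1", 1, 1, 4, 1.5), Box("fire_2", 10, 10, 11, 11)]

# Naive nested-loop spatial join: keep every (zone, fire) pair that intersects.
fire_zone = [(z.name, f.name) for z in zones for f in fires if intersects(z, f)]
print(fire_zone)
```

The nested loop compares every pair, so its cost grows with the product of the two table sizes; avoiding that blow-up is the job of a scalable spatial join.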
Visualize geometry data in a notebook
SedonaKepler.create_map(geometryDf)
Visualize raster data in a notebook
SedonaUtils.display_image(rasterDF.selectExpr("RS_AsImage(raster)"))
Performance Benchmark
To better showcase SedonaDB’s performance, we conducted a comprehensive performance benchmark on some commonly seen spatial data processing tasks. Please download the full report of our SedonaDB Performance Benchmark.