According to Gartner, 97% of data collected in the enterprise sits on the shelf without ever being put to use. That is a shockingly large number, especially given that the data industry got its hopes up a few years back when The Economist declared that "the world's most valuable resource is no longer oil, but data". It is also surprising given the hundreds of billions of dollars invested in database and analytics platforms over the past two decades.
One main reason is that data professionals often struggle to connect data to use cases. A natural way to make that connection is to link their data and insights to data about the physical world, aka "spatial data", and then ask physical-world questions of that data, aka "spatial analytics". This spatial approach can be an indispensable asset for businesses worldwide. Use cases range from determining optimal delivery routes, to making informed decisions about property investments, to climate and agricultural technology. For instance, commercial real estate data makes more sense when connected to spatial data about nearby objects (e.g., buildings, POIs), man-made events (e.g., crimes, traffic), and natural events such as wildfires and floods. The importance of understanding the 'where' cannot be overstated, especially when it influences operational efficiency, customer satisfaction, and strategic planning.
The significance of spatial analytics underscores the pressing need for its efficient management within the enterprise data stack. Incumbent data platforms, often not built to handle the intricacies and scale of spatial analytics, fall short of meeting these demands. Recognizing this gap, we introduce Wherobots Cloud, a novel spatial analytics database platform. Here is a summary of the features it supports:
Wherobots seamlessly incorporates spatial analytics in the enterprise data stack to bring data many steps closer to use cases. Using a scalable spatial join technology, Wherobots can link customer data stored anywhere to tens of terabytes of spatial data such as maps, roads, buildings, natural events, and man-made events in a few minutes. Users can then apply spatial data processing, analytics, and AI tasks using SQL and Python on their data with unparalleled efficiency and adaptability.
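To make the idea of a spatial join concrete, here is a minimal pure-Python sketch of the kind of distance-based join described above: pairing each customer record with spatial features within a given radius. All names and data here are hypothetical, and the nested loop is only for illustration; Wherobots expresses this as a single spatial SQL join and executes it in a distributed, spatially partitioned way.

```python
import math

# Hypothetical customer records and spatial features; plain x/y
# coordinates stand in for real geometries.
customers = [("c1", (0.0, 0.0)), ("c2", (5.0, 5.0))]
features = [("store_a", (0.5, 0.5)), ("store_b", (9.0, 9.0))]

def within_distance(p, q, radius):
    """Euclidean stand-in for a distance-within-radius predicate."""
    return math.dist(p, q) <= radius

def spatial_join(left, right, radius):
    """Naive O(n*m) nested-loop join; a distributed engine replaces
    this scan with spatial partitioning and indexing."""
    return [(lid, rid)
            for lid, lp in left
            for rid, rp in right
            if within_distance(lp, rp, radius)]

print(spatial_join(customers, features, 1.0))  # [('c1', 'store_a')]
```

In spatial SQL the same logic is a join with a distance predicate, e.g. `... WHERE ST_Distance(c.geom, f.geom) < radius`.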
To get started with Wherobots, please visit the Wherobots website.
With its scalable, distributed architecture, Wherobots is redefining the way businesses handle geometry and raster spatial data processing and analytics in the cloud. Wherobots achieves that in two main ways:
Wherobots builds upon and amplifies the capabilities seen in the open-source Apache Sedona (OSS Sedona). While OSS Sedona provides foundational spatial analytics functions using spatial SQL and Python, Wherobots takes it to the next level with its faster query processing, lakehouse architecture, and its self-service yet fully-managed provisioning on Wherobots Cloud. This makes Wherobots a more comprehensive and streamlined solution for businesses. Based on our benchmarks, Wherobots is up to 10x faster than OSS Sedona for geometry data processing, and up to 20x faster than OSS Sedona for raster data processing.
One of the standout features of Wherobots is its support for an Apache Iceberg-compatible spatial table format, dubbed "Havasu". This format enables efficient querying and updating of geometry and raster columns stored in Parquet files on cloud object stores such as AWS S3, bringing spatial analytics to the sheer volume of data dumped into object storage that, until today, has seldom been put to use. Details about the Havasu spatial data lake format are available here.
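For intuition on how a table format can carry geometry inside ordinary Parquet files: geometries in such formats are typically stored as binary columns, commonly in Well-Known Binary (WKB) form. Below is a minimal standard-library sketch of WKB encoding for a 2-D point; this is illustrative only, and Havasu's exact on-disk encoding is described in the linked documentation.

```python
import struct

def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2-D point as little-endian WKB:
    1-byte byte-order flag, 4-byte geometry type (1 = Point),
    then two 8-byte IEEE-754 doubles."""
    return struct.pack("<BIdd", 1, 1, x, y)

wkb = point_to_wkb(1.0, 2.0)
print(len(wkb))                      # 21 bytes: 1 + 4 + 8 + 8
print(struct.unpack("<BIdd", wkb))   # (1, 1, 1.0, 2.0)
```

Because the geometry is just a bytes column, it slots into Parquet's columnar layout and benefits from the same compression and predicate-pushdown machinery as any other column.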
Wherobots is provisioned as a fully-managed service within the Wherobots Cloud, ensuring that users don’t have to delve into the intricacies of managing cloud or compute resources.
By delegating resource management to Wherobots, businesses can concentrate on their core spatial analytics tasks, achieving their objectives faster, more efficiently, and more cost-effectively.
To use Wherobots, you first need to create an account on Wherobots Cloud via the Wherobots website.
Wherobots comes equipped with connectors for major data storage platforms and databases. These include cloud object stores, data warehouses like Snowflake and Redshift, lakehouses such as Databricks, and OLTP databases including Postgres/PostGIS.
Using Wherobots, users can perform a plethora of spatial queries and analytics operations on their data. Here are some common operations users can invoke in Wherobots. For more details on these examples, please refer to the Wherobots documentation.
sedona.sql("""
    INSERT INTO wherobots.test_db.test_table
    VALUES (1, 'a', ST_GeomFromText('POINT (1 2)')),
           (2, 'b', ST_Point(2, 3))
""")
sedona.sql("SELECT RS_FromPath('s3a://XXX.tif') as rast") \
    .writeTo("wherobots.test_db.test_table") \
    .append()
sedona.sql("CREATE SPATIAL INDEX FOR wherobots.db.test_table USING hilbert(geom, 10)")
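The hilbert(geom, 10) clause above orders rows along a space-filling curve, so rows that are near each other in space also land near each other on disk. As background, here is a minimal pure-Python sketch of the classic iterative algorithm that maps 2-D grid coordinates to a 1-D Hilbert sort key; Wherobots' internal implementation may differ.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Map (x, y) on a 2^order x 2^order grid to its position along
    the Hilbert curve (classic iterative xy-to-d algorithm)."""
    d = 0
    s = 1 << (order - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so curve segments connect
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s >>= 1
    return d

# On a 2x2 grid the curve visits (0,0), (0,1), (1,1), (1,0) in order.
print([hilbert_index(1, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])
# [0, 1, 2, 3]
```

Sorting rows by this key before writing them out is what lets a spatial index prune whole files when a query touches only one region.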
from pyspark.sql import functions as f

sedona.read \
    .format("jdbc") \
    .option("query", "SELECT id, ST_AsBinary(geom) as geom FROM my_table") \
    .load() \
    .withColumn("geom", f.expr("ST_GeomFromWKB(geom)"))
sedona.read \
    .format("csv") \
    .load("s3a://data.csv")
sedona.table("wherobots.test_db.test_table") \
    .filter("ST_Contains(city, location) = true")

sedona.sql("SELECT * FROM wherobots.test_db.test_table WHERE ST_Contains(city, location) = true")
fire_zone = sedona.sql("""
    SELECT z.geometry as zipgeom,
           z.ZCTA5CE10 as zipcode,
           f.FIRE_NAME
    FROM wherobots_open_data.us_census.zipcode z,
         wherobots_open_data.weather.wild_fires f
    WHERE ST_Intersects(z.geometry, f.geometry)
""")
SedonaKepler.create_map(geometryDf)
SedonaUtils.display_image(rasterDF.selectExpr("RS_AsImage(raster)"))
To showcase Wherobots' performance, we conducted a comprehensive benchmark of commonly seen spatial data processing tasks. Please download the full report of our Wherobots Performance Benchmark.