The Spatial SQL API brings the performance of WherobotsDB to your favorite data applications
Contributors
- Maxime Petazzoni
  Head of Engineering @ Wherobots. Engineering leader building great teams and products at Wherobots. Previously leading observability product and platform teams at Splunk/SignalFx.
- Peter Foldes
  I am a software engineer with a passion for distributed systems, video streaming, and gaming, with experience in roles ranging from data infrastructure to ad and video serving systems.
- Damian Wylie
  Damian leads product for Wherobots, and is driven to create intelligence for organizations at the intersection of earth, business, and society on a planetary scale.
Since its launch last fall, Wherobots has raised the bar for cloud-native geospatial data analytics, offering the first and only platform for working with vector and raster geospatial data together at a planetary scale. Wherobots delivers a significant breadth of geospatial analytics capabilities, built around a cloud-native data lakehouse architecture and query engine that delivers up to 60x better performance than incumbent solutions. Accessible through the powerful notebook experience data scientists and data engineers know and love, Wherobots Cloud is the most comprehensive, approachable, and fully-managed serverless offering for enabling spatial intelligence at scale.
Today, we’re announcing the Wherobots Spatial SQL API, powered by Apache Sedona, to bring the performance of WherobotsDB to your favorite data applications. This opens the door to a world of direct-SQL integrations with Wherobots Cloud, bringing a serverless cloud engine that’s optimized for spatial workloads at any scale into your spatial ETL pipelines and applications, and taking your users and engineers closer to your data and spatial insights.
Register for our release webinar on July 10th here: https://bit.ly/3yFlFYk
Developers love Wherobots because compute is abstracted and managed by Wherobots Cloud. Because it can run at a planetary scale, Wherobots streamlines development and reduces time to insight. It runs on a data lake architecture, so data doesn’t need to be copied into a proprietary storage system, and integrates into familiar development tools and interfaces for exploratory analytics and orchestrating production spatial ETL pipelines.
Utilize Apache Airflow or SQL IDEs with WherobotsDB via the Spatial SQL API
Wherobots Cloud and the Wherobots Spatial SQL API are powered by WherobotsDB, with Apache Sedona at its core: a distributed computation engine that can horizontally scale to handle computation and analytics on any dataset. Wherobots Cloud automatically manages the infrastructure and compute resources of WherobotsDB to serve your use case based on how much computation power you need.
Behind the scenes, your Wherobots Cloud “runtime” defines the amount of compute resources allocated and the configuration of the software environment that executes your workload. The software environment matters in particular for AI/ML use cases, or if your ETL or analytics workflow depends on first- or third-party libraries.
Our always-free Community Edition gives access to a modest “Sedona” runtime for working with small-scale datasets. Our Professional Edition unlocks access to much larger runtimes, up to our “Tokyo” runtime capable of working on planetary-scale datasets, and GPU-accelerated options for your WherobotsAI workloads.
With the release of the Wherobots Spatial SQL API and its client SDKs, you can bring WherobotsDB, along with the ease of use and expressiveness of SQL, to your Apache Airflow spatial ETL pipelines and your applications, and soon to tools like Tableau, Superset, and other third-party systems and applications that support JDBC.
Our customers love applying the performance and scalability of WherobotsDB to their data preparation workflows and their compute-intensive data processing applications.
Use cases include:
- Preparation of nationwide and planetary-scale datasets for their users and customers
- Processing hundreds of millions of mobility data records every day
- Creating and analyzing spatial datasets in support of their real estate strategy and decision-making
Now customers have the option to integrate new tools with Wherobots for orchestration and development of spatial insights using the Spatial SQL API.
How to get started with the Spatial SQL API
Establishing a connection to the Wherobots Spatial SQL API starts a SQL session backed by your selected WherobotsDB runtime. The default is a “Sedona” runtime, but you can specify a larger runtime if you need more horsepower. Queries submitted through this connection are securely executed against your runtime, with compute fully managed by Wherobots.
We provide client SDKs in Java and in Python to easily connect and interact with WherobotsDB through the Spatial SQL API, as well as an Airflow Provider to build your spatial ETL DAGs; all of which are open-source and available on package registries, as well as on Wherobots’ GitHub page.
Using the Wherobots SQL Driver in Python
Wherobots provides an open-source Python library that exposes a DB-API 2.0 compatible interface for connecting to WherobotsDB. To build a Python application around the Wherobots DB-API driver, add the wherobots-python-dbapi library to your project’s dependencies:
$ poetry add wherobots-python-dbapi
Or directly install the package on your system with pip:
$ pip install wherobots-python-dbapi
From your Python application, establish a connection with wherobots.db.connect() and use cursors to execute your SQL queries and use their results:
import logging
import sys

from wherobots.db import connect
from wherobots.db.region import Region
from wherobots.db.runtime import Runtime

# Optionally, set up logging to get information about the driver's
# activity.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s.%(msecs)03d %(levelname)s %(name)20s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Get your API key, or securely read it from a local file.
api_key = '...'

with connect(
        host="api.cloud.wherobots.com",
        api_key=api_key,
        runtime=Runtime.SEDONA,
        region=Region.AWS_US_WEST_2) as conn:
    cur = conn.cursor()
    # ORDER BY sorts the whole result, so LIMIT 10 keeps the ten most
    # populous countries.
    sql = """
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        ORDER BY population DESC
        LIMIT 10
    """
    cur.execute(sql)
    results = cur.fetchall()
    print(results)
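The example above leaves open how to "securely read" the API key from a local file. One possible approach is sketched below. The function name load_api_key and the WHEROBOTS_API_KEY environment variable are hypothetical and not part of the Wherobots driver. The api.key file name follows the Airflow connection example in this post. The sketch prefers the environment variable, falls back to the file, and on POSIX systems refuses a key file that is accessible to other users.

```python
import os
import stat
from pathlib import Path


def load_api_key(env_var="WHEROBOTS_API_KEY", key_file="api.key"):
    """Return an API key from the environment, falling back to a local file.

    Both the environment variable name and the default file name are
    illustrative assumptions, not something the Wherobots driver defines.
    """
    value = os.environ.get(env_var, "").strip()
    if value:
        return value

    path = Path(key_file).expanduser()
    if not path.is_file():
        raise RuntimeError(f"No API key in ${env_var} and no key file at {path}")

    # On POSIX systems, refuse key files that group or other users can access.
    if os.name == "posix" and path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} should only be accessible by its owner (chmod 600)")

    key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise RuntimeError(f"Key file {path} is empty")
    return key
```

The returned string can then be passed as the api_key argument of connect(), which keeps the key out of your source code.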
For more information and future releases, see https://github.com/wherobots/wherobots-python-dbapi-driver on GitHub.
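Because the driver is described as DB-API 2.0 compatible, generic cursor idioms should in principle carry over. The sketch below shows one such idiom: turning result rows into dictionaries keyed by column name using cursor.description. It runs against Python's built-in sqlite3 module as a stand-in, with made-up rows, so it works without a Wherobots account. Whether the Wherobots driver populates cursor.description in the same way is an assumption to check against the driver's documentation.

```python
import sqlite3


def fetch_as_dicts(cursor, sql):
    """Run a query and return each row as a {column_name: value} dict."""
    cursor.execute(sql)
    # DB-API 2.0: description holds one sequence per column; item 0 is the name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database (sqlite3, not WherobotsDB) populated with made-up rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE localities (id TEXT, name TEXT, population INTEGER)")
cur.executemany(
    "INSERT INTO localities VALUES (?, ?, ?)",
    [("a", "Alpha", 300), ("b", "Beta", 100), ("c", "Gamma", 200)],
)

rows = fetch_as_dicts(
    cur,
    "SELECT id, name, population FROM localities ORDER BY population DESC LIMIT 2",
)
for row in rows:
    print(row["name"], row["population"])
# prints:
# Alpha 300
# Gamma 200
conn.close()
```

Keeping the helper dependent only on a cursor object means the same function can be handed a cursor from any DB-API 2.0 connection.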
Using the Apache Airflow provider
Wherobots provides an open-source provider for Apache Airflow, defining an Airflow operator for executing SQL queries directly on WherobotsDB. With this new capability, you can integrate your spatial analytics queries, data preparation or data processing steps into new or existing Airflow workflow DAGs.
To build or extend your Airflow DAG using the WherobotsSqlOperator, add the airflow-providers-wherobots dependency to your project:
$ poetry add airflow-providers-wherobots
Define your connection to Wherobots; by default the Wherobots operators use the wherobots_default connection ID:
$ airflow connections add "wherobots_default" \
    --conn-type "wherobots" \
    --conn-host "api.cloud.wherobots.com" \
    --conn-password "$(< api.key)"
Instantiate the WherobotsSqlOperator with your choice of runtime and your SQL query, and integrate it into your Airflow DAG definition:
from wherobots.db.runtime import Runtime
from airflow_providers_wherobots.operators.sql import WherobotsSqlOperator
...
select = WherobotsSqlOperator(
    # Airflow operators require a task_id; this name is only illustrative.
    task_id="select_countries",
    runtime=Runtime.SEDONA,
    sql="""
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        ORDER BY population DESC
        LIMIT 10
    """,
)
# select.execute() or integrate into your Airflow DAG definition
For more information and future releases, see https://github.com/wherobots/airflow-providers-wherobots on GitHub.
Using the Wherobots SQL Driver in Java
Wherobots provides an open-source Java library that implements a JDBC (Type 4) driver for connecting to WherobotsDB. To start building Java applications around the Wherobots JDBC driver, add the following line to your build.gradle file’s dependency section:
implementation "com.wherobots:wherobots-jdbc-driver"
In your application, you only need to work with Java’s JDBC APIs from the java.sql package:
import com.wherobots.db.Region;
import com.wherobots.db.Runtime;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

// Get your API key, or securely read it from a local file.
String apiKey = "...";

Properties props = new Properties();
props.setProperty("apiKey", apiKey);
props.setProperty("runtime", Runtime.SEDONA);
props.setProperty("region", Region.AWS_US_WEST_2);

try (Connection conn = DriverManager.getConnection("jdbc:wherobots://api.cloud.wherobots.com", props)) {
    // ORDER BY sorts the whole result, so LIMIT 10 keeps the ten most populous countries.
    String sql = """
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        ORDER BY population DESC
        LIMIT 10
        """;
    // Close the statement and the result set automatically when done.
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(sql)) {
        while (rs.next()) {
            System.out.printf("%s: %s %f %s\n",
                rs.getString("id"),
                rs.getString("name"),
                rs.getDouble("population"),
                rs.getString("geometry"));
        }
    }
}
For more information and future releases, see https://github.com/wherobots/wherobots-jdbc-driver on GitHub.
Conclusion
The Wherobots Spatial SQL API takes Wherobots’ vision of hassle-free, scalable geospatial data analytics and AI one step further by offering the easiest way to run your Spatial SQL queries in the cloud. Paired with Wherobots and Apache Sedona’s comprehensive support for working with all geospatial data at any scale and in any format, and with WherobotsAI’s inference features available directly from SQL, the Wherobots Spatial SQL API is also the most flexible and the most capable platform for getting the most out of your data.
Wherobots’ vision
We exist because creating spatial intelligence at scale is hard. Our contributions to Apache Sedona, leadership in the open geospatial domain, and investments in Wherobots Cloud have made it easier, and will continue to do so. Users of Apache Sedona, Wherobots customers, and ultimately any AI application will be enabled to support better decisions about our physical and virtual worlds. They will be able to create solutions to improve these worlds that were otherwise infeasible or too costly to build. And the solutions developed will have a positive impact on society, business, and earth, at a planetary scale.