We are excited to announce a preview of WherobotsAI, our new suite of AI- and ML-powered capabilities that unlock spatial intelligence in satellite imagery and GPS location data. We are also bringing the high performance of WherobotsDB to your favorite data applications with a Spatial SQL API that integrates WherobotsDB with more interfaces, including Apache Airflow for spatial ETL. Finally, we’re introducing the most scalable vector tile generator on earth to make it easier for teams to produce engaging, interactive map applications. All of these new features can operate on planetary-scale data.
Watch the walkthrough of this release here.
Wherobots Mission and Vision
Before we dive into this release, we think it’s important to explain how these capabilities fit into our mission, our product principles, and our vision for the Spatial Intelligence Cloud, so you can see where we are headed.
Our Mission
These new capabilities are core to Wherobots’ mission: to unlock spatial intelligence of earth, society, and business at a planetary scale. We will do this by making it extremely easy to use data and AI technology that is purpose-built for creating spatial intelligence, cloud-native, and compatible with modern open data architectures.
Our Product Principles
- We’re building the spatial intelligence platform for modern organizations. Every organization whose mission is directly linked to the performance of tangible assets, goods and services, or data products about what’s happening in the physical world will need a spatial intelligence platform to be competitive, sustainable, and climate-adaptive.
- It delivers intelligence for the greater good. Teams and their organizations want to analyze their worlds to create a net positive impact for business, society, and the earth.
- It’s purpose-built yet simple. Spatial intelligence won’t scale through in-house ‘spatial experts’, or through general-purpose architectures that are not optimized for spatial workloads or development experiences.
- It’s efficient at any scale. Maximal performance, scale, and cost efficiency can only be achieved through a cloud-native, serverless solution.
- It creates intelligence with AI. Every organization will need AI alongside modern analytics to create spatial intelligence.
- It’s open by default. The pace of innovation depends on choice. Organizations that adopt cloud-native, open-source-compatible, modern open data architectures will innovate faster because they have more choices in the solutions they can use.
Our Vision
We exist because creating spatial intelligence at scale is hard. Our contributions to Apache Sedona, our leadership in the open geospatial domain, and our investments in Wherobots Cloud have made it easier, and will continue to do so. Users of Apache Sedona, Wherobots customers, and ultimately any AI application will be able to support better decisions about our physical and virtual worlds. They will be able to build solutions to improve these worlds that were otherwise infeasible or too costly to build. And the solutions they develop will have a positive impact on society, business, and earth, at a planetary scale.
Introducing WherobotsAI
Petabytes of satellite and aerial imagery are produced every day. Yet for most analysts, scientists, and developers, these datasets are analytically inaccessible beyond what the naked eye can see. As a result, most organizations still rely on people to visually inspect satellite and other aerial imagery. Wherobots can already analyze overhead imagery (also known as raster data) and geospatial objects (known as vector data) together, at scale. But organizations also want to use modern AI and ML technologies to streamline and scale tasks that are otherwise visual and single-threaded, such as object detection, classification, and segmentation of overhead imagery.
Just as satellite imagery is hard to analyze, businesses find it hard to analyze GPS data in their applications because it’s noisy: recorded points don’t always correspond to the path actually taken. Teams need an easy way to snap noisy GPS data to roads or other segment types, at any scale.
Today we are announcing WherobotsAI, which offers fully managed AI and machine learning capabilities that accelerate the development of spatial insights for anyone familiar with SQL or Python. WherobotsAI capabilities include:
[new] Raster Inference (preview): A first of its kind, Raster Inference unlocks the analytical potential of satellite and aerial imagery at a planetary scale by integrating AI models with WherobotsDB, making it extremely easy to detect, classify, and segment features of interest in satellite and aerial images. Here is how easy it is to detect and georeference solar farms with just a few lines of SQL:
SELECT
    outdb_raster,
    RS_SEGMENT('solar-satlas-sentinel2', outdb_raster) AS solar_farm_result
FROM df_raster_input
These georeferenced predictions can be queried with WherobotsDB and explored interactively in a Wherobots notebook. Below is an example of solar panel detections visualized in SedonaKepler.
The models and AI infrastructure powering Raster Inference are fully managed, so there’s nothing to set up or configure. Today, you can use Raster Inference to detect, segment, and classify solar farms, land cover, and marine infrastructure in terabyte-scale Sentinel-2 true-color and multispectral imagery datasets in under half an hour, on the GPU runtimes available in the Wherobots Professional Edition. Soon we will make the inference metadata for these models public, so that if your own models meet this standard, Raster Inference will support them.
These models and datasets are just the starting point for WherobotsAI. We look forward to hearing from you as we define the roadmap for what to support next.
Map Matching: If you need to analyze trips at scale but struggle to wrangle noisy GPS data, Map Matching can turn billions of noisy GPS pings into signal by snapping those points to roads or other vector segments. Teams are using Map Matching to process hundreds of millions of vehicle trips per hour, a speed that surpasses any current commercial solution, at a cost of just a few hundred dollars.
Here’s an example of what WherobotsAI Map Matching does to improve the quality of your trip segments.
- Red and yellow line segments were created from raw, noisy GPS data.
- Green represents Map Matched segments.
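Production map matchers usually go well beyond snapping each ping on its own: a common approach (for example, hidden Markov model matching) also weighs road connectivity and the order of the pings, and this post doesn’t describe the algorithm WherobotsAI uses. As a minimal illustration of the core snapping idea only, the sketch below projects each ping onto its nearest segment in planar coordinates. The function names are hypothetical, and real GPS data would need a projected CRS (or geodesic math) rather than raw degrees.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def project_onto_segment(p: Point, seg: Segment) -> Tuple[Point, float]:
    """Return the point on `seg` closest to `p`, and the planar distance to it."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Orthogonal projection parameter, clamped so the result stays on the segment.
        t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    snapped = (x1 + t * dx, y1 + t * dy)
    return snapped, math.hypot(p[0] - snapped[0], p[1] - snapped[1])


def snap_to_nearest_segment(points: List[Point], segments: List[Segment]) -> List[Tuple[int, Point]]:
    """Snap every point independently to its nearest segment: (segment index, snapped point)."""
    matched = []
    for p in points:
        best_idx, best_point, best_dist = -1, p, math.inf
        for i, seg in enumerate(segments):
            snapped, dist = project_onto_segment(p, seg)
            if dist < best_dist:
                best_idx, best_point, best_dist = i, snapped, dist
        matched.append((best_idx, best_point))
    return matched


# Two road segments forming an "L", and three noisy pings along them (planar units).
roads = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 0.0), (10.0, 10.0))]
pings = [(2.0, 0.4), (5.0, -0.3), (9.8, 4.0)]
print(snap_to_nearest_segment(pings, roads))
# [(0, (2.0, 0.0)), (0, (5.0, 0.0)), (1, (10.0, 4.0))]
```

Because each ping is matched independently, this naive version can jump between parallel or crossing roads; that weakness is exactly what sequence-aware matchers are designed to avoid.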
Visit the user documentation to learn more and get started with WherobotsAI.
A Spatial SQL API for WherobotsDB
WherobotsDB, our serverless, highly efficient compute engine compatible with Apache Sedona, is up to 60x faster for spatial joins than popular general-purpose big data engines and warehouses, and up to 20x faster than Apache Sedona on its own. It will remain the most performant, earth-friendly solution for your spatial workloads at any scale.
Until today, teams had two options for harnessing WherobotsDB: they could write and run queries in Wherobots managed notebooks, or run spatial ETL pipelines using the Wherobots jobs interface.
Today, we’re bringing WherobotsDB to more interfaces with the new Spatial SQL API. Using this API, teams can execute Spatial SQL queries from a remote SQL editor, build first-party applications using our client SDKs in Python (the WherobotsDB API driver) and Java (the Wherobots JDBC driver), or orchestrate spatial ETL pipelines using the Wherobots Apache Airflow provider.
Run spatial queries with popular SQL IDEs
The following example shows how to integrate Harlequin, a popular SQL IDE, with WherobotsDB. You’ll need a Wherobots API key to get started with Harlequin (or any remote client). API keys let you authenticate with Wherobots Cloud for programmatic access to Wherobots APIs and services, and you can create one by following a few steps in our user documentation.
We will also use Harlequin to query WherobotsDB in the Airflow example later in this post.
$ pip install harlequin-wherobots
$ harlequin -a wherobots --api-key $(< api.key)
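The command above reads the key from a local api.key file via the shell’s $(< api.key) substitution, and the QGIS example below reads it from a WBC_TOKEN environment variable. If you script your own Python clients, a small helper like the hypothetical load_api_key below keeps the key out of source code by checking an environment variable first and then falling back to a file. The variable and file names simply mirror the two examples in this post; neither is required by Wherobots.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_var: str = "WBC_TOKEN", key_file: str = "api.key") -> Optional[str]:
    """Return a Wherobots API key from `env_var` if set, else from `key_file`, else None."""
    value = os.environ.get(env_var)
    if value:
        return value.strip()  # drop stray whitespace or newlines from copy-paste
    path = Path(key_file)
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    return None
```

The returned string could then be passed to the Python driver’s connect(...) call, as in the QGIS example.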
You can find more information on how to use Harlequin in its documentation, and on the WherobotsDB adapter on its GitHub repository.
The Wherobots Python driver enables integration with many other tools as well. Here’s an example of using it in the QGIS Python console to fetch points of interest (hiking trails within 10 km of a point in San Francisco) from the Overture Maps dataset through the Spatial SQL API.
import os
import geopandas
from shapely import wkt
from qgis.core import QgsProject, QgsVectorLayer
from wherobots.db import connect
from wherobots.db.region import Region
from wherobots.db.runtime import Runtime
# Open a Spatial SQL API session; credentials are read from the WBC_TOKEN environment variable
with connect(
    token=os.environ.get("WBC_TOKEN"),
    runtime=Runtime.SEDONA,
    region=Region.AWS_US_WEST_2,
    host="api.cloud.wherobots.com",
) as conn:
    curr = conn.cursor()
    # Hiking trails within 10 km (ST_DistanceSphere returns meters) of a point in San Francisco
    curr.execute("""
        SELECT names.common[0].value AS name, categories.main AS category, geometry
        FROM wherobots_open_data.overture.places_place
        WHERE ST_DistanceSphere(ST_GeomFromWKT('POINT (-122.46552 37.77196)'), geometry) < 10000
        AND categories.main = 'hiking_trail'
    """)
    results = curr.fetchall()  # a pandas DataFrame
print(results)
# Geometries arrive as WKT strings; parse them into Shapely objects
results["geometry"] = results.geometry.apply(wkt.loads)
gdf = geopandas.GeoDataFrame(results, crs="EPSG:4326", geometry="geometry")
def add_geodataframe_to_layer(geodataframe, layer_name):
    # Create a vector layer from the GeoDataFrame's GeoJSON via the OGR provider
    layer = QgsVectorLayer(geodataframe.to_json(), layer_name, "ogr")
    # Add the layer to the QGIS project
    QgsProject.instance().addMapLayer(layer)
add_geodataframe_to_layer(gdf, "POI Layer")
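The query above keeps places whose ST_DistanceSphere from the query point is under 10000, that is, 10 km, since the function returns meters. To show what that filter computes, here is a haversine sketch. It assumes Sedona’s default sphere radius is 6,371,008 m (our understanding; check the Sedona function reference), and it is an illustration rather than Sedona’s implementation. Note that WKT points are written as (x y), which for EPSG:4326 data here means (longitude latitude).

```python
import math

# Assumption: Sedona's ST_DistanceSphere defaults to a mean Earth radius of 6,371,008 m.
EARTH_RADIUS_M = 6371008.0


def distance_sphere(lon1: float, lat1: float, lon2: float, lat2: float,
                    radius: float = EARTH_RADIUS_M) -> float:
    """Haversine great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# WKT "POINT (-122.46552 37.77196)" is (x, y) = (longitude, latitude).
origin_lon, origin_lat = -122.46552, 37.77196
nearby = distance_sphere(origin_lon, origin_lat, origin_lon, origin_lat + 0.05)
print(f"{nearby:.0f} m, within 10 km: {nearby < 10000}")
```

Swapping the argument order (latitude first) is a frequent source of wrong distances, which is why the sketch keeps the same (lon, lat) order as the WKT literal.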
Visit the Wherobots user documentation to get started with the Spatial SQL API, or see our latest blog post that goes deeper into how to use our database drivers with the Spatial SQL API.
Automating Spatial ETL workflows with the Apache Airflow provider for Wherobots
ETL (extract, transform, load) workflows are often required to prepare spatial data for interactive analytics, or to refresh datasets automatically as new data arrives. Apache Airflow is a powerful and popular open source orchestrator of data workflows. With the Wherobots Apache Airflow provider, you can now use Apache Airflow to turn your spatial SQL queries into automated workflows running on Wherobots Cloud.
Here’s an example of the Wherobots Airflow provider in use. In this example, we use the Overture Maps dataset to identify the 100 buildings in the state of New York with the most places (facilities, services, businesses, etc.) registered within them, and then set up the result to refresh automatically every day. The initial table can be created with the following SQL query:
CREATE TABLE wherobots.test_db.top_100_hot_buildings_daily AS
SELECT
    buildings.id AS building,
    first(buildings.names) AS names,
    count(places.geometry) AS places_count,
    '2023-07-24' AS ts
FROM wherobots_open_data.overture.places_place places
JOIN wherobots_open_data.overture.buildings_building buildings
    ON ST_CONTAINS(buildings.geometry, places.geometry)
WHERE places.updatetime >= '2023-07-24'
    AND places.updatetime < '2023-07-25'
    AND ST_CONTAINS(ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585), places.geometry)
    AND ST_CONTAINS(ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585), buildings.geometry)
GROUP BY building
ORDER BY places_count DESC
LIMIT 100
- A place in Overture is a real-world facility, service, business, or amenity.
- We used an arbitrary date of 2023-07-24.
- New York is approximated by a simple bounding box polygon (-79.762152, 40.496103, -71.856214, 45.01585), given as minimum longitude, minimum latitude, maximum longitude, maximum latitude; we could alternatively join with its administrative boundary polygon.
- Two conditions on places.updatetime restrict the input to one day’s worth of data.
- The query creates a new table, wherobots.test_db.top_100_hot_buildings_daily, to store the result. It does not return any records, because the results are written directly into the table.
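To make the filters in this query concrete, the sketch below mirrors two of them in plain Python for a single point: the envelope test behind ST_CONTAINS(ST_PolygonFromEnvelope(...), places.geometry), and the one-day window on places.updatetime. It assumes updatetime holds ISO-8601 timestamp strings in a consistent format, so that string comparison matches time order, and it handles points only; the building filter in the query requires the whole building polygon to lie inside the envelope. The helper names are hypothetical.

```python
from typing import Tuple

Envelope = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Same corners as ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585).
NEW_YORK_ENVELOPE: Envelope = (-79.762152, 40.496103, -71.856214, 45.01585)


def envelope_contains_point(env: Envelope, lon: float, lat: float) -> bool:
    """Point-in-rectangle test; boundary points are excluded, as with ST_Contains."""
    min_x, min_y, max_x, max_y = env
    return min_x < lon < max_x and min_y < lat < max_y


def in_day(updatetime: str, ds: str, next_ds: str) -> bool:
    """Half-open window [ds, next_ds); relies on ISO-8601 strings sorting chronologically."""
    return ds <= updatetime < next_ds


print(envelope_contains_point(NEW_YORK_ENVELOPE, -73.9857, 40.7484))  # Midtown Manhattan: True
print(in_day("2023-07-24T18:30:00.000Z", "2023-07-24", "2023-07-25"))  # True
```

The half-open window is what lets consecutive days tile the timeline without overlap: a timestamp at exactly midnight on 2023-07-25 belongs to the next day’s run, not this one.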
Now, let’s use Harlequin, as described earlier, to inspect the table created by the query above:
SELECT * FROM wherobots.test_db.top_100_hot_buildings_daily
Apache Airflow and the Airflow Provider for Wherobots allow you to schedule and execute this query each day, injecting the appropriate date filters into your templatized query.
- In your Apache Airflow instance, install the airflow-providers-wherobots library. You can either execute pip install airflow-providers-wherobots, or add the library to the dependency list of your Apache Airflow runtime.
- Create a new “generic” connection for Wherobots called wherobots_default, using api.cloud.wherobots.com as the “Host” and your Wherobots API key as the “Password”.
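If you’d rather not create the connection through the Airflow UI, Airflow can also read connections from environment variables named AIRFLOW_CONN_<CONN_ID>, and Airflow 2.3 and later accept a JSON value for them. The hypothetical helper below builds such a variable with the same connection ID, connection type, host, and password described in the steps above. We haven’t verified this path against the Wherobots provider specifically, so treat it as a sketch.

```python
import json
from typing import Tuple


def wherobots_connection_env(api_key: str, conn_id: str = "wherobots_default") -> Tuple[str, str]:
    """Build an (environment variable name, JSON value) pair describing an Airflow connection."""
    # Airflow looks up connections in environment variables named AIRFLOW_CONN_<CONN_ID>.
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps({
        "conn_type": "generic",
        "host": "api.cloud.wherobots.com",
        "password": api_key,  # the Wherobots API key goes in the password field
    })
    return name, value


name, value = wherobots_connection_env("<your-api-key>")
print(f"export {name}='{value}'")
```

The printed export line would be set in the environment of the Airflow scheduler and workers; connections defined this way do not appear in the UI’s connection list.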
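The next step is to create an Airflow DAG. The Wherobots provider exposes the WherobotsSqlOperator for executing SQL queries. Change the query from CREATE TABLE ... AS to INSERT INTO so that each run appends to the existing table, and replace the hardcoded dates (“2023-07-24” and “2023-07-25”) with the Airflow template macros {{ ds }} and {{ next_ds }}, which Airflow renders at run time as the run’s logical date and the following logical date: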
import datetime
from airflow import DAG
from airflow_providers_wherobots.operators.sql import WherobotsSqlOperator
with DAG(
    dag_id="example_wherobots_sql_dag",
    start_date=datetime.datetime.strptime("2023-07-24", "%Y-%m-%d"),
    schedule="@daily",
    catchup=True,  # create one run for every day since start_date
    max_active_runs=1,  # run at most one day at a time
):
    operator = WherobotsSqlOperator(
        task_id="execute_query",
        wait_for_downstream=True,
        # Airflow renders {{ ds }} and {{ next_ds }} into dates before each run
        sql="""
        INSERT INTO wherobots.test_db.top_100_hot_buildings_daily
        SELECT
            buildings.id AS building,
            first(buildings.names) AS names,
            count(places.geometry) AS places_count,
            '{{ ds }}' AS ts
        FROM wherobots_open_data.overture.places_place places
        JOIN wherobots_open_data.overture.buildings_building buildings
            ON ST_CONTAINS(buildings.geometry, places.geometry)
        WHERE places.updatetime >= '{{ ds }}'
            AND places.updatetime < '{{ next_ds }}'
            AND ST_CONTAINS(ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585), places.geometry)
            AND ST_CONTAINS(ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585), buildings.geometry)
        GROUP BY building
        ORDER BY places_count DESC
        LIMIT 100
        """,
        return_last=False,
    )
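With schedule="@daily" and catchup=True, Airflow creates one run per day starting at start_date, and for each run it renders {{ ds }} as that run’s logical date and {{ next_ds }} as the next one, both as YYYY-MM-DD. The sketch below reproduces that date arithmetic with the standard library and uses plain string replacement as a stand-in for Airflow’s Jinja rendering, just to show which one-day window each catch-up run would query; it is not how Airflow renders templates internally. Recent Airflow 2.x releases mark next_ds as deprecated in favor of {{ data_interval_end | ds }}, so check the template reference for your version.

```python
import datetime
from typing import Iterator, Tuple

# The date filters from the DAG's query, with Airflow's template placeholders intact.
TEMPLATE_SNIPPET = (
    "WHERE places.updatetime >= '{{ ds }}'\n"
    "AND places.updatetime < '{{ next_ds }}'"
)


def daily_runs(start: datetime.date, end: datetime.date) -> Iterator[Tuple[str, str]]:
    """Yield (ds, next_ds) for each daily logical date from start up to, not including, end."""
    day = start
    while day < end:
        next_day = day + datetime.timedelta(days=1)
        yield day.isoformat(), next_day.isoformat()
        day = next_day


def render(snippet: str, ds: str, next_ds: str) -> str:
    """Substitute the two placeholders; a stand-in for Airflow's Jinja rendering."""
    return snippet.replace("{{ ds }}", ds).replace("{{ next_ds }}", next_ds)


for ds, next_ds in daily_runs(datetime.date(2023, 7, 24), datetime.date(2023, 7, 27)):
    print(render(TEMPLATE_SNIPPET, ds, next_ds))
```

Because each run’s upper bound is the next run’s lower bound, the catch-up runs cover consecutive days without gaps or overlaps.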
You can view the status and logs of the DAG’s execution in the Apache Airflow UI. As shown below, the operator prints the exact query it rendered and executed when you run your DAG.
Please visit the Wherobots user documentation for more details on how to set up your Apache Airflow instance with the Wherobots Provider.
Generate Vector Tiles — formatted as PMTiles — at Global Scale
Vector tiles are high-resolution representations of features, optimized for visualization, that are computed offline and displayed in map applications. Because the tiles are precomputed, dataset preparation is decoupled from the client-side rendering driven by zooming and panning, which lets map developers significantly improve the utility, clarity, and responsiveness of feature-rich interactive map applications.
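Vector tiles are addressed by zoom level and (x, y) position on a grid. The common XYZ scheme uses Web Mercator, with 2^z by 2^z tiles at zoom z, and PMTiles archives store tiles addressed by zoom/x/y on that grid. As a rough illustration of the bookkeeping a tile generator does, the sketch below computes which tile contains a point, and how many tiles the full grid has over zoom levels 4 to 15 (the range used in the benchmark later in this section). The function names are our own, not VTiles APIs, and the count is only an upper bound, since a generator can skip tiles that contain no features.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) index of the Web Mercator XYZ tile that contains a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the antimeridian or beyond Web Mercator's latitude limit stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def grid_tile_count(min_zoom: int, max_zoom: int) -> int:
    """Number of tiles in the full grid over a zoom range: 4**z tiles at zoom z."""
    return sum(4 ** z for z in range(min_zoom, max_zoom + 1))


print(lonlat_to_tile(-122.46552, 37.77196, 10))  # San Francisco point from earlier: (163, 395)
print(grid_tile_count(4, 15))  # 1431655680
```

Each additional zoom level quadruples the grid, which is why planet-wide tiling at high zooms quickly outgrows a single machine.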
Traditional vector tile generators such as Tippecanoe are limited to the processing capacity of a single VM and support only a limited set of input formats. These solutions are great for small-scale tile generation workloads when data is already in the right file format. But if you’re like the teams we’ve worked with, you may start small and then need to scale past the limits of a single VM, or have data in a variety of file formats. You just want to generate vector tiles from the data you have, at any scale, without worrying about format conversion steps, configuring infrastructure, partitioning your workload around the capacity of a VM, or waiting for workloads to complete.
Vector Tile Generation (VTiles) for WherobotsDB generates vector tiles in PMTiles format from data in common data lake formats, incredibly quickly and at a planetary scale, so you can start small knowing you can scale without looking for another solution. VTiles is fast because computation is parallelized across serverless infrastructure and the WherobotsDB engine is optimized for vector tile generation. This means your development teams can spend less time generating tiles and more time building the map applications that matter to your customers.
Using a Tokyo runtime, we used VTiles to generate vector tiles for all buildings in the Overture dataset, at zoom levels 4 to 15 across the entire planet, in 23 minutes. That’s fast and efficient for a planetary-scale operation. You can run the tile-generation-example notebook in the Wherobots Pro tier to experience the speed and simplicity of VTiles yourself. Here’s what this looks like:
Visit our user documentation to start generating vector tiles at-scale.
Try Wherobots now
We look forward to hearing how you put these new capabilities to work, along with your feedback to increase the usefulness of the Wherobots Cloud platform. You can try these new features today by creating a Wherobots Cloud account. WherobotsAI is a professional tier feature.
Please reach out on LinkedIn or email us at info@wherobots.com.
Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter.