Announcing SAML Single Sign-On (SSO) Support

We’re excited to announce that Wherobots Cloud now supports SAML SSO for Professional Edition customers. This enhancement underscores our commitment to providing robust security features tailored to the needs of businesses that handle sensitive data.

What is SAML Single Sign-On?

SAML Single Sign on Graph

Many companies already use SAML SSO. It is the mechanism by which you log in to third-party tools with your company’s centralized login (Google, Okta, etc.).

SAML (Security Assertion Markup Language) and SSO (Single Sign-On) are essential components of modern security protocols. SAML is an open standard for exchanging authentication and authorization data between parties, and SSO allows users to log in once and gain access to multiple systems without re-entering credentials.

How to enable SAML SSO

Setting up SAML SSO is straightforward. Any Professional Edition Wherobots administrator can enable this feature by following these steps:

  1. Verify Your Domain:

    Go to your organization’s settings, copy the provided TXT DNS record, and add it to your domain provider’s DNS settings (see the example record after these steps). Once done, click “Verify” in Wherobots.

  2. Configure Your Identity Provider (IdP):

    Log in to your IdP (e.g., Google Workspace, Okta, OneLogin, Azure AD) and configure it using the SAML details provided in the Wherobots settings.
    SAML Configuration

  3. Enter IdP Details into Wherobots:

    Input the Identity Provider Issuer URL, SSO URL, and certificate details from your IdP into the SAML section in Wherobots.
    SAML IdP Details view

  4. Enable SAML SSO:

    Make sure there’s an admin user with an email from the verified domain. Then, toggle the “Enable” switch in Wherobots to activate SSO.

  5. Test the Integration:

    Log in using your verified domain email to ensure everything is set up correctly.
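
For reference, the TXT record you add in step 1 typically looks something like the following. The hostname, TTL, and verification token here are placeholders for illustration, not actual Wherobots values:

example.com.    3600    IN    TXT    "wherobots-site-verification=abc123def456"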

For more detailed step-by-step instructions, check out our comprehensive getting started guide.

Why Your Organization Should Use SAML Single Sign-On

Benefits to Users

For users, SAML SSO simplifies the login process. With fewer passwords to remember and less frequent logins, users save time and reduce frustration. This streamlined access means employees can focus more on their tasks and less on managing multiple credentials.

Benefits to Organizations

Organizations benefit from SAML SSO by enhancing security and reducing administrative overhead. Centralized authentication means fewer password-related support tickets and a lower risk of phishing attacks, as users are less likely to fall for credential-stealing schemes. Moreover, it ensures compliance with security policies and regulatory requirements by enforcing strong, consistent access controls. It also allows organizations to grant access to specific third-party services only to particular users or groups.

Location data is particularly sensitive, as it can reveal personal habits, routines, and even confidential business operations. For example, in healthcare, location data of patients visiting specialized clinics could inadvertently disclose medical information. By implementing SAML SSO, organizations can better control access, ensuring that only authorized personnel can view and interact with this information within the Wherobots platform.

Elevate Your Security with SAML SSO

Take this opportunity to simplify authentication, reduce security risks, and improve productivity. If you are not already a Professional Edition customer, upgrade to gain access to SAML SSO. By upgrading, you’ll not only bolster your security measures and simplify logins, but you’ll gain access to more powerful workloads, service principals, and more. Contact us to schedule a demo or get started!

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:

Wherobots Joins Overture, Winning The Taco Wars, Spatial SQL API, Geospatial Index Podcast – This Month In Wherobots

Welcome to This Month In Wherobots, the monthly developer newsletter for the Wherobots & Apache Sedona community! This month we have news about Wherobots and the Overture Maps Foundation, a deep dive on new Wherobots Cloud features like raster inference, generating vector tiles, and the Spatial SQL API, plus a look at retail cannibalization analysis for the commercial real estate industry.

Wherobots Joins Overture Maps Foundation

Wherobots joins Overture Maps Foundation

Wherobots has officially joined Overture Maps Foundation to support the next generation of planetary-scale open map data. Wherobots has supported the development of Overture datasets through Overture Maps Foundation’s use of the open-source Apache Sedona project to develop and distribute global data, enabling Overture to embrace modern cloud-native geospatial technologies like GeoParquet. By joining Overture as Contributing Members, Wherobots will continue to support the ongoing development, distribution, and evolution of this critical open dataset that enables developers and data practitioners to make sense of the world around us.

Read the announcement blog post

Featured Community Members: Sean Knight & Ilya Marchenko

Apache Sedona featured community members - July 2024

This month’s featured community members are Sean Knight and Ilya Marchenko from YuzuData where they focus on AI and location intelligence for the commercial real estate industry. YuzuData is a Wherobots partner and leverages the power of Apache Sedona and Wherobots Cloud as part of their work analyzing large scale geospatial data. Sean and Ilya recently wrote a blog post showing how to use Wherobots for a retail cannibalization study. Thanks Sean and Ilya for being a part of the community and sharing how you’re building geospatial products using Wherobots!

Comparing Taco Chains: A Consumer Retail Cannibalization Study With Isochrones

Retail cannibalization analysis with Wherobots

Understanding the impact of opening a new retail location on existing locations is an important analysis in the commercial real estate industry. In this code-heavy blog post Sean and Ilya from YuzuData detail a retail cannibalization analysis using WherobotsDB, Overture Maps point of interest data, drive-time isochrones using the Valhalla API, and visualization with SedonaKepler. Sean also presented this analysis earlier this week in a live webinar.

Read the blog post or watch the video recording

Unlock Satellite Imagery Insights With WherobotsAI Raster Inference

Raster inference with WherobotsAI

One of the most exciting features in Wherobots’ latest release is WherobotsAI Raster Inference which enables running machine learning models on satellite imagery for object detection, segmentation, and classification. This post gives a detailed look at the types of models supported by WherobotsAI and an overview of the SQL and Python APIs for raster inference with an example of identifying solar farms for the purpose of mapping electricity infrastructure.

Read the blog post to learn more about WherobotsAI Raster Inference

Generating Global PMTiles In 26 Minutes With WherobotsDB VTiles

Generating PMTiles with Wherobots VTiles vector tiles generator

WherobotsDB VTiles is a highly scalable vector tile generator capable of quickly and cost-efficiently generating vector tiles from small to planetary-scale datasets, with support for the PMTiles format. In this post we see how to generate vector tiles of the entire planet using three Overture layers; generating PMTiles of the Overture buildings layer in Wherobots Cloud takes 26 minutes. The post includes all the code necessary to recreate these tile generation operations and a discussion of performance considerations.

Read the blog post to learn more about WherobotsDB VTiles

Spatial SQL API Brings Performance Of WherobotsDB To Your Favorite Data Applications

Using Apache Airflow with WherobotsDB

The Wherobots Spatial SQL API enables integration with Wherobots Cloud via Python and Java client drivers. In addition to enabling integrations with your favorite data applications via the client drivers, Wherobots has released an Apache Airflow provider for orchestrating data pipelines and an integration with Harlequin, a popular SQL IDE.

Read the blog post to learn more about the Wherobots Spatial SQL API

Wherobots On The Geospatial Index Podcast

Wherobots On The Geospatial Index Podcast

William Lyon from Wherobots was recently a guest on The Geospatial Index podcast. In this episode he discusses the origins of Apache Sedona, the open-source technology behind Wherobots, how users are building spatial data products at massive scale with Wherobots, how Wherobots is improving the developer experience around geospatial analytics, and much more.

Watch the video recording

Upcoming Events

  • Apache Sedona Community Office Hours (Online – August 6th) – Join the Apache Sedona community for updates on the state of Apache Sedona, presentations and demos of recent features, and an opportunity to provide input on the roadmap, future plans, and contribution opportunities.
  • GeoMeetup: Cloud Native Spatial Data Stack (San Francisco – September 5th) – Join us on September 5th for an exciting GeoMeetup featuring talks from industry leaders with Wherobots and Felt.com. In this meetup we will be exploring the elements of the cloud native spatial data stack.
  • FOSS4G NA 2024 (St Louis – September 9th-11th) – FOSS4G North America is the premier open geospatial technology and business conference. Join the Wherobots team for a pre-conference workshop or come by and chat with us at the Wherobots booth to learn about the latest developments in Apache Sedona.

Want to receive this monthly update in your inbox? Sign up for the This Month In Wherobots Developer Newsletter:


Wherobots Joins Overture Maps Foundation As Contributing Member To Enable Open Cloud-Native Geospatial Intelligence

Wherobots is excited to share that we have officially joined Overture Maps Foundation as a Contributing Member to support the next generation of planetary-scale open map data. Wherobots believes wholeheartedly in Overture’s mission to bring open global-scale map data to the world while leveraging cloud-native technologies to enable efficient and accessible usage of these datasets.

Wherobots Joins Overture Maps Foundation

Wherobots has supported the development of Overture datasets through Overture Maps Foundation’s use of the open-source Apache Sedona project to develop and distribute the Overture datasets, enabling Overture to embrace modern cloud-native geospatial technologies like GeoParquet. In addition, Wherobots has made the Overture datasets available within the Wherobots Cloud platform as part of the Wherobots Spatial Catalog which is one of the fastest and most efficient ways to query and analyze the Overture datasets.

By joining Overture Maps Foundation as Contributing Members, Wherobots will continue to support the ongoing development, distribution, and evolution of this critical open dataset that enables developers and data practitioners to make sense of the world around us.

"Overture Maps’ mission is about more than building open map data. It is also about helping users access, discover and use that data, whether for building map applications or for spatial ETL, analytics and intelligence," said Marc Prioleau, executive director of the Overture Maps Foundation. “Wherobots has already contributed to the project through  its support of Overture data on Apache Sedona and we look forward to working with them even more closely in the future as a member of the team."

Marc Prioleau, Executive Director, Overture Maps Foundation

About Overture Maps Foundation

Founded in 2022, Overture Maps Foundation is the world’s leading home for collaboration on the development of reliable, easy-to-use, and interoperable open map data that will power current and next-generation map products. This interoperable set of map data assets is the basis for extensibility, enabling companies to contribute their own data. Members combine resources to build map data that is complete, accurate, and refreshed as the physical world changes. Map data will be open and extensible by all under an open data license. You can learn more about Overture Maps Foundation at overturemaps.org.

Why Supporting Planetary Scale Open Map Data Is Important

As the Spatial Intelligence Cloud, the Wherobots platform enables data practitioners to create large-scale spatial data products and organizations to find insights in spatial data at scale. We do this by supporting the open-source Apache Sedona project, which adds spatial functionality to distributed compute frameworks. We then build on top of Apache Sedona to manage the infrastructure needed for large-scale geospatial intelligence with a serverless architecture, and we extend the developer experience of Apache Sedona in Wherobots Cloud with APIs, governance, and optimization of spatial joins and other spatial operations.

Wherobots’ mission is to unlock spatial intelligence of earth, society, and business, at a planetary scale. Global-scale open data is a key component of this mission to understand the world, and Overture Maps Foundation data is an important step toward enabling it while aligning with our vision of open data architecture. Similarly, assembling and making sense of planetary-scale datasets like the Overture data requires scalable cloud-native geospatial technology, which aligns perfectly with Wherobots’ mission.

Finally, Wherobots Cloud offers a cloud-native platform for geospatial analytics while also supporting an open data infrastructure. To learn more about how Wherobots Cloud is pushing forward the state of geospatial intelligence, see our blog post covering the latest Wherobots release: Introducing WherobotsAI for Planetary Inference, and Capabilities That Modernize Spatial Intelligence At Scale.

What’s Next For Wherobots And Overture

We’re committed to continuing to support the development and evolution of both Overture Maps Foundation as an organization, and Overture’s adoption of cloud-native geospatial technologies. We’re excited to see where we can go with planetary-scale open map data for the world.

As an example of the type of large-scale data processing Wherobots Cloud enables with Overture data, we recently demonstrated how to make use of the Wherobots VTiles distributed vector tiles generator to generate global PMTiles of Overture data in 26 minutes and also how to analyze the Overture Places dataset to efficiently execute spatial queries to find insights into urban dynamics.

To get started working with Overture data today in Wherobots Cloud, create a free Wherobots Cloud account and run some of the example tutorial notebooks included there that demonstrate how to query, analyze, and visualize Overture data (as well as other large-scale datasets and use cases).

Want to keep up to date with the latest news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:


Easily create trip insights at scale by snapping millions of GPS points to road segments using WherobotsAI Map Matching

What is Map Matching?

GPS data is inherently noisy and often lacks precision, which can make it challenging to extract accurate insights. This imprecision means that the GPS points logged may not accurately represent the actual locations where a device was. For example, GPS data from a drive around a lake may incorrectly include points that are over the water!

To address these inaccuracies, teams commonly use two approaches:

  1. Identifying and Dropping Erroneous Points: This method involves manually or algorithmically filtering out GPS points that are clearly incorrect. However, this approach can reduce analytical accuracy, is costly, and is time-intensive.
  2. Map Matching Techniques: A smarter and more effective approach involves using map matching techniques. These techniques take the noisy GPS data points and compute the most likely path taken based on known transportation segments such as roadways or trails.

WherobotsAI Map Matching offers an advanced solution for this problem. It performs map matching at scale, on millions or even billions of trips, with ease and performance, ensuring that the GPS data aligns accurately with the actual paths most likely taken.

map matching telematics

An illustration of map matching. Blue dots: GPS samples, Green line: matched trajectory.

Map matching is a common solution for preparing GPS data for use in a wide range of applications including:

  • Satellite & GPS-based navigation
  • GPS tracking of freight
  • Assessing risk of driving behavior for improved insurance pricing
  • Post hoc analysis of self driving car trips for telematics teams
  • Transportation engineering and urban planning

The objective of map matching is to accurately determine which road or path in the digital map corresponds to the observed geographic coordinates, considering factors such as the accuracy of the location data, the density and layout of the road network, and the speed and direction of travel.

Existing Solutions for Map Matching

Most map matching implementations are variants of the Hidden Markov Model (HMM)-based algorithm described by Newson and Krumm in their seminal paper, "Hidden Markov Map Matching through Noise and Sparseness." This foundational research has influenced a variety of map matching solutions available today.
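
To make the formulation concrete, here is a minimal conceptual sketch of the two probability models at the heart of the Newson and Krumm approach. It is illustrative only, not code from any particular implementation, and the sigma and beta parameters are placeholder tuning values:

import math

# Conceptual sketch of HMM map matching probabilities (after Newson & Krumm).
# sigma and beta are illustrative placeholders, not recommended values.

def emission_prob(gps_to_candidate_m, sigma=4.0):
    # Gaussian GPS noise model: candidate road positions close to the
    # observed point are more likely to have emitted it.
    return math.exp(-0.5 * (gps_to_candidate_m / sigma) ** 2)

def transition_prob(straight_line_m, route_m, beta=1.0):
    # Transitions are more plausible when the on-network route distance
    # between consecutive candidates is close to the straight-line distance.
    return math.exp(-abs(straight_line_m - route_m) / beta)

A Viterbi pass over these probabilities then recovers the most likely sequence of road segments for the whole trace.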

However, traditional HMM-based approaches have notable downsides when working with large-scale GPS datasets:

  1. Significant Costs: Many commercially available map matching APIs charge substantial fees for large-scale usage.
  2. Performance Issues: Traditional map matching algorithms, while accurate, are often not optimized for large-scale computation. They can be prohibitively slow, especially when dealing with extensive GPS data, as the underlying computation struggles to handle the data scale efficiently.

These challenges highlight the need for more efficient and cost-effective solutions capable of handling large-scale GPS datasets without compromising on performance.

RESTful API Map Matching Options

The Mapbox Map Matching API, HERE Maps Route Matching API, and Google Roads API are powerful RESTful APIs for performing map matching. These solutions are particularly effective for small-scale applications. However, for large-scale applications, such as population-level analysis involving millions of trajectories, the costs can become prohibitively high.

For example, as of July 2024, the approximate costs for matching 1 million trips are:

  • Mapbox: $1,600
  • HERE Maps: $4,400
  • Google Maps Platform: $8,000

These prices are based on public pricing pages and do not consider any potential volume-based discounts that may be available.

While these APIs provide robust and accurate map matching capabilities, organizations seeking to perform extensive analyses often must explore more cost-effective alternatives.

Open-Source Map Matching Solutions

Open-source software such as Valhalla or GraphHopper can also be used for map matching. However, these solutions are designed to run on a single machine. If your map matching workload exceeds the capacity of that machine, processing times grow, and you will eventually run out of headroom as you vertically scale up the ladder of VM sizes.

Meet WherobotsAI Map Matching

WherobotsAI Map Matching is a high performance, low cost, and planetary scale map matching capability for your telematics pipelines.

WherobotsAI provides a scalable map matching feature designed for small to very large scale trajectory datasets. It works seamlessly with other Wherobots capabilities, which means you can implement data cleaning, data transformations, and map matching in one single (serverless) data processing pipeline. We’ll see how it works in the following sections.

How it works

WherobotsAI Map Matching takes a DataFrame containing trajectories and another DataFrame containing road segments, and produces a DataFrame containing map matched results. Here is a walk-through of using WherobotsAI Map Matching to match trajectories in the VED dataset to the OpenStreetMap (OSM) road network.

1. Preparing the Trajectory Data

First, we load the trajectory data. We’ll use the preprocessed VED dataset stored as GeoParquet files for demonstration.

dfPath = sedona.read.format("geoparquet").load("s3://wherobots-benchmark-prod/data/mm/ved/VED_traj/")

The trajectory dataset should contain the following attributes:

  • A unique ID for trips; in this example the ids attribute is the unique ID of each trip.
  • A geometry attribute containing LineStrings; in this case the geometry attribute contains each trip’s trajectory.

The rows in the trajectory DataFrame look like this:

+---+-----+----+--------------------+--------------------+
|ids|VehId|Trip|              coords|            geometry|
+---+-----+----+--------------------+--------------------+
|  0|    8| 706|[{0, 42.277558333...|LINESTRING (-83.6...|
|  1|    8| 707|[{0, 42.277681388...|LINESTRING (-83.6...|
|  2|    8| 708|[{0, 42.261997222...|LINESTRING (-83.7...|
|  3|   10|1558|[{0, 42.277065833...|LINESTRING (-83.7...|
|  4|   10|1561|[{0, 42.286599722...|LINESTRING (-83.7...|
+---+-----+----+--------------------+--------------------+
only showing top 5 rows
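
If your source data is raw GPS points rather than LineStrings, a preparation step along the following lines could assemble per-trip trajectories into this shape. This is a hypothetical sketch: the gps_points table and its trip_id, ts, lon, and lat columns are assumptions, and it relies on Sedona’s ST_MakeLine accepting an array of points.

dfPath = sedona.sql("""
    SELECT ids,
           -- sort each trip's points by timestamp, then connect them into a LineString
           ST_MakeLine(transform(array_sort(collect_list(struct(ts, pt))), x -> x.pt)) AS geometry
    FROM (SELECT trip_id AS ids, ts, ST_Point(lon, lat) AS pt FROM gps_points)
    GROUP BY ids
""")
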
2. Preparing the Road Network Data

We’ll use the OpenStreetMap (OSM) data for the Ann Arbor, Michigan region to map match our trip data against. Wherobots provides an API for loading road network data from OSM XML files.

from wherobots import matcher
dfEdge = matcher.load_osm("s3://wherobots-examples/data/osm_AnnArbor_large.xml", "[car]")
dfEdge.show(5)

The loaded road network DataFrame looks like this:

+--------------------+----------+--------+----------+-----------+----------+-----------+
|            geometry|       src|     dst|   src_lat|    src_lon|   dst_lat|    dst_lon|
+--------------------+----------+--------+----------+-----------+----------+-----------+
|LINESTRING (-83.7...|  68133325|27254523| 42.238819|-83.7390142|42.2386159|-83.7390153|
|LINESTRING (-83.7...|9405840276|27254523|42.2386058|-83.7388915|42.2386159|-83.7390153|
|LINESTRING (-83.7...|  68133353|27254523|42.2385675|-83.7390856|42.2386159|-83.7390153|
|LINESTRING (-83.7...|2262917109|27254523|42.2384552|-83.7390313|42.2386159|-83.7390153|
|LINESTRING (-83.7...|9979197063|27489080|42.3200426|-83.7272283|42.3200887|-83.7273003|
+--------------------+----------+--------+----------+-----------+----------+-----------+
only showing top 5 rows

Users can also prepare the road network data from any data source using any data processing procedures, as long as the schema of the road network DataFrame conforms to the requirements of the Map Matching API.
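
As a hypothetical sketch (the source path and original column names below are assumptions), conforming a custom road dataset can be as simple as renaming columns to match the schema shown above:

dfCustom = sedona.read.format("geoparquet").load("s3://my-bucket/custom-roads/")
dfEdge = dfCustom.selectExpr(
    "geometry",              # LineString geometry of each road segment
    "start_node_id AS src",  # source node ID of the segment
    "end_node_id AS dst",    # destination node ID of the segment
    "start_lat AS src_lat",
    "start_lon AS src_lon",
    "end_lat AS dst_lat",
    "end_lon AS dst_lon",
)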

3. Run Map Matching

Once the trajectories and road network data are ready, we can run matcher.match to match trajectories to the road network.

dfMmResult = matcher.match(dfEdge, dfPath, "geometry", "geometry")

The dfMmResult contains the trajectories snapped to the roads in the matched_points attribute:

+---+--------------------+--------------------+--------------------+
|ids|     observed_points|      matched_points|       matched_nodes|
+---+--------------------+--------------------+--------------------+
|275|LINESTRING (-83.6...|LINESTRING (-83.6...|[62574078, 773611...|
|253|LINESTRING (-83.6...|LINESTRING (-83.6...|[5930199197, 6252...|
| 88|LINESTRING (-83.7...|LINESTRING (-83.7...|[4931645364, 6249...|
|561|LINESTRING (-83.6...|LINESTRING (-83.6...|[29314519, 773612...|
|154|LINESTRING (-83.7...|LINESTRING (-83.7...|[5284529433, 6252...|
+---+--------------------+--------------------+--------------------+
only showing top 5 rows

We can visualize the map matching result using SedonaKepler to see what the matched trajectories look like:

from sedona.maps.SedonaKepler import SedonaKepler

mapAll = SedonaKepler.create_map()
SedonaKepler.add_df(mapAll, dfEdge, name="Road Network")
SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("observed_points AS geometry"), name="Observed Points")
SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("matched_points AS geometry"), name="Matched Points")
mapAll

The following figure shows the map matching results. The red lines are original trajectories, and the green lines are matched trajectories. We can see that the noisy original trajectories are all snapped to the road network.

map matching results example 2

Performance

We used WherobotsAI Map Matching to match 90 million trips across the entire US in just 1.5 hours on the Wherobots Tokyo runtime, which equates to approximately 1 million trips per minute. The average cost of matching 1 million trips is an order of magnitude lower than the options outlined above.

The “optimization magic” behind WherobotsAI Map Matching lies in how Wherobots intelligently and automatically co-partitions trajectory and road network datasets based on the spatial proximity of their elements, ensuring a balanced distribution of work. This partitioning strategy spreads the computational load evenly, making map matching with Wherobots highly efficient, scalable, and affordable compared to alternatives.
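
As a rough mental model only (this is not Wherobots’ actual implementation), you can picture co-partitioning as keying both datasets on a shared coarse spatial cell, so that nearby trips and road segments land in the same partition:

from pyspark.sql.functions import expr

# Conceptual sketch: key both sides on a coarse geohash of their centroid.
dfPathKeyed = dfPath.withColumn("cell", expr("ST_GeoHash(ST_Centroid(geometry), 4)"))
dfEdgeKeyed = dfEdge.withColumn("cell", expr("ST_GeoHash(ST_Centroid(geometry), 4)"))

# Repartitioning on the same key sends each area's trips and roads
# to the same workers, balancing the matching workload.
dfPathKeyed = dfPathKeyed.repartition("cell")
dfEdgeKeyed = dfEdgeKeyed.repartition("cell")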

Try It Out!

You can try out WherobotsAI Map Matching by starting a notebook environment in Wherobots Cloud and running our example notebook:

notebook_example/python/wherobots-ai/mapmatching_example.ipynb

You can also check out the WherobotsAI Map Matching tutorial and reference documentation for more information!

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:

Unlock Satellite Imagery Insights with WherobotsAI Raster Inference

Recently we introduced WherobotsAI Raster Inference to unlock analytics on satellite and aerial imagery using SQL or Python. Raster Inference simplifies extracting insights from overhead imagery and is powered by open-source machine learning models. This feature is currently in preview, and we are expanding its capabilities to support more models. Below we’ll dig into the popular computer vision tasks that Raster Inference supports, describe how it works, and show how you can use it to run batch inference to find and map electricity infrastructure.

Watch the live demo of these capabilities here.

The Power of Machine Learning with Satellite Imagery

Petabytes of satellite imagery are generated each day all over the world in a dizzying number of sensor types and image resolutions. The applications for satellite imagery and other remote sensing data sources are broad and diverse. For example, satellites with consistent, continuous orbits are ideal for monitoring forest carbon stocks to validate carbon credits or estimating agricultural yields.

However, this data has been inaccessible for most analysts and even seasoned ML practitioners because insight extraction required specialized skills. We’ve done the work to make insight extraction simple and accessible to more people. Raster Inference abstracts the complexity and scales to support planetary-scale imagery datasets, so you don’t need ML expertise to derive insights. In this blog, we explore the key features that make Raster Inference effective for land cover classification, solar farm mapping, and marine infrastructure detection. And, in the near future, you will be able to use Raster Inference with your own models!

Introduction to Popular and Supported Machine Learning Tasks

Raster Inference supports the three most common kinds of computer vision models applied to imagery: classification, object detection, and semantic segmentation. Instance segmentation, which combines object localization and semantic segmentation, is another common type of model that is not currently supported, but contact us if you need it and we can add it to the roadmap.

Computer Vision Detection Types
Computer Vision Detection Categories from Lin et al. Microsoft COCO: Common Objects in Context

The figure above illustrates these tasks. Image classification is when an image is assigned one or more text labels. In image (a), the scene is assigned the labels “person”, “sheep”, and “dog”. Image (b) is an example of object localization (or object detection). Object localization creates bounding boxes around objects of interest and assigns labels. In this image, five sheep are localized separately along with one human and one dog. Finally, semantic segmentation is when each pixel is given a category label, as shown in image (c). Here we can see all the pixels belonging to sheep are labeled blue, the dog is labeled red, and the person is labeled teal.

While these examples highlight detection tasks on regular imagery, these computer vision models can be applied to raster formatted imagery. Raster data formats are the most common data formats for satellite and aerial imagery. When objects of interest in raster imagery are localized, their bounding boxes can be georeferenced, which means that each pixel is localized to spatial coordinates, such as latitude and longitude. Georeferenced object localization is thus well suited for spatial analytics.
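
To make the georeferencing step concrete, here is a minimal sketch of mapping a pixel-space bounding box to coordinates in the raster’s CRS. It uses the affine package’s rasterio-style convention, and the transform values are placeholders:

from affine import Affine

# Placeholder transform: 10 m pixels, raster origin at (500000, 4649776) in the CRS.
transform = Affine(10.0, 0.0, 500000.0,
                   0.0, -10.0, 4649776.0)

def pixel_bbox_to_coords(col_min, row_min, col_max, row_max, t):
    # Map the corner pixels through the affine transform to CRS coordinates.
    x_min, y_max = t * (col_min, row_min)
    x_max, y_min = t * (col_max, row_max)
    return (x_min, y_min, x_max, y_max)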

https://wherobots.com/wp-content/uploads/2024/06/remotesensing-11-00339-g005.png

The example above shows various applications of object detection for localizing and classifying features in high resolution satellite and aerial imagery. This example comes from DOTA, a 15-class dataset of different objects in RGB and grayscale satellite imagery. Public datasets like DOTA are used to develop and benchmark machine learning models.

Not only are there many publicly available object detection models, but there are also many semantic segmentation models.

Semantic Segmentation
Sourced from “A Scale-Aware Masked Autoencoder for Multi-scale Geospatial Representation Learning”.

Not every machine learning model should be treated equally, and each will have its own tradeoffs. You can see the difference between the ground truth image (human annotated buildings representing the real world) and segmentation results across two models (Scale-MAE and Vanilla MAE). These results are derived from the same image at two different resolutions (referred to as GSD, or Ground Sampling Distance).

  • Scale-MAE is a model developed to handle detection tasks at various resolutions with different sensor inputs. It uses a similar MAE model architecture as the Vanilla MAE, but is trained specifically for detection tasks on overhead imagery that span different resolutions.
  • The Vanilla MAE is not trained to handle varying resolutions in overhead imagery. Its performance suffers in the top row and especially the bottom row, where resolution is coarser, as seen by the mismatch between Vanilla MAE and the ground truth image where many pixels are incorrectly classified.

Satellite Analytics Before Raster Inference

Without Raster Inference, a team looking to extract insights from overhead imagery using ML would typically need to:

  1. Deploy a distributed runtime to scale out workloads such as data loading, preprocessing, and inference.
  2. Develop functionality to operate on raster metadata to easily filter it by location to run inference workloads on specific areas of interest.
  3. Optimize models to run performantly on GPUs, which can involve complex rewrites of the underlying model prediction logic.
  4. Create and manage data preprocessing pipelines to normalize, resize, and collate raster imagery into the correct data type and size required by the model.
  5. Develop the logic to run data loading, preprocessing, and model inference efficiently at scale.

Raster Inference and its SQL and Python APIs abstract this complexity so you and your team can easily perform inference on massive raster datasets.

Raster Inference APIs for SQL and Python

Raster Inference offers APIs in both SQL and Python to run inference tasks. These APIs are designed to be easy to use, even if you’re not a machine learning expert. RS_CLASSIFY can be used for scene classification, RS_BBOXES_DETECT for object detection, and RS_SEGMENT for semantic segmentation. Each function produces tabular results that can be georeferenced for the scene, object, or segmentation, depending on the function. The records can be joined or visualized with other data (geospatial or traditional) to curate enriched datasets and insights. Here are SQL and Python examples for RS_SEGMENT.

-- SQL: segment features in each scene with the referenced model
SELECT RS_SEGMENT('{model_id}', outdb_raster) AS segment_result FROM df_raster_input

# Python equivalent; col comes from pyspark.sql.functions, and rs_segment is
# the Python counterpart of RS_SEGMENT from the Wherobots inference package
df = df_raster_input.withColumn("segment_result", rs_segment(model_id, col("outdb_raster")))

Example: Mapping Electricity Infrastructure

Imagine you want to optimize the location of new EV charging stations, but you want to target locations based on the availability of green energy sources, such as local solar farms. You can use Raster Inference to detect and locate solar farms and cross-reference these locations with internal data or other vector geometries that capture demand for EV charging. This use case will be demonstrated in our upcoming release webinar on July 10th.

Let’s walk through how to use Raster Inference for this use case.

First, we run predictions on rasters to find solar farms. The following code block, which calls RS_SEGMENT, shows how easy this is.

CREATE OR REPLACE TEMP VIEW segment_fields AS (
    SELECT
        outdb_raster,
        RS_SEGMENT('{model_id}', outdb_raster) AS segment_result
    FROM
    az_high_demand_with_scene
)

The confidence_array column produced by RS_SEGMENT can be assigned the same geospatial coordinates as the raster input and converted to a vector that can be spatially joined and processed with WherobotsDB using RS_SEGMENT_TO_GEOMS. We select a confidence threshold of 0.65 so that we only georeference high confidence detections.

WITH t AS (
        SELECT RS_SEGMENT_TO_GEOMS(outdb_raster, confidence_array, array(1), class_map, 0.65) result
        FROM predictions_df
    )
    SELECT result.* FROM t
+----------+--------------------+--------------------+
|     class|avg_confidence_score|            geometry|
+----------+--------------------+--------------------+
|Solar Farm|  0.7205783606825462|MULTIPOLYGON (((-...|
|Solar Farm|  0.7273308333550763|MULTIPOLYGON (((-...|
|Solar Farm|  0.7301468510823231|MULTIPOLYGON (((-...|
|Solar Farm|  0.7180177244988899|MULTIPOLYGON (((-...|
|Solar Farm|   0.728077805771141|MULTIPOLYGON (((-...|
|Solar Farm|     0.7264981572898|MULTIPOLYGON (((-...|
|Solar Farm|  0.7044100126912517|MULTIPOLYGON (((-...|
|Solar Farm|  0.7137283466756343|MULTIPOLYGON (((-...|
+----------+--------------------+--------------------+

This allows us to integrate the vectorized model predictions with other spatial datasets and easily visualize the results with SedonaKepler.

https://wherobots.com/wp-content/uploads/2024/06/solar_farm_detection-1-1024x398.png

Here Raster Inference runs on an 85 GiB dataset with 2,200 raster scenes for Arizona. Using a Sedona (tiny) runtime, Raster Inference completed in 430 seconds, predicting solar farms for all low cloud cover satellite images for the state of Arizona for the month of October. If we scale up to a San Francisco (small) runtime, the inference speed nearly doubles. In general, average bytes processed per second by Wherobots increases as datasets scale in size because startup costs are amortized over time. Processing speed also increases as runtimes scale in size.

Inference time (seconds)    Runtime size
430                         Sedona (tiny)
246                         San Francisco (small)

We use predictions from the output of Raster Inference to derive insights about which zip codes have the most solar farms, as shown below. This statement joins predicted solar farms with zip codes by location, then ranks zip codes by the pre-computed solar farm area within each zip code. We skipped this step for brevity, but you can see it and others in the notebook example.

az_solar_zip_codes = sedona.sql("""
SELECT SUM(solar_area) AS solar_area, any_value(az_zta5.geometry) AS geometry, ZCTA5CE10
FROM predictions_polys JOIN az_zta5
ON ST_Intersects(az_zta5.geometry, predictions_polys.geometry)
GROUP BY ZCTA5CE10
ORDER BY solar_area DESC
""")

https://wherobots.com/wp-content/uploads/2024/06/final_analysis.png

These predictions are made possible by SATLAS, a family of machine learning models released with Apache 2.0 licensing from Allen AI. The solar model demonstrated above was derived from the SATLAS foundational model. This foundational model can be used as a building block to create models to address specific detection challenges like solar farm detection. Additionally, there are many other open source machine learning models available for deriving insights from satellite imagery, many of which are provided by the TorchGeo project. We are just beginning to explore what these models can achieve for planetary-scale monitoring.

If you have a specific model you would like to see made available, please contact us to let us know.

For detailed instructions on using Raster Inference, please refer to our example Jupyter notebooks in the documentation.

https://wherobots.com/wp-content/uploads/2024/06/Screenshot_2024-06-08_at_2.11.07_PM-1024x683.png

Here are some links to get you started:
https://docs.wherobots.com/latest/tutorials/wherobotsai/wherobots-inference/segmentation/

https://docs.wherobots.com/latest/api/wherobots-inference/pythondoc/inference/sql_functions/

Getting Started

Getting started with WherobotsAI Raster Inference is easy. We’ve provided three models in Wherobots Cloud that can be used with our GPU optimized runtimes. Sign up for an account on Wherobots Cloud, send us a note to access the professional tier, start a GPU runtime, and you can run our example Jupyter notebooks to analyze satellite imagery in SQL or Python.

Stay tuned for updates on improvements to Raster Inference that will make it possible to run more models, including your own custom models. We’re excited to hear what models you’d like us to support, or the integrations you need to make running your own models even easier with Raster Inference. We can’t wait for your feedback and to see what you’ll create!

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:

New Wherobots Cloud Features, How Overture Maps Uses Apache Sedona, Aircraft Data, & Spatial Lakehouses

Welcome to This Month In Wherobots, the monthly developer newsletter for the Wherobots & Apache Sedona community! In this edition we have a look at the latest Wherobots Cloud release, how the Overture Maps Foundation uses Apache Sedona to generate their data releases, processing a billion aircraft observations, building spatial data lakehouses with Iceberg Havasu, the new Apache Sedona 1.6.0 release, and more!

Introducing WherobotsAI For Planetary Inference And Capabilities That Modernize Spatial Intelligence At Scale

Wherobots announced significant new features in Wherobots Cloud to enable machine learning inference on satellite imagery via SQL, new Python and Java database drivers for interacting with WherobotsDB in your own analytics applications or data orchestration tooling, and a scalable vector tiles generator. These new enhancements are available now in Wherobots Cloud.

Read The Blog Post or Register For The Webinar

Making Overture Maps Data More Efficient With GeoParquet And Apache Sedona

Querying Overture Maps GeoParquet data using Apache Sedona

The Overture Maps Foundation publishes an open comprehensive global map dataset with layers for transportation, places, 3D buildings, and administrative boundaries. This data comes from multiple sources and is published in cloud-native GeoParquet format made publicly available for download in cloud object storage. In order to wrangle such a large planetary-scale dataset the Overture team uses Apache Sedona to prepare, process, and generate partitioned GeoParquet files. This blog post dives into the benefits of GeoParquet, how Overture uses Sedona to generate GeoParquet (including a dual Geohash partitioning and sorting method), and how to query and analyze the Overture Maps dataset using Wherobots Cloud.

Read the article: Making Overture Maps Data More Efficient With GeoParquet And Apache Sedona

Featured Community Member: Feng Jiang

June featured community member

Our featured Apache Sedona and Wherobots Community Member this month is Feng Jiang, a Senior Software Engineer at Microsoft where he works with map and geospatial data at scale. Through his involvement with the Overture Maps Foundation he also helps maintain and publish the public Overture Maps dataset. In the blog post "Making Overture Maps Data More Efficient With GeoParquet And Apache Sedona" he shared some insights gained from working with Apache Sedona at Overture in the pipeline used to create and generate GeoParquet data of planetary-scale map data. Thanks for your contributions and being a part of the Apache Sedona community!

Processing A Billion Aircraft Observations With Apache Sedona In Wherobots Cloud

Impacted flight segments

An important factor to consider when analyzing aircraft data is the potential impact of weather, and especially severe weather events, on aircraft flights. This tutorial uses public ADS-B aircraft trace data combined with weather data to identify which flights have the highest potential to be impacted by severe weather events. We also see how to incorporate real-time Doppler radar raster data and explore the performance of working with a billion-row dataset for spatial operations like point-in-polygon searches and spatial joins.

Read The Tutorial: Processing A Billion Aircraft Observations With Apache Sedona In Wherobots Cloud

Training Series: Large-Scale Geospatial Analytics With Graphs And The PyData Ecosystem

Large-Scale Geospatial Analytics With Graphs And The PyData Ecosystem

Choosing the right tool for the job is an important aspect of data science, and equally important is understanding how the tools fit together and can be used alongside each other. This hands-on workshop shows how to leverage the scale of Apache Sedona with Wherobots Cloud for geospatial data processing, alongside common Python tooling like Geopandas, and how to add graph analytics using Neo4j to our analysis toolkit. Using a dataset of species observations we build a species interaction graph to find which species share habitat overlap, a common workflow for conservation use cases.

Watch The Workshop Recording: Large Scale Geospatial Analytics With Graphs And The PyData Ecosystem

Apache Sedona 1.6 Release

Apache Sedona Ecosystem

Version 1.6.0 of Apache Sedona is now available! This version includes support for Shapely 2.0 and GeoPandas 0.11.1+, enhanced support for geography data, new vector and raster functions, and tighter integration with Python raster data workflows through support for Rasterio and NumPy User Defined Functions (UDFs). You can learn more about this release in the release notes.

Read The Apache Sedona 1.6 Release Notes

Building Spatial Data Lakehouses With Iceberg Havasu

Iceberg Havasu: A Spatial Data Lakehouse Format

This talk from Subsurface 2024 introduces the Havasu spatial table format, an extension of Apache Iceberg used to build spatial data lakehouses. We learn about the motivation for adding spatial functionality to Iceberg, how Havasu Iceberg enables efficient spatial queries for both vector and raster data, and how to use a familiar SQL table interface when building large-scale geospatial analytics applications.

Watch The Recording: Building Spatial Data Lakehouses With Iceberg Havasu


Want to receive this monthly update in your inbox? Sign up for the This Month In Wherobots Newsletter:


Introducing WherobotsAI for planetary inference, and capabilities that modernize spatial intelligence at scale

We are excited to announce a preview of WherobotsAI, our new suite of AI and ML powered capabilities that unlock spatial intelligence in satellite imagery and GPS location data. Additionally, we are bringing the high-performance of WherobotsDB to your favorite data applications with a Spatial SQL API that integrates WherobotsDB with more interfaces including Apache Airflow for Spatial ETL. Finally, we’re introducing the most scalable vector tile generator on earth to make it easier for teams to produce engaging and interactive map applications. All of these new features are capable of operating on planetary-scale data.

Watch the walkthrough of this release here.

Wherobots Mission and Vision

Before we dive into this release, we think it’s important to understand how these capabilities fit into our mission, our product principles, and vision for the Spatial Intelligence Cloud so you can see where we are headed.

Our Mission
These new capabilities are core to Wherobots’ mission, which is to unlock spatial intelligence of earth, society, and business, at a planetary scale. We will do this by making it extremely easy to utilize data and AI technology purpose-built for creating spatial intelligence that’s cloud-native and compatible with modern open data architectures.

Our Product Principles

  • We’re building the spatial intelligence platform for modern organizations. Every organization with a mission directly linked to the performance of tangible assets, goods and services, or data products about what’s happening in the physical world, will need a spatial intelligence platform to be competitive, sustainable, and climate adaptive.
  • It delivers intelligence for the greater good. Teams and their organizations want to analyze their worlds to create a net positive impact for business, society, and the earth.
  • It’s purpose-built yet simple. Spatial intelligence won’t scale through in-house ‘spatial experts’, or through general purpose architectures that are not optimized for spatial workloads or development experiences.
  • It’s efficient at any scale. Maximal performance, scale, and cost efficiency can only be achieved through a cloud-native, serverless solution.
  • It creates intelligence with AI. Every organization will need AI alongside modern analytics to create spatial intelligence.
  • It’s open by default. Pace of innovation depends on choice. Organizations that adopt cloud-native, open source compatible, and modern open data architectures will innovate faster because they have more choices in the solutions they can use.

Our Vision
We exist because creating spatial intelligence at-scale is hard. Our contributions to Apache Sedona, leadership in the open geospatial domain, and investments in Wherobots Cloud have made it easier, and will continue to do so. Users of Apache Sedona, Wherobots customers, and ultimately any AI application will be enabled to support better decisions about our physical and virtual worlds. They will be able to create solutions to improve these worlds that were otherwise infeasible or too costly to build. And the solutions developed will have a positive impact on society, business, and earth — at a planetary scale.

Introducing WherobotsAI

There are petabytes of satellite or aerial imagery produced every day. Yet for most analysts, scientists, and developers, these datasets are analytically inaccessible beyond what the naked eye can interpret. As a result, most organizations still rely on humans and their eyes to analyze satellite or other forms of aerial imagery. Wherobots can already perform analytics of overhead imagery (also known as raster data) and geospatial objects (known as vector data) simultaneously at scale. But organizations also want to use modern AI and ML technologies to streamline and scale otherwise visual, single-threaded tasks like object detection, classification, and segmentation from overhead imagery.

Like satellite imagery that is generally hard to analyze, businesses also find it hard to analyze GPS data in their applications because it’s too noisy; points don’t always correspond to the actual path taken. Teams need an easy solution for snapping noisy GPS data to road or other segment types, at any scale.

Today we are announcing WherobotsAI which offers fully managed AI and machine learning capabilities that accelerate the development of spatial insights, for anyone familiar with SQL or Python. WherobotsAI capabilities include:

[new] Raster Inference (preview): A first of its kind, Raster Inference unlocks the analytical potential of satellite or aerial imagery at a planetary scale, by integrating AI models with WherobotsDB to make it extremely easy to detect, classify, and segment features of interest in satellite and aerial images. You can see how easy it is to detect and georeference solar farms here, with just a few lines of SQL:

SELECT
  outdb_raster,
  RS_SEGMENT('solar-satlas-sentinel2', outdb_raster) AS solar_farm_result
FROM df_raster_input

These georeferenced predictions can be queried with WherobotsDB and can be interactively explored in a Wherobots notebook. Below is an example of detection of solar panels in SedonaKepler.

AI Inference Solar Farm

The models and AI infrastructure powering Raster Inference are fully managed, which means there’s nothing to set up or configure. Today, you can use Raster Inference to detect, segment, and classify solar farms, land cover, and marine infrastructure from terabyte-scale Sentinel-2 true color and multispectral imagery datasets in under half an hour, on our GPU runtimes available in the Wherobots Professional Edition. Soon we will be making the inference metadata for the models public, so if your own models meet this standard, they will be supported by Raster Inference.

These models and datasets are just the starting point for WherobotsAI. We are looking forward to hearing from you to help us define the roadmap for what we should build support for next.

Map Matching: If you need to analyze trips at scale, but struggle to wrangle noisy GPS data, Map Matching is capable of turning billions of noisy GPS pings into signal, by snapping noisy points to road or other vector segments. Teams are using Map Matching to process hundreds of millions of vehicle trips per hour. This speed surpasses any current commercial solutions, all for a cost of just a few hundred dollars.

Here’s an example of what WherobotsAI Map Matching does to improve the quality of your trip segments.

  • Red and yellow line segments were created from raw, noisy GPS data.
  • Green represents Map Matched segments.

map matching algorithm

Visit the user documentation to learn more and get started with WherobotsAI.

A Spatial SQL API for WherobotsDB

WherobotsDB, our serverless, highly efficient compute engine compatible with Apache Sedona, is up to 60x more performant for spatial joins than popular general purpose big data engines and warehouses, and up to 20x faster than Apache Sedona on its own. It will remain the most performant, earth-friendly solution for your spatial workloads at any scale.

Until today, teams had two options for harnessing WherobotsDB: they could write and run queries in Wherobots managed notebooks, or run spatial ETL pipelines using the Wherobots jobs interface.

Today, we’re enabling you to bring the utility of WherobotsDB to more interfaces with the new Spatial SQL API. Using this API, teams can remotely execute Spatial SQL queries using a remote SQL editor, build first-party applications using our client SDKs in Python (WherobotsDB API driver) and Java (Wherobots JDBC driver), or orchestrate spatial ETL pipelines using a Wherobots Apache Airflow provider.

Run spatial queries with popular SQL IDEs

The following is an example of how to integrate Harlequin, a popular SQL IDE with WherobotsDB. You’ll need a Wherobots API key to get started with Harlequin (or any remote client). API keys allow you to authenticate with Wherobots Cloud for programmatic access to Wherobots APIs and services. API keys can be created following a few steps in our user documentation.

We will query WherobotsDB using Harlequin in the Airflow example later in this blog.

$ pip install harlequin-wherobots
$ harlequin -a wherobots --api-key $(< api.key)

harlequin api key connection

You can find more information on how to use Harlequin in its documentation, and on the WherobotsDB adapter on its GitHub repository.

The Wherobots Python driver enables integration with many other tools as well. Here’s an example of using the Wherobots Python driver in the QGIS Python console to fetch points of interest from the Overture Maps dataset using the Spatial SQL API.

import os

from wherobots.db import connect
from wherobots.db.region import Region
from wherobots.db.runtime import Runtime
import geopandas
from shapely import wkt

with connect(
        token=os.environ.get("WBC_TOKEN"),
        runtime=Runtime.SEDONA,
        region=Region.AWS_US_WEST_2,
        host="api.cloud.wherobots.com"
) as conn:
    curr = conn.cursor()
    curr.execute("""
    SELECT names.common[0].value AS name, categories.main AS category, geometry 
    FROM wherobots_open_data.overture.places_place 
    WHERE ST_DistanceSphere(ST_GeomFromWKT("POINT (-122.46552 37.77196)"), geometry) < 10000
    AND categories.main = "hiking_trail"
    """)
    results = curr.fetchall()  # the driver returns results as a pandas DataFrame
    print(results)

results["geometry"] = results.geometry.apply(wkt.loads)
gdf = geopandas.GeoDataFrame(results, crs="EPSG:4326",geometry="geometry")

def add_geodataframe_to_layer(geodataframe, layer_name):
    # Create a new memory layer
    layer = QgsVectorLayer(geodataframe.to_json(), layer_name, "ogr")

    # Add the layer to the QGIS project
    QgsProject.instance().addMapLayer(layer)

add_geodataframe_to_layer(gdf, "POI Layer")

Using the Wherobots Python driver with QGIS

Visit the Wherobots user documentation to get started with the Spatial SQL API, or see our latest blog post that goes deeper into how to use our database drivers with the Spatial SQL API.

Automating Spatial ETL workflows with the Apache Airflow provider for Wherobots

ETL (extract, transform, load) workflows are oftentimes required to prepare spatial data for interactive analytics, or to refresh datasets automatically as new data arrives. Apache Airflow is a powerful and popular open source orchestrator of data workflows. With the Wherobots Apache Airflow provider, you can now use Apache Airflow to convert your spatial SQL queries into automated workflows running on Wherobots Cloud.

Here’s an example of the Wherobots Airflow provider in use. In this example we identify the top 100 buildings in the state of New York with the most places (facilities, services, businesses, etc.) registered within them using the Overture Maps dataset, and we’ll eventually auto-refresh the result daily. The initial view can be generated with the following SQL query:

CREATE TABLE wherobots.test_db.top_100_hot_buildings_daily AS
SELECT
  buildings.id AS building,
  first(buildings.names),
  count(places.geometry) AS places_count,
  '2023-07-24' AS ts
FROM wherobots_open_data.overture.places_place places
JOIN wherobots_open_data.overture.buildings_building buildings
  ON ST_CONTAINS(buildings.geometry, places.geometry)
WHERE places.updatetime >= '2023-07-24'
  AND places.updatetime < '2023-07-25'
  AND ST_CONTAINS(ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585), places.geometry)
  AND ST_CONTAINS(ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585), buildings.geometry)
GROUP BY building
ORDER BY places_count DESC
LIMIT 100
  • A place in Overture is defined as a real-world facility, service, business, or amenity.
  • We used an arbitrary date of 2023-07-24.
  • New York is defined by a simple bounding box polygon (-79.762152, 40.496103, -71.856214, 45.01585) (we could alternatively join with its appropriate administrative boundary polygon).
  • We use two predicates on places.updatetime to filter one day’s worth of data.
  • The query creates a new table wherobots.test_db.top_100_hot_buildings_daily to store the query result. Note that it will not directly return any records because we are loading directly into a table.

Now, let’s use Harlequin as described earlier to inspect the outcome of creating this table with the above query:

SELECT * FROM wherobots.test_db.top_100_hot_buildings_daily

Harlequin query test 2

Apache Airflow and the Airflow Provider for Wherobots allow you to schedule and execute this query each day, injecting the appropriate date filters into your templatized query.

  • In your Apache Airflow instance, install the airflow-providers-wherobots library. You can either execute pip install airflow-providers-wherobots, or add the library to the dependency list of your Apache Airflow runtime.
  • Create a new “generic” connection for Wherobots called wherobots_default, using api.cloud.wherobots.com as the “Host” and your Wherobots API key as the “Password”.

The next step is to create an Airflow DAG. The Wherobots Provider exposes the WherobotsSqlOperator for executing SQL queries. Update the hardcoded "2023-07-24" in your query to the Airflow template macros {{ ds }} and {{ next_ds }}, which will be rendered as the DAG run’s schedule dates on the fly:

import datetime

from airflow import DAG
from airflow_providers_wherobots.operators.sql import WherobotsSqlOperator

with DAG(
    dag_id="example_wherobots_sql_dag",
    # Start backfilling from the first day of data we want to process.
    start_date=datetime.datetime.strptime("2023-07-24", "%Y-%m-%d"),
    schedule="@daily",
    # Backfill one day per DAG run, in order.
    catchup=True,
    max_active_runs=1,
):
    operator = WherobotsSqlOperator(
        task_id="execute_query",
        # Only run once the previous day's run has completed successfully.
        wait_for_downstream=True,
        sql="""
        INSERT INTO wherobots.test_db.top_100_hot_buildings_daily
        SELECT
          buildings.id AS building,
          first(buildings.names),
          count(places.geometry) AS places_count,
          '{{ ds }}' AS ts
        FROM wherobots_open_data.overture.places_place places
        JOIN wherobots_open_data.overture.buildings_building buildings
          ON ST_CONTAINS(buildings.geometry, places.geometry)
        WHERE places.updatetime >= '{{ ds }}'
          AND places.updatetime < '{{ next_ds }}'
          AND ST_CONTAINS(ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585), places.geometry)
          AND ST_CONTAINS(ST_PolygonFromEnvelope(-79.762152, 40.496103, -71.856214, 45.01585), buildings.geometry)
        GROUP BY building
        ORDER BY places_count DESC
        LIMIT 100
        """,
        return_last=False,
    )
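Before putting the DAG on its daily schedule, you can render and execute a single run locally with Airflow’s built-in test command, using the DAG id from the example above:

$ airflow dags test example_wherobots_sql_dag 2023-07-24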

You can visualize the status and logs of the DAG’s execution in the Apache Airflow UI. As shown below, the operator prints out the exact query it rendered and executed when you run your DAG.

apache airflow spatial sql api
Please visit the Wherobots user documentation for more details on how to set up your Apache Airflow instance with the Wherobots Provider.

Generate Vector Tiles — formatted as PMTiles — at Global Scale

Vector tiles are pre-computed, high resolution representations of vector features, optimized for display in interactive map applications. Because the tiles are generated offline, dataset preparation is decoupled from the client-side rendering driven by zooming and panning, which lets map developers significantly improve the utility, clarity, and responsiveness of feature-rich interactive map applications.

Traditional vector tile generators like Tippecanoe are limited to the processing capability of a single VM and support only a limited set of input formats. These tools work well for small-scale tile generation when the data is already in the right file format. But if you’re like the teams we’ve worked with, you may start small and then need to scale past the limits of a single VM, or work with a variety of file formats. You just want to generate vector tiles from the data you have, at any scale, without worrying about format conversion steps, configuring infrastructure, partitioning your workload around the capacity of a VM, or waiting for workloads to complete.

Vector Tile Generation (VTiles) for WherobotsDB generates vector tiles in PMTiles format from common data lake formats, incredibly quickly and at planetary scale, so you can start small knowing you can scale without switching to another solution. VTiles is incredibly fast because serverless computation is parallelized and the WherobotsDB engine is optimized for vector tile generation. The result: your development teams spend less time wrangling tile pipelines and more time building the map applications that matter to your customers.
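To give a feel for the developer experience, here is a minimal sketch of what a VTiles job can look like from a Wherobots notebook. The generate_pmtiles function and its parameters below are illustrative assumptions for this post, not the documented API; see the user documentation for the exact interface:

# A hypothetical sketch of a VTiles job; generate_pmtiles and its parameters
# are illustrative assumptions, not the documented API.
from sedona.spark import SedonaContext

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Load the features to tile from a lakehouse table.
buildings = sedona.table("wherobots_open_data.overture.buildings_building")

# Hand the features to the (hypothetical) PMTiles generator with a zoom range.
generate_pmtiles(
    buildings,
    output="s3://my-bucket/tiles/buildings.pmtiles",  # hypothetical output path
    min_zoom=4,
    max_zoom=15,
)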

Using a Tokyo runtime, we generated vector tiles with VTiles for all buildings in the Overture dataset, across the entire planet, at zoom levels 4-15, in 23 minutes. That’s fast and efficient for a planetary-scale operation. You can run the tile-generation-example notebook in the Wherobots Professional tier to experience the speed and simplicity of VTiles yourself. Here’s what this looks like:

Visit our user documentation to start generating vector tiles at scale.

Try Wherobots now

We look forward to hearing how you put these new capabilities to work, along with your feedback on how to make the Wherobots Cloud platform even more useful. You can try these new features today by creating a Wherobots Cloud account. WherobotsAI is a Professional Edition feature.

Please reach out on LinkedIn or by email at info@wherobots.com

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:


The Spatial SQL API brings the performance of WherobotsDB to your favorite data applications

Since its launch last fall, Wherobots has raised the bar for cloud-native geospatial data analytics, offering the first and only platform for working with vector and raster geospatial data together at a planetary scale. Wherobots delivers a significant breadth of geospatial analytics capabilities, built around a cloud-native data lakehouse architecture and query engine that delivers up to 60x better performance than incumbent solutions. Accessible through the powerful notebook experience data scientists and data engineers know and love, Wherobots Cloud is the most comprehensive, approachable, and fully-managed serverless offering for enabling spatial intelligence at scale.

Today, we’re announcing the Wherobots Spatial SQL API, powered by Apache Sedona, to bring the performance of WherobotsDB to your favorite data applications. This opens the door to a world of direct-SQL integrations with Wherobots Cloud, bringing a serverless cloud engine that’s optimized for spatial workloads at any scale into your spatial ETL pipelines and applications, and taking your users and engineers closer to your data and spatial insights.

Register for our release webinar on July 10th here: https://bit.ly/3yFlFYk

Developers love Wherobots because compute is abstracted and managed by Wherobots Cloud. Because it can run at a planetary scale, Wherobots streamlines development and reduces time to insight. It runs on a data lake architecture, so data doesn’t need to be copied into a proprietary storage system, and integrates into familiar development tools and interfaces for exploratory analytics and orchestrating production spatial ETL pipelines.

Utilize Apache Airflow or SQL IDEs with WherobotsDB via the Spatial SQL API

Wherobots Cloud and the Wherobots Spatial SQL API are powered by WherobotsDB, with Apache Sedona at its core: a distributed computation engine that can horizontally scale to handle computation and analytics on any dataset. Wherobots Cloud automatically manages the infrastructure and compute resources of WherobotsDB to serve your use case based on how much computation power you need.

Behind the scenes, your Wherobots Cloud “runtime” defines the amount of compute resources allocated and the configuration of the software environment that executes your workload (in particular for AI/ML use cases, or if your ETL or analytics workflow depends on 1st or 3rd party libraries).

Our always-free Community Edition gives access to a modest “Sedona” runtime for working with small-scale datasets. Our Professional Edition unlocks access to much larger runtimes, up to our “Tokyo” runtime capable of working on planetary-scale datasets, and GPU-accelerated options for your WherobotsAI workloads.

With the release of the Wherobots Spatial SQL API and its client SDKs, you can bring WherobotsDB, the ease-of-use, and the expressiveness of SQL to your Apache Airflow spatial ETL pipelines, your applications, and soon to tools like Tableau, Superset, and other 3rd party systems and applications that support JDBC.

Our customers love applying the performance and scalability of WherobotsDB to their data preparation workflows and their compute-intensive data processing applications.

Use cases include:

  • Preparation of nationwide and planetary-scale datasets for their users and customers
  • Processing hundreds of millions of mobility data records every day
  • Creating and analyzing spatial datasets in support of their real estate strategy and decision-making

Now customers have the option to integrate new tools with Wherobots for orchestration and development of spatial insights using the Spatial SQL API.

How to get started with the Spatial SQL API

When you establish a connection to the Wherobots Spatial SQL API, a SQL session is started, backed by your selected WherobotsDB runtime (a "Sedona" runtime by default; you can specify a larger runtime if you need more horsepower). Queries submitted through this connection are securely executed against your runtime, with compute fully managed by Wherobots.

We provide client SDKs in Java and in Python to easily connect and interact with WherobotsDB through the Spatial SQL API, as well as an Airflow Provider to build your spatial ETL DAGs; all of which are open-source and available on package registries, as well as on Wherobots’ GitHub page.

Using the Wherobots SQL Driver in Python

Wherobots provides an open-source Python library that exposes a DB-API 2.0 compatible interface for connecting to WherobotsDB. To build a Python application around the Wherobots DB-API driver, add the wherobots-python-dbapi library to your project’s dependencies:

$ poetry add wherobots-python-dbapi

Or directly install the package on your system with pip:

$ pip install wherobots-python-dbapi

From your Python application, establish a connection with wherobots.db.connect() and use cursors to execute your SQL queries and process their results:

import logging
import sys

from wherobots.db import connect
from wherobots.db.region import Region
from wherobots.db.runtime import Runtime

# Optionally, set up logging to get information about the driver's
# activity.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s.%(msecs)03d %(levelname)s %(name)20s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Get your API key, or securely read it from a local file.
api_key = '...'

with connect(
    host="api.cloud.wherobots.com",
    api_key=api_key,
    runtime=Runtime.SEDONA,
    region=Region.AWS_US_WEST_2,
) as conn:
    cur = conn.cursor()
    sql = """
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        SORT BY population DESC
        LIMIT 10
    """
    cur.execute(sql)
    results = cur.fetchall()
    print(results)
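The driver returns query results you can feed straight into your Python data stack. Here is a short follow-up sketch, assuming (as in our QGIS integration example) that the fetched results behave like a pandas DataFrame whose geometry column contains WKT strings:

from shapely import wkt
import geopandas

# Parse the WKT geometry strings and build a GeoDataFrame for further analysis.
results["geometry"] = results["geometry"].apply(wkt.loads)
gdf = geopandas.GeoDataFrame(results, crs="EPSG:4326", geometry="geometry")
print(gdf.head())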

For more information and future releases, see https://github.com/wherobots/wherobots-python-dbapi-driver on GitHub.

Using the Apache Airflow provider

Wherobots provides an open-source provider for Apache Airflow, defining an Airflow operator for executing SQL queries directly on WherobotsDB. With this new capability, you can integrate your spatial analytics queries, data preparation or data processing steps into new or existing Airflow workflow DAGs.

To build or extend your Airflow DAG using the WherobotsSqlOperator, add the airflow-providers-wherobots dependency to your project:

$ poetry add airflow-providers-wherobots

Define your connection to Wherobots; by default the Wherobots operators use the wherobots_default connection ID:

$ airflow connections add "wherobots_default" \
    --conn-type "wherobots" \
    --conn-host "api.cloud.wherobots.com" \
    --conn-password "$(< api.key)"

Instantiate the WherobotsSqlOperator with your choice of runtime and your SQL query, and integrate it into your Airflow DAG definition:

from wherobots.db.runtime import Runtime
from airflow_providers_wherobots.operators.sql import WherobotsSqlOperator

...

select = WherobotsSqlOperator(
    task_id="select_top_localities",  # every Airflow operator needs a task_id
    runtime=Runtime.SEDONA,
    sql="""
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        SORT BY population DESC
        LIMIT 10
    """,
)
# select.execute() or integrate into your Airflow DAG definition

apache airflow spatial sql api
For more information and future releases, see https://github.com/wherobots/airflow-providers-wherobots on GitHub.

Using the Wherobots SQL Driver in Java

Wherobots provides an open-source Java library that implements a JDBC (Type 4) driver for connecting to WherobotsDB. To start building Java applications around the Wherobots JDBC driver, add the following line to your build.gradle file’s dependency section:

implementation "com.wherobots:wherobots-jdbc-driver"

In your application, you only need to work with Java’s JDBC APIs from the java.sql package:

import com.wherobots.db.Region;
import com.wherobots.db.Runtime;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

// Get your API key, or securely read it from a local file.
String apiKey = "...";

Properties props = new Properties();
props.setProperty("apiKey", apiKey);
// Properties values must be strings, so pass the enum names.
props.setProperty("runtime", Runtime.SEDONA.name());
props.setProperty("region", Region.AWS_US_WEST_2.name());

try (Connection conn = DriverManager.getConnection("jdbc:wherobots://api.cloud.wherobots.com", props)) {
    String sql = """
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        SORT BY population DESC
        LIMIT 10
    """;
    Statement stmt = conn.createStatement();
    try (ResultSet rs = stmt.executeQuery(sql)) {
        while (rs.next()) {
            System.out.printf("%s: %s %f %s\n",
                rs.getString("id"),
                rs.getString("name"),
                rs.getDouble("population"),
                rs.getString("geometry"));
        }
    }
}

For more information and future releases, see https://github.com/wherobots/wherobots-jdbc-driver on GitHub.

Conclusion

The Wherobots Spatial SQL API takes Wherobots’ vision of hassle-free, scalable geospatial data analytics & AI one step further by making it the easiest way to run your Spatial SQL queries in the cloud. Paired with Wherobots and Apache Sedona’s comprehensive support for working with geospatial data at any scale and in any format, and with WherobotsAI’s inference features available directly from SQL, the Wherobots Spatial SQL API is also the most flexible and capable platform for getting the most out of your data.

Wherobots vision

We exist because creating spatial intelligence at scale is hard. Our contributions to Apache Sedona, leadership in the open geospatial domain, and investments in Wherobots Cloud have made it easier, and will continue to do so. Users of Apache Sedona, Wherobots customers, and ultimately any AI application will be enabled to support better decisions about our physical and virtual worlds. They will be able to create solutions to improve these worlds that were otherwise infeasible or too costly to build. And the solutions developed will have a positive impact on society, business, and the earth, at a planetary scale.

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:


Raster Data Analysis, Processing Petabytes of Agronomic Data, Overview of Sedona 1.5, and Unlocking The Spatial Frontier – This Month In Wherobots


Welcome to This Month In Wherobots, the monthly newsletter for data practitioners in the Apache Sedona and Wherobots community. This month we’re exploring raster data analysis with Spatial SQL, processing petabytes of agronomic data with Apache Sedona, a deep dive on new features added in the 1.5 release series, and an overview of working with files in Wherobots Cloud.

Raster Data Analysis With Spatial SQL & Apache Sedona

One of the strengths of Apache Sedona and Wherobots Cloud is the ability to work with large-scale vector and raster geospatial data together using Spatial SQL. This post (and video) looks at how to get started working with raster data in Sedona using Spatial SQL, and at some of the use cases for raster data analysis, including vector/raster join operations, zonal statistics, and raster map algebra.
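For a flavor of what this looks like in practice, here is a minimal zonal statistics sketch run through Spatial SQL from Python. It assumes a configured SedonaContext named sedona and a hypothetical raster table my_rasters with a raster column rast; the RS_ZonalStats argument order shown is an assumption to verify against the Sedona function reference:

# Hypothetical table and column names; check the Sedona docs for the exact
# RS_ZonalStats signature before relying on this argument order.
zonal = sedona.sql("""
    SELECT RS_ZonalStats(
        rast,                                                       -- raster column
        ST_GeomFromWKT('POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))'), -- zone geometry
        1,                                                          -- band index
        'mean'                                                      -- statistic
    ) AS mean_value
    FROM my_rasters
""")
zonal.show()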

Read The Article: Raster Data Analysis With Spatial SQL & Apache Sedona

Featured Community Member: Luiz Santana

This month’s Wherobots & Apache Sedona featured community member is Luiz Santana. Luiz is Co-Founder and CTO of Leaf Agriculture and has extensive experience as a data architect and developer. He has a PhD in Computer Science from Universidade Federal de Santa Catarina, where he researched data processing and integration in highly scalable environments. Leaf Agriculture is building the unified API for food and agriculture by leveraging large-scale agronomic data. Luiz has given several conference presentations, such as "Apache Sedona: How To Process Petabytes of Agronomic Data With Spark" and "Perspectives on the Use of Data in Agriculture", which cover how Leaf uses Apache Sedona to analyze large-scale agricultural data and how Sedona fits into their stack alongside other technologies. Thank you Luiz for your work with the Apache Sedona and Wherobots community and for sharing your knowledge and experience!

Apache Sedona: How To Process Petabytes of Agronomic Data With Spark

In this presentation from The Developer’s Conference, Luiz Santana shares the experience of using Apache Sedona at Leaf Agriculture to process petabytes of agronomic data from satellites, agricultural machines, drones, and other sensors. He discusses how Leaf uses Sedona for tasks such as geometry intersections, geographic searches, and polygon transformations with high performance and speed. Luiz also presented Perspectives On The Use of Data in Agriculture, which covers some of the data challenges that Leaf handles and an overview of the technologies used to address them, including Apache Sedona.

See The Slides From The Presentation

Working With Files – Getting Started With Wherobots Cloud


This post takes a look at loading and working with our own data in Wherobots Cloud as well as creating and saving data as the result of our analysis, such as the end result of a data pipeline. It covers importing files in various formats including CSV, GeoJSON, Shapefile, and GeoTIFF in Wherobots Cloud, working with AWS S3 cloud object storage, and creating GeoParquet files using Apache Sedona.

Read The Post: Working With Files – Getting Started With Wherobots Cloud

Introducing Sedona 1.5: Making Sedona the most comprehensive & scalable spatial data processing and ETL engine for both raster and vector data


The 1.5 series of Apache Sedona represents a leap forward in geospatial processing that adds essential features and enhancements to make Sedona a comprehensive, all-in-one cluster computing engine for geospatial vector and raster data analysis. This post covers XYZM coordinates and SRID, vector and raster joins, raster data manipulation, visualization with SedonaKepler and SedonaPyDeck, GeoParquet reading and writing, H3 hexagons, and new cluster compute engine support.
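For example, GeoParquet reading and writing is available directly through the DataFrame reader and writer. A minimal sketch, assuming a configured SedonaContext named sedona and hypothetical S3 paths:

# Read a GeoParquet dataset; the geometry column is decoded into a native
# geometry type rather than left as raw WKB.
df = sedona.read.format("geoparquet").load("s3://my-bucket/buildings.geoparquet")
df.printSchema()

# Write the (possibly transformed) result back out as GeoParquet.
df.write.format("geoparquet").save("s3://my-bucket/buildings_out.geoparquet")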

Read The Post: Introducing Sedona 1.5 – Making Sedona The Most Comprehensive & Scalable Spatial Data Processing and ETL Engine For Both Raster and Vector Data

Unlocking the Spatial Frontier: The Evolution and Potential of spatial technology in Apple Vision Pro and Augmented Reality Apps


Apple adopted the term “spatial computing” when announcing the Apple Vision Pro to describe its new augmented reality platform. This post from Wherobots CEO Mo Sarwat examines spatial computing in the context of augmented reality experiences, exploring spatial object localization and presentation, and the role of spatial query processing and spatial data analytics in Apple Vision Pro.

Read The Post: Unlocking The Spatial Frontier


SedonaSnow: Apache Sedona On Snowflake, Accelerating Your GIS Pipeline, Exploring Global Fishing Watch Data With GeoParquet, and Apache Sedona 1.5.1 Release

Welcome to This Month In Wherobots, the monthly developer newsletter for the Wherobots & Apache Sedona community! In this edition we cover SedonaSnow: Apache Sedona on Snowflake, accelerating your GIS pipeline with Apache Sedona, exploring Global Fishing Watch public data with SedonaDB and GeoParquet, and a look at new features and updates in the 1.5.1 release of Apache Sedona.

Apache Sedona Now Available In Snowflake Marketplace: SedonaSnow

SedonaSnow

Apache Sedona, the scalable open-source geospatial compute engine, is now available on Snowflake via the Snowflake Marketplace or via manual installation. The SedonaSnow plugin brings Apache Sedona’s Spatial SQL functionality to Snowflake through 130+ Sedona "ST" SQL functions that can be used alongside Snowflake SQL.

Read More About Using SedonaSnow In Snowflake In This Tutorial

Featured Community Members: Alihan Zihna & Fernando Ayuso Palacios

This month’s Wherobots & Apache Sedona featured community members are Alihan Zihna, Lead Data Scientist at CKDelta, and Fernando Palacios, Director of Data Science & Data Engineering, also at CKDelta. Alihan and Fernando presented "GIS Pipeline Acceleration With Apache Sedona" at the Data + AI Summit, where they shared how they improved the performance and innovation of their geospatial analysis pipelines, taking one pipeline from 48 hours down to 10 minutes using Apache Sedona. Thank you Fernando and Alihan for being a part of the community and sharing your work!

GIS Pipeline Acceleration With Apache Sedona

GIS pipeline accelerations with Apache Sedona

In this presentation from the Data + AI Summit, Fernando and Alihan discuss various use cases for working with large-scale geospatial data at conglomerate CKDelta, part of the CK Hutchison Group, which operates ports, utility networks, retail stores, and mobile telecom networks with hundreds of millions of users across dozens of countries. They discuss how geospatial analytics at scale is important for identifying water leakage in their utility network, understanding customer satisfaction, identifying sites for electric vehicle charging station installation, and forecasting the supply and demand of energy. They provide a technical overview of Apache Sedona and share the results of improving and extending their geospatial analytics pipelines, including one process whose running time dropped from 48 hours to 10 minutes with Apache Sedona.

Watch the recording of "GIS Pipeline Acceleration With Apache Sedona"

The Wherobots Notebook Environment – Getting Started With Wherobots Cloud & SedonaDB Part 2

Wherobots Initial Notebook

In Part 2 of our Getting Started With Wherobots Cloud & SedonaDB series we dive into the Wherobots Notebook Environment, including how to configure and start notebook runtimes, an overview of the sample notebooks included in Wherobots Cloud, and how to use version control like git with notebooks. If you missed it, check out Part 1: An Overview of Wherobots Cloud, or sign up for a free Wherobots Cloud account to get started directly.

Read More About The Wherobots Notebooks Environment

Exploring Global Fishing Watch Public Data With SedonaDB & GeoParquet

Matched vs unmatched vessels

This post is a hands-on look at offshore ocean infrastructure and industrial vessel activity with SedonaDB using data from Global Fishing Watch. We also see how GeoParquet can be used with this data to improve the efficiency of data retrieval and enable large-scale geospatial visualization using GeoArrow and the Lonboard Python visualization library.

Read "Exploring Global Fishing Watch Public Data With SedonaDB & GeoParquet"

Apache Sedona 1.5.1 Release

Apache Sedona 1.5.1 Release Notes

The most recent release of Apache Sedona introduces some exciting new updates including support for Spark 3.5, 20+ new raster functions, 7 new vector functions, support for running Sedona in Snowflake with SedonaSnow, updates to Sedona’s GeoParquet reader and writer, and more! The updated raster functions include RS_ZonalStats for computing zonal statistics, RS_Tile and RS_TileExplode to enable tiling large rasters, and updates to RS_MapAlgebra to enable user defined raster functions that can work across multiple rasters. Updated vector functions include ST_IsValidReason which exposes the reason geometries might not be valid, and ST_LineLocatePoint which can be useful for map matching and snapping data to road networks.
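As a quick illustration of one of these, ST_LineLocatePoint returns how far along a line the closest point to a given point falls, as a fraction of the line’s length, mirroring the PostGIS function of the same name. A minimal sketch, assuming a configured SedonaContext named sedona:

# The point (2.5, 1) projects onto the line a quarter of the way along it.
frac = sedona.sql("""
    SELECT ST_LineLocatePoint(
        ST_GeomFromWKT('LINESTRING (0 0, 10 0)'),
        ST_GeomFromWKT('POINT (2.5 1)')
    ) AS fraction
""")
frac.show()  # fraction = 0.25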

Read More About Apache Sedona 1.5.1 In The Release Notes

Hands-On With Havasu and GeoParquet

GeoParquet and Iceberg Havasu

Each month you can find a new livestream tutorial on the Wherobots YouTube channel. January’s livestream was all about working with GeoParquet and Havasu tables in SedonaDB. We dig into some of the optimizations built into the Apache Parquet format to learn how Parquet delivers efficient data storage and retrieval, before exploring the GeoParquet specification for storing geospatial data in Parquet. We cover loading, analyzing, and creating GeoParquet files using SedonaDB, with a focus on comparing the performance of various GeoParquet partitioning strategies. Finally, we see how the Havasu extension to the Apache Iceberg table format enables working with both vector and raster geospatial data backed by GeoParquet, with the familiar developer experience of SQL tables.

Watch The Recording: Hands-On With Havasu And GeoParquet

Upcoming Events

  • Apache Sedona Community Office Hour (Online Zoom Call – February 13, 2024) – Join the Apache Sedona community for updates on the state of Apache Sedona, presentations and demos of recent features, and a chance to provide your input on the roadmap, future plans, and contribution opportunities.
  • Raster Data Analysis With Spatial SQL & SedonaDB (Online Livestream – February 29th, 2024) – This month our livestream is focused on raster data analysis. We’ll see how to load raster data in SedonaDB and perform raster operations like map algebra and zonal statistics using Spatial SQL. Be sure to subscribe to the Wherobots YouTube channel to keep up to date with more Wherobots livestreams and videos!

Want to receive this monthly update in your inbox? Sign up for the This Month In Wherobots Newsletter: