Introducing GeoStats for WherobotsAI and Apache Sedona

We are excited to introduce GeoStats, a machine learning (ML) and statistical toolbox for WherobotsAI and Apache Sedona users. With GeoStats, you can easily identify critical patterns in geospatial datasets, such as hotspots and anomalies, and quickly extract insights from large-scale data. While these algorithms are available in other packages, we’ve optimized each one to be highly performant for small to planetary-scale geospatial workloads. That means you can get results significantly faster, at lower cost, and more productively, through a unified development experience purpose-built for geospatial data science and ETL.

The Wherobots toolbox supports the DBSCAN, Local Outlier Factor (LOF), and Getis-Ord (Gi*) algorithms. Apache Sedona users can use DBSCAN starting with Apache Sedona version 1.7.0, and like all other features of Apache Sedona, it’s fully compatible with Wherobots.

Use Cases for GeoStats

DBSCAN

DBSCAN is the most popular algorithm we see in geospatial use cases. It identifies clusters (areas of your data that are closely packed together) and outliers (points that sit apart in low-density regions).

Typical use cases for DBSCAN are found in:

  • Retail: Decision makers use DBSCAN with location data to understand areas of high and low pedestrian activity and decide where to set up retail establishments.
  • City Planning: City planners use DBSCAN with GPS data to optimize transit support by identifying high usage routes, areas in need of additional transit options, or areas that have too much support.
  • Air Traffic Control: Traffic controllers use DBSCAN to identify areas with increasing weather activity to optimize flight routing.
  • Risk computation: Insurers and others can use DBSCAN to make policy decisions and calculate risk where risk is correlated to the proximity of two or more features of interest.
Local Outlier Factor (LOF)

LOF is an anomaly detection algorithm that identifies outliers present in a dataset.

Typical use cases for LOF include:

  • Data analysis and cleansing: Data teams can use LOF to identify and remove anomalies within a dataset, like removing erroneous GPS data points from a trace dataset
Getis-Ord (Gi*)

Getis-Ord is also a popular algorithm for identifying local hot and cold spots.

Typical use cases for Gi* include:

  • Public health: Officials can use disease data with Gi* to identify areas of abnormal disease outbreak
  • Telecommunications: Network administrators can use Gi* to identify areas of high demand and optimize network deployment
  • Insurance: Insurers can identify areas prone to specific claims to better manage risk

Traditional challenges with using these algorithms on geospatial data

Before GeoStats, teams leveraging any of the algorithms in the toolbox in a data analysis or ML pipeline would:

  1. Struggle to get performance or scale from the underlying solutions, which also tend to perform poorly when joining geospatial data.
  2. Determine how to host and scale open-source versions of popular ML and statistical algorithms, such as DBSCAN in PostGIS or scikit-learn, Gi* in PySAL, or LOF in scikit-learn, so they work with geospatial data types and formats.
  3. Replicate this overhead each time they want to deploy a new algorithm for geospatial data.

Benefits of WherobotsAI GeoStats

With GeoStats in WherobotsAI, you can now:

  1. Easily run native algorithms on a cloud-based engine, optimized for producing spatial data products and insights at scale.
  2. Use these algorithms without the operational overhead associated with setup and maintenance.
  3. Leverage optimized, hosted algorithms within a single platform to easily experiment and get critical insights faster.

We’ll walk through a brief overview of each algorithm, how to use them, and show how they perform at various scales.

Diving Deeper into the GeoStats Toolbox

DBSCAN Overview

DBSCAN is a density-based clustering algorithm. Given a set of points in some space, it groups points that have many nearby neighbors and marks points that lie alone in low-density regions as outliers.

How to use DBSCAN in Wherobots

The following examples assume you have already set up an organization and have an active runtime and notebook, with a DataFrame of interest to run the algorithms on.
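
If you don’t already have a dataset at hand, here is a minimal sketch that fabricates one for the walk throughs that follow. The variable X is simply a collection of (x, y) pairs; scikit-learn is used purely for illustration here and is not required by Wherobots.

# Illustrative only: create a small synthetic point dataset for the examples below.
# Any list of (x, y) tuples works; scikit-learn just makes convenient blobs.
from sklearn.datasets import make_blobs

points, _ = make_blobs(n_samples=1000, centers=3, cluster_std=0.4, random_state=42)
X = [(float(x), float(y)) for x, y in points]  # createDataFrame(X) infers columns _1 and _2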

WherobotsAI GeoStats DBSCAN Python API Overview
For a full walk through see the Python API reference: dbscan(...).

  • Supported Geometries: points, linestrings, polygons
  • Hyperparameters: max distance to neighbors (epsilon), min neighbors (min_points)
  • Output: dataframe with cluster id

DBSCAN Walk Through

  1. Choose your dataset and create a Sedona DataFrame.
df = sedona.createDataFrame(X).select(ST_MakePoint("_1", "_2").alias("geometry"))
  2. Choose values for your hyperparameters, max distance to neighbors (epsilon) and minimum neighbors (min_points). These values determine how DBSCAN identifies clusters.
epsilon=0.3
min_points=10
  3. Run DBSCAN on your DataFrame with your chosen hyperparameter values.
clusters_df = dbscan(df, epsilon=0.3, min_points=10, include_outliers=True)
  4. Analyze the results. For each datapoint, DBSCAN returns the cluster it’s associated with, or marks it as an outlier.
+--------------------+------+-------+
|            geometry|isCore|cluster|
+--------------------+------+-------+
|POINT (1.22185277...| false|      1|
|POINT (0.77885034...| false|      1|
|POINT (-2.2744742...| false|      2|
+--------------------+------+-------+

only showing top 3 rows

There’s a complete example of how to use DBSCAN in the Wherobots user documentation.
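
For convenience, here is the walk through above condensed into a single runnable sketch. The import paths are assumptions based on the Apache Sedona 1.7.0 Python API and may differ between Sedona and Wherobots releases; the dbscan(...) API reference linked above is authoritative for names and signatures.

# Condensed DBSCAN sketch; import paths are assumptions and may vary by release.
from sedona.spark import SedonaContext
from sedona.sql.st_constructors import ST_MakePoint  # assumed path
from sedona.stats.clustering.dbscan import dbscan    # assumed path (Sedona >= 1.7.0)

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# X is any iterable of (x, y) tuples; see the data-preparation sketch earlier.
df = sedona.createDataFrame(X).select(ST_MakePoint("_1", "_2").alias("geometry"))
clusters_df = dbscan(df, epsilon=0.3, min_points=10, include_outliers=True)
clusters_df.show(3)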

DBSCAN Performance Overview

To show DBSCAN performance in Wherobots, we created a European sample of the Overture buildings dataset and ran DBSCAN to identify clusters of buildings near each other, starting from the geographic center of Europe and working outwards. For each subsampled dataset, we ran DBSCAN with an epsilon of 0.005 degrees (roughly 500 meters) and a min_points value of 4 on a Large runtime in Wherobots Cloud. As seen below, DBSCAN effectively processes an increasing number of records, with 100M records taking 1.6 hrs to process.

Local Outlier Factor (LOF)

LOF is an anomaly detection algorithm that identifies outliers in a dataset. It does this by measuring how close a given data point is to its k nearest neighbors (with k being a user-chosen hyperparameter) compared with how close those neighbors are to their own nearest neighbors. LOF produces a score that represents the degree to which a record is an inlier or outlier.
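
As a rule of thumb when reading LOF scores: values near 1 mean the point is about as densely surrounded as its neighbors (an inlier), while values substantially greater than 1 mean the point sits in a sparser region than its neighbors (an outlier).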

How to use LOF in Wherobots

For the full example, please see this docs page.

WherobotsAI GeoStats LOF Python API Overview
For a full walk through see the Python API reference: local_outlier_factor(...).

  • Supported Geometries: points, linestrings, polygons
  • Hyperparameters: number of nearest neighbors to use
  • Output: score representing degree of inlier or outlier

LOF Walk Through

  1. Choose your dataset and create a Sedona DataFrame.
df = sedona.createDataFrame(X).select(ST_MakePoint(f.col("_1"), f.col("_2")).alias("geometry"))
  2. Choose your k value: how many nearest neighbors you want to use to measure density near a given datapoint.
k=20
  3. Run LOF on your DataFrame with your chosen k value.
outliers_df = local_outlier_factor(df, k=20)
  4. Analyze your results. LOF returns a score for each datapoint representing the degree to which it is an inlier or outlier.
+--------------------+------------------+
|            geometry|               lof|
+--------------------+------------------+
|POINT (-1.9169927...|0.9991534865548664|
|POINT (-1.7562422...|1.1370318880088373|
|POINT (-2.0107478...|1.1533763384772193|
+--------------------+------------------+
only showing top 3 rows

There’s a complete example of how to use LOF in the Wherobots user documentation.

LOF Performance Overview

We followed the same procedure as with DBSCAN, but ran LOF to score buildings as inliers or outliers. With each set of buildings we ran LOF with k=20 on a Large runtime in Wherobots Cloud. As seen below, GeoStats LOF scales effectively with growing data size, with 100M records taking 10 minutes to process.

Getis-Ord (Gi*) Overview

Getis-Ord is an algorithm for identifying statistically significant local hot and cold spots.

How to use GeoStats Gi*

WherobotsAI GeoStats Gi* Python API Overview
For a full walk through see the Python API reference: g_local(...).

  • Supported Geometries: points, linestrings, polygons
  • Hyperparameters: star, neighbor weighting
  • Output: Set of statistics that indicate the degree of local hot or cold spot for a given record

Gi* Walk Through

  1. Choose your dataset and create a Sedona DataFrame.
h3_zoom_level = 8  # example value; choose an H3 resolution appropriate for your analysis

places_df = (
    sedona.table("wherobots_open_data.overture_2024_07_22.places_place")
        .select(f.col("geometry"), f.col("categories"))
        .withColumn("h3Cell", ST_H3CellIDs(f.col("geometry"), h3_zoom_level, False)[0])
)
  2. Choose how you’d like to weight datapoints (for example, weighting datapoints in a specific geographic area more heavily, or weighting datapoints close to a given datapoint more heavily) and whether to use star (a boolean indicating whether a record counts as a neighbor of itself).

star = True
neighbor_search_radius_degrees = 1.0
variable_column = "myNumericColumnName"  # the numeric column to analyze for hot/cold spots

weighted_dataframe = add_binary_distance_band_column(
    places_df,
    neighbor_search_radius_degrees,
    include_self=star
)
  3. Run Gi* on your DataFrame with your chosen hyperparameters.
gi_df = g_local(
    weighted_dataframe,
    variable_column,
    star=star
)
  4. Analyze your results. For each datapoint, Gi* returns a set of statistics that indicate the degree of local hot or cold spot.
+----------+-------------------+--------------------+--------------------+------------------+--------------------+
|num_places|                  G|                  EG|                  VG|                 Z|                   P|
+----------+-------------------+--------------------+--------------------+------------------+--------------------+
|       871| 0.1397485091609774|0.013219284603421462|5.542296862370928E-5|16.995969941572465|                 0.0|
|       908|0.16097739240211956|0.013219284603421462|5.542296862370928E-5|19.847528249317246|                 0.0|
|       218|0.11812096144582315|0.013219284603421462|5.542296862370928E-5|14.090861243071908|                 0.0|
+----------+-------------------+--------------------+--------------------+------------------+--------------------+
only showing top 3 rows
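
Reading the output: G is the observed Gi* statistic for a record’s neighborhood, EG and VG are its expected value and variance under the null hypothesis of spatial randomness, Z is the resulting z-score, and P is the p-value. The columns are related by Z = (G - EG) / sqrt(VG); as a quick check, the first row gives (0.1397 - 0.0132) / sqrt(5.54e-5) ≈ 17.0, matching the Z shown. A large positive Z with a small P indicates a statistically significant hot spot, while a large negative Z indicates a cold spot.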

There’s a complete example of how to use Gi* in the Wherobots user documentation.

Getis-Ord Performance Overview

To showcase how Gi* performs in Wherobots, we again used the same example as DBSCAN, but ran Gi* on the area of the buildings. With each set of buildings we ran Gi* with a binary neighbor weight and a neighborhood radius of 0.007 degrees (~0.5 miles) on a Large runtime in Wherobots Cloud. As seen below, the algorithm scales mostly linearly with the number of records, with 100M records taking 1.6 hours to process.

Get started with WherobotsAI GeoStats

The way we implemented these algorithms for large-scale geospatial workloads will help you make sense of your geospatial data faster. You can get started for free today.

  • If you haven’t already, create a free Wherobots Organization subscribed to the Community Edition of Wherobots.
  • Start a Wherobots Notebook
  • In the Notebook environment, explore the notebook_example/python/wherobots-ai/ folder for examples that you can use to get started.
  • Need additional help? Check out our user documentation, and send us a note if needed at support@wherobots.com.

Apache Sedona Users

Apache Sedona users will have access to GeoStats DBSCAN with the Apache Sedona 1.7.0 release. Subscribe to the Sedona newsletter and join the Sedona community to get notified of the release and get started!

What’s next

We’re excited to hear what ML and statistical algorithms you’d like us to support. We can’t wait for your feedback and to see what you’ll create!

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:

The WherobotsAI Team

What is Apache Sedona?

Apache Sedona Overview

Apache Sedona is a cluster computing system for processing large-scale spatial data. It treats spatial data as a first-class citizen by extending the functionality of distributed compute frameworks like Apache Spark, Apache Flink, and Snowflake. Apache Sedona was created at Arizona State University under the name GeoSpark, introduced in the paper “Spatial Data Management in Apache Spark: The GeoSpark Perspective and Beyond.”

Apache Sedona introduces data types, operations, and indexing techniques optimized for spatial workloads on top of Apache Spark and other distributed compute frameworks. Let’s take a look at the workflow for analyzing spatial data with Apache Sedona.

Apache Sedona Architecture

Spatial Query Processing

The first step in spatial query processing is to ingest geospatial data into Apache Sedona. Data can be loaded from various sources such as files (Shapefiles, GeoJSON, Parquet, GeoTIFF, CSV, etc.) or databases into Apache Sedona’s in-memory distributed spatial data structures (typically the Spatial DataFrame).

Next, Sedona makes use of spatial indexing techniques, such as R-trees or quadtrees, to accelerate query processing. The spatial index is used to partition the data into smaller, manageable units, enabling efficient data retrieval during query processing.

Once the data is loaded and indexed, spatial queries can be executed using Sedona’s query execution engine. Sedona supports a wide range of spatial operations, such as spatial joins, distance calculations, and spatial aggregations.
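
As a brief, hedged illustration of these operations in practice (the paths, table names, and columns below are hypothetical):

# Minimal Sedona spatial-join sketch; paths and column names are hypothetical.
from sedona.spark import SedonaContext

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

points = sedona.read.format("geoparquet").load("s3://my-bucket/points/")
zones = sedona.read.format("geoparquet").load("s3://my-bucket/zones/")
points.createOrReplaceTempView("points")
zones.createOrReplaceTempView("zones")

# Count points per zone; Sedona plans the ST_Contains predicate as a spatial join.
sedona.sql("""
    SELECT z.zone_id, COUNT(*) AS n_points
    FROM zones z JOIN points p ON ST_Contains(z.geometry, p.geometry)
    GROUP BY z.zone_id
""").show()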

Sedona optimizes spatial queries to improve performance. The query optimizer determines an efficient query plan by considering the spatial predicates, available indexes, and the distribution of data across the cluster.

Spatial queries are executed in a distributed manner using Sedona’s computational capabilities. The query execution engine distributes the query workload across the cluster, with each node processing a portion of the data. Intermediate results are combined to produce the final result set. Since spatial objects can be very complex with many coordinates and topology, Sedona implements a custom serializer for efficiently moving spatial data throughout the cluster.

Common Apache Sedona Use Cases

So what exactly are users doing with Apache Sedona? Here are some common examples:

  • Creating custom weather, climate, and environmental quality assessment reports at national scale by combining vector parcel data with environmental raster data products.
  • Generating planetary scale GeoParquet files for public dissemination via cloud storage by combining, cleaning, and indexing multiple datasets.
  • Converting billions of daily point telemetry observations into routes traveled by vehicles.
  • Enriching parcel level data with demographic and environmental data at the national level to feed into a real estate investment suitability analysis.

Many of these use cases can be described as geospatial ETL operations. ETL (extract, transform, load) is a data integration process that involves retrieving data from various sources, transforming and combining these datasets, then loading the transformed data into a target system or format for reporting or further analysis. Geospatial ETL shares many of the challenges and requirements of traditional ETL processes, with the additional complexities of managing the geospatial component of the data: working with geospatial data sources and formats, spatial data types and transformations, and the scalability and performance considerations required for spatial operations such as joins based on spatial relationships. For a more complete overview of use cases with Apache Sedona, you can read our ebook on it here.

Community Adoption

Apache Sedona has gained significant community adoption and has become a popular geospatial analytics library within the distributed computing and big data ecosystem. As an Apache Software Foundation (ASF) incubator project, Sedona’s governance, licensing, and community participation align with ASF principles.

Sedona has an active and growing developer community, with over 100 contributors from many different types of organizations who are interested in advancing the state of geospatial analytics and distributed computing. Sedona has reached over 38 million downloads, with a rate of 2 million downloads per month and usage growing at 200% per year (as of the date this was published).

Organizations in industries including transportation, urban planning, environmental monitoring, logistics, insurance and risk analysis and more have adopted Apache Sedona. These organizations leverage Sedona’s capabilities to perform large-scale geospatial analysis, extract insights from geospatial data and build geospatial analytical applications at scale. The industry adoption of Apache Sedona showcases its practical relevance and real-world utility.

Apache Sedona has been featured in conferences, workshops, and research publications related to geospatial analytics, distributed computing, and big data processing. These presentations and publications contribute to the awareness, visibility, and adoption both within the enterprise and within the research and academic communities.

Resources

As you get started with Apache Sedona the following resources will be useful throughout your journey in the world of large-scale geospatial data analytics.

The best place to start learning about Apache Sedona is the authoritative book on the topic, which was recently published in early release format “Cloud Native Geospatial Analytics with Apache Sedona”. The team behind the project will continue to release chapters until the book is complete over the coming months. You can get the latest version of the book here.

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:

Wherobots and PostgreSQL + PostGIS: A Synergy in Spatial Analysis

Introduction

Spatial data has never been more abundant or more valued in decision making (everything happens somewhere!). The tools we use to process location-based data can significantly impact the outcomes of our projects. In the geospatial ecosystem, Apache Sedona-powered Wherobots and PostgreSQL with the PostGIS extension each offer robust capabilities. They share some functionality, but they are more powerful when used together rather than in isolation. This post explores how combining Wherobots and PostgreSQL + PostGIS can enhance spatial data processing, offering a synergy that leverages the unique strengths of each tool to improve data analysis and decision-making processes.

Exploring PostgreSQL + PostGIS

PostgreSQL is a powerful, open-source object-relational database system known for its robustness and reliability. When combined with the PostGIS extension, PostgreSQL transforms into a spatial database capable of executing location-based queries and spatial functions. This combination is ideally suited for projects where data integrity and detailed transaction management (ACID transactions) are crucial.

Key Features:

  • Data Integrity: Ensures high levels of data accuracy and consistency with ACID transaction compliance.
  • Complex Queries: Supports a wide array of SQL and spatial queries for detailed data management.
  • Integration: Seamlessly integrates with existing IT infrastructures and data systems.

Architecture:

PostgreSQL, a traditional RDBMS, uses a process-based database server architecture. It is designed for detailed, transactional data management, where data integrity and complex query capabilities are paramount.

Characteristics:

  • Transactional Integrity: PostgreSQL excels in maintaining data consistency and integrity through ACID (Atomicity, Consistency, Isolation, Durability) compliance, making it ideal for applications requiring reliable data storage.
  • Structured Data Management: Data must be written into PostgreSQL tables, allowing for sophisticated querying and indexing capabilities provided by PostGIS. This ensures precise spatial data management.
  • Integration with Legacy Systems: PostgreSQL + PostGIS can easily integrate with existing IT infrastructure, providing a stable environment for long-term data storage and complex spatial queries.

Understanding Wherobots & Apache Sedona

Wherobots, founded by the original creators of Apache Sedona, represents a significant advancement in modern cloud native spatial data processing. Apache Sedona is an open-source cluster computing system designed for the processing of large-scale spatial data, and Wherobots (continuously optimizing Sedona for performance) employs this technology with serverless deployment principles to analyze data distributed across clusters. This approach allows Wherobots to perform complex spatial queries and analytics efficiently, handling massive datasets that are beyond the scope of traditional geospatial data processing tools.

Key Features:

  • Scalability: Easily scales horizontally to handle petabytes of data through a cluster computing architecture.
  • Speed: Utilizes memory-centric architecture to speed up data processing tasks.
  • Complex Algorithms: Supports numerous spatial operations and algorithms essential for advanced geographic data analysis.
  • Spatial SQL API: Over 300 (and growing) spatial SQL functions covering both raster and vector analysis.
  • Pythonic DataFrame API: Extends the Spark DataFrame API for spatial data processing.
  • Python DB API: Python module conforming to the Python DB API 2.0 specification with familiar access patterns.

Architecture:

The backbone of Wherobots, Apache Sedona, is designed with a modern cloud native and distributed computing framework in mind. Wherobots orchestrates and manages the cluster of distributed compute nodes to enable large-scale spatial data processing, bringing compute power directly to the data. This architecture is purpose-built for big data analytics and planetary-scale spatial data processing.

Characteristics:

  • Distributed Processing: Wherobots and Apache Sedona utilize a cluster of nodes, enabling parallel processing of massive datasets. This makes it ideal for tasks requiring high scalability and speed.
  • In-Memory Computation: By leveraging in-memory processing, Sedona significantly reduces the time needed for complex spatial operations, allowing for rapid data exploration and transformation.
  • Cloud Integration: Wherobots architecture is naturally aligned with cloud-based solutions, enabling seamless integration with cloud services like Amazon Web Services (AWS). This facilitates scalable and flexible spatial data analytics.

Complementarity in Action

Sedona in Wherobots and PostgreSQL + PostGIS can both perform spatial queries, but their real power lies in how they complement each other. For example:

  • Data Handling and Performance: Apache Sedona in Wherobots Cloud is second to none at handling and analyzing large volumes of data, making it ideal for initial data processing, exploration, transformation, and analysis. In contrast, PostgreSQL + PostGIS excels at managing detailed transactions and operations requiring high precision and integrity.
    • Use Wherobots for: Scalable data operations. Wherobots efficiently scales processing across multiple nodes, ideal for rapid analysis of spatial data at local to global levels.
    • Use PostgreSQL & PostGIS for: Persistence. PostgreSQL + PostGIS provides the tools necessary for precise data management, crucial for applications like urban planning and asset management.
  • Use Case Flexibility: Wherobots can quickly process vast quantities of spatial data, making it suitable for environments and applications that require immediate insights on large volumes of data. PostgreSQL + PostGIS, being more transaction oriented, is perfect for applications where long-term data storage and management are needed.
    • Use Wherobots for: Processing speed, when your scenario requires an immediate response.
    • Use PostgreSQL & PostGIS for: Data management, in scenarios requiring meticulous data integrity, such as environmental monitoring over years or detailed urban planning projects.
  • Analytical Depth: When initial processing (extraction, transformation, enrichment, etc.) on large-scale data is done with Wherobots, ETL run times can be greatly reduced. Once processed, the data can be permanently stored and hosted in PostgreSQL + PostGIS. This allows users to perform deep, granular analyses at scale and then persist and serve those insights.
    • Use Wherobots for: Transforming and enriching data. Extract, transform, enrich, refine, and analyze data in Wherobots Cloud at scale and with unparalleled speed.
    • Use PostgreSQL & PostGIS for: A robust data ecosystem. Combining these tools ensures a robust data analysis and management ecosystem capable of supporting complex, multi-dimensional spatial queries.

Integration Illustration

Let’s illustrate how these two systems can be integrated in a complementary manner.

We’ll assume the persona of a fictitious company named “Synergistic GeoFusion Technologies (SGT) Holdings”.

SGT Holdings handles a vast array of spatial data from diverse sources such as sensors, satellites, and user inputs. The volume and velocity of the data collected require a sophisticated approach to maximize efficiency. Wherobots steps in as the initial processing powerhouse, applying its Apache Sedona-based cluster computing to perform heavy-duty ETL (extract, transform, load) tasks. This process involves cleansing, integrating, and transforming the raw data into a more structured format. Wherobots’ capability to handle massive datasets efficiently complements PostGIS’s robust data storage and querying capabilities by preparing the data for detailed spatial analysis and storage.

Once Wherobots processes the data, it can be seamlessly transferred to PostGIS, which can serve as the system of record. This PostgreSQL extension is well suited for ongoing data management, thanks to its sophisticated spatial functions and ability to handle complex queries. PostGIS’s strong querying capabilities complement Wherobots’ data processing strength, providing a stable platform for further data manipulation and refinement.
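
One simple way to make that hand-off is a standard Spark JDBC write from the Wherobots side, serializing geometries to WKT and converting them to native PostGIS geometries in the database. A hedged sketch, with placeholder connection details and table names:

# Hand off a processed Sedona DataFrame to PostgreSQL via JDBC; connection details are placeholders.
import pyspark.sql.functions as f

(processed_df
    .withColumn("geom_wkt", f.expr("ST_AsText(geometry)"))  # serialize geometry to WKT
    .drop("geometry")
    .write.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/gisdb")   # placeholder
    .option("dbtable", "analytics.processed_features")       # placeholder
    .option("user", "etl_user").option("password", "...")    # placeholders
    .mode("append")
    .save())

# On the PostGIS side (placeholder SQL), materialize a native geometry column:
#   ALTER TABLE analytics.processed_features ADD COLUMN geom geometry;
#   UPDATE analytics.processed_features SET geom = ST_GeomFromText(geom_wkt, 4326);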

PostGIS is not only a storage layer but also a platform for ongoing data edits and updates, ensuring data integrity and relevance. When complex, resource-intensive spatial analyses are required, Wherobots is re-engaged. This dual engagement allows SGT holdings to handle routine data management in PostGIS while delegating computationally demanding tasks back to Wherobots, thus utilizing each system’s strengths to full effect.

For visualization, SGT holdings again leverages Wherobots to generate vector tiles from the analyzed data. These tiles are crucial for dynamic, scalable visual representations in the company’s internal dashboards and tools. Wherobots’ ability to generate these tiles efficiently complements PostGIS’s role by providing a means to visualize the data stored and managed within PostGIS. This not only enhances user interactions with the data but also provides a seamless bridge from data analysis to actionable insights through visual exploration.

Using Wherobots and PostGIS in this complementary manner, SGT has established a highly efficient workflow that leverages the distinct capabilities of each system. They can now ingest all their data, manage it effectively, and run post hoc analysis tasks to serve internal and external clients efficiently and cost-effectively.

Performance Benchmark: Wherobots vs. PostGIS

When comparing WherobotsDB to PostgreSQL with PostGIS for spatial queries, WherobotsDB’s performance benefits become evident as data size increases. Initially, PostGIS’s precomputed GiST indexes give it an edge on smaller datasets due to faster query execution times. However, as datasets grow larger, the dynamic, on-the-fly indexing of WherobotsDB surpasses PostGIS. The overhead associated with WherobotsDB’s distributed system is outweighed by its ability to efficiently handle complex, large-scale queries, ultimately making it significantly faster and more scalable in high-demand scenarios.

In essence, WherobotsDB may start off slower with smaller datasets, but its performance dramatically improves with larger datasets, far exceeding PostGIS’ capabilities. This makes WherobotsDB the preferred choice when working with extensive spatial data that demands efficient, scalable processing.

See the tables below for a detailed comparison of performance metrics.

Dataset             Size    Number of Rows   Types of Geometries
OSM Nodes           256GB   7,456,990,919    Point
Overture Buildings  103GB   706,967,095      Polygon/Multipolygon
OSM Postal Codes    0.86GB  154,452          Polygon

The benchmark measured the following spatial join query:
WITH t AS 
( 
    SELECT 
        * 
    FROM 
        overture_buildings_test AS buildings 
    JOIN 
        osm_nodes_test AS nodes 
    ON ST_Intersects(
        nodes.geometry, 
        buildings.geometry) 
) 
SELECT 
    COUNT(*) 
FROM 
    t;
Dataset sizes               PostGIS         WherobotsDB (Cairo)  Relative Speed
1M Buildings x 1M Nodes     0.70s           2.24s                0.31x
1M Buildings x 10M Nodes    32.30s          33.44s               0.97x
10M Buildings x 1M Nodes    28.30s          21.72s               1.30x
10M Buildings x 10M Nodes   49.30s          3m 36.29s            0.23x
All Buildings x 1M Nodes    8m 44s          36.59s               14.32x
All Buildings x 10M Nodes   38m 30s         49.77s               46.42x
10K Buildings x All Nodes   2h 29m 19s      46.42s               193.00x
100K Buildings x All Nodes  4h 22m 36s      51.75s               304.57x
1M Buildings x All Nodes    3h 29m 56s      1m 19.63s            158.17x
10M Buildings x All Nodes   8h 48m 13s      2m 13.65s            237.17x
All Buildings x All Nodes   +24h (aborted)  4m 31.84s            +317.83x

Conclusion: Better Together for Spatial Brilliance

Individually, Wherobots and PostgreSQL + PostGIS are powerful tools. But when combined, they unlock new possibilities for spatial analysis, offering a balanced approach to handling both large-scale data processing and detailed, precise database management. By understanding and implementing their complementary capabilities, organizations can achieve more refined insights and greater operational efficiency in their spatial data projects.

By utilizing both tools strategically, companies can not only enhance their data analysis capabilities but also ensure that they are prepared for a variety of spatial data challenges, now and in the future.

To learn more about Wherobots, reach out or give it a try with a trial account.

Generating global PMTiles from Overture Maps in 26 minutes with WherobotsDB VTiles

We previously described the performance and scalability challenges of generating tiles and how they can be overcome with WherobotsDB VTiles. Today we will demonstrate how you can use VTiles to generate vector tiles for three planetary scale Overture Maps Foundation datasets in PMTiles format: places, buildings, and division areas.

Quick recap: What are Vector Tiles and why should you use PMTiles?

Vector tiles are small chunks of map data that allow for efficient and customizable map rendering at varying zoom levels. Vector tiles contain geometric and attribute data, for example roads and their names, that facilitate dynamic styling of map features on the fly, offering more flexibility and interactivity.

PMTiles is a cloud-native file format that is designed for holding an entire collection of tiles, in this case vector tiles. The PMTiles format enables individual tiles to be queried directly from cloud object storage like Amazon S3. By querying directly from cloud storage, you no longer need to set up and manage dedicated infrastructure, reducing your costs, effort, and time-to-tile-generation.

Tile Viewer

If you’re sharing, inspecting, or debugging tiles, you’ll need to visualize them. To make these processes easier, Wherobots created a tile viewer site, available at tile-viewer.wherobots.com. This tool comes from the PMTiles GitHub repository, and it offers the following features:

  • Viewing tiles with a dark themed basemap
  • Inspecting individual tiles, selected from a list of all the tiles in the set
  • Inspecting the metadata and header information of the PMTiles file

This viewer takes a URL for a tileset. If your tiles are stored in a private S3 bucket, you will need to generate a signed URL. Wherobots Cloud has a function for converting your S3 URI to a signed URL:

from wherobots.tools.utility.s3_utils import get_signed_url

get_signed_url(my_s3_path, expiration_in_seconds)

my_s3_path is an S3 URI, like s3://myBucket/my/prefix/to/tiles.pmtiles, and expiration_in_seconds is an int representing the number of seconds the signed URL will be valid for.

The tile viewer will be used to explore the tiles we generate in our examples.

Examples

The following examples show tile generation using VTiles for three Overture layers at a planetary scale. Because we are working with planetary scale datasets and want quick results, we will use the large runtimes available in the professional tier of Wherobots Cloud.

Tile generation time is provided in each example, and includes time to load the input data, transform it, generate tiles, and save the PMTiles file in an S3 bucket. It does not include the time to start the cluster.

To run the examples below, just make sure your sedona session is started:

from sedona.spark import SedonaContext

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

Places

We start by creating PMTiles for the places dataset. With VTiles, this is a straightforward case for several reasons:

  1. The dataset contains only points. A point feature rarely spans multiple tiles because it has no dimensions. Tile generation time is strongly influenced by the total number of (feature, tile) pairs, that is, each feature counted once per tile it intersects.
  2. At 50 million records, this is a relatively small dataset compared to the buildings dataset at 2.3 billion features.
  3. We will do minimal customization. VTiles’ feature filters allow us to control which features go into which tiles based on the tile id (x, y, z) and the feature itself (area, length, and user-provided columns). We will go more in depth on feature filters in the division areas example.
import pyspark.sql.functions as f
import os

from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    sedona.table("wherobots_open_data.overture_2024_05_16.places_place").select(
        "geometry",
        f.col("names.primary").alias("name"),
        f.col("categories.main").alias("category"),
        f.lit('places').alias('layer'),
    ),
    os.getenv("USER_S3_PATH") + "tile_blog/places.pmtiles",
    GenerationConfig(6, 15)
)

This example generates a PMTiles file for zooms 6 through 15. Since the places dataset contains features that are not relevant at a global level, we selected a minimum zoom of 6, about the size of a large European country. The max zoom of 15 is selected because the precision provided should be sufficient and overzooming means that our places will still render at higher zooms. The OpenStreetMap wiki has a helpful page about how large a tile is at each zoom level. The name and category of each place will be included in the tiles.

Tile generation time was 2 minutes and 23 seconds on a Tokyo runtime. The resulting PMTiles archive is 28.1 GB.

Buildings

This example generates tiles for all buildings in the Overture building dataset. This is about 2.3 billion features. The roughly uniform size of the features and the relatively small size of buildings relative to the higher zoom tiles means that the number of (feature, tile) combinations is similar to |features| * |zooms|. Because of this homogeneity, we can expect a quick execution without the use of a feature filter. This example represents a typical use case where there is a very large number of features and where the extent of a tile at maximum zoom is larger than the size of a feature.

import pyspark.sql.functions as f
import os

from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    sedona.table("wherobots_open_data.overture_2024_05_16.buildings_building").select(
        "geometry",
        f.lit('buildings').alias('layer'),
    ),
    os.getenv("USER_S3_PATH") + "tile_blog/buildings.pmtiles",
    GenerationConfig(10, 15)
)

This example generates a PMTiles file for zooms 10 through 15. The minimum zoom of 10 was selected because buildings aren’t useful at lower zooms for most use cases. The max zoom of 15 was selected because the precision provided should be sufficient and overzooming means that our buildings will still render at higher zooms. The properties of a very large percentage of the Overture buildings are null so we haven’t included any here.

Tile generation time was 26 minutes on a Tokyo runtime. The resulting PMTiles archive is 438.4 GB.

Division Areas

The third example creates tiles for all polygons and multipolygons in the Overture division areas dataset. This dataset is just under one million records. Despite its small size, this dataset can be challenging to process. It contains polygons and multipolygons representing areas, from countries which are large and highly detailed, to small neighborhoods with minimal detail. The appropriate min/max zoom for countries and neighborhoods is very different.

Recall from the places example that the amount of work the system must do is strongly related to the number of (feature, tile) pairs. A country outline like Canada might cover an entire tile at zoom 5. It will be in roughly 2 * 4^(max_zoom - 5) tiles across all zooms; if max_zoom is 15, that’s over 2 million tiles. You can quickly wind up with an unexpectedly large execution time and tiles archive if you do not take this into account. Most use cases will benefit from setting different min and max zooms for different features, which you can do in VTiles via a feature filter.

Let’s first profile the base case with no feature filter.

import pyspark.sql.functions as f
import os

from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    sedona.table("wherobots_open_data.overture_2024_05_16.divisions_division_area").select(
        "geometry",
        f.col("names.primary").alias("name"),
        f.col("subtype").alias('layer'),
    ),
    os.getenv("USER_S3_PATH") + "tile_blog/division_area.pmtiles",
    GenerationConfig(3, 15)
)

This run took a bit over 3 hours on a Tokyo runtime. The resulting PMTiles archive is 158.0 GB. This small dataset takes more time than the buildings dataset, which is more than 2,300 times larger!

Feature Filters

We can significantly accelerate the execution time of this example using the VTiles feature filters. These feature filters are most commonly used to determine what features should be in a tile on the basis of a category and the zoom level. In this case we will only show countries at lower zooms and neighborhoods at the highest zoom levels. The visual impact of a feature that is much larger than the tile is minimal in typical use cases. The visual impact of a neighborhood is null when it’s smaller than the tile can resolve; it is literally invisible, or perhaps a single pixel. By excluding these features that add no visual information, we save processing time and storage costs, as well as increase the performance of serving the now-smaller tiles.

Here is an example of using feature filters to improve performance of this division area generation task:

import pyspark.sql.functions as f
import os

from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    sedona.table("wherobots_open_data.overture_2024_05_16.divisions_division_area").select(
        "geometry",
        f.col("names.primary").alias("name"),
        f.col("subtype").alias('layer'),
    ),
    os.getenv("USER_S3_PATH") + "tile_blog/division_area_filtered.pmtiles",
    GenerationConfig(
        min_zoom=2, 
        max_zoom=15,
        feature_filter = (
            ((f.col("subType") == f.lit("country")) & (f.col("tile.z") < f.lit(7))) |
            ((f.col("subType") == f.lit("region")) & (f.lit(3) < f.col("tile.z")) & (f.col("tile.z") < f.lit(10))) |
            ((f.col("subType") == f.lit("county")) & (f.lit(9) < f.col("tile.z")) & (f.col("tile.z")  < f.lit(12))) |
            ((f.col("subType") == f.lit("locality")) & (f.lit(10) < f.col("tile.z")) & (f.col("tile.z")  < f.lit(14))) |
            ((f.col("subType") == f.lit("localadmin")) & (f.lit(13) < f.col("tile.z"))) |
            ((f.col("subType") == f.lit("neighborhood")) & (f.lit(13) < f.col("tile.z")))
        )
    )
)

This run took less than 10 minutes on a Tokyo runtime. The resulting PMTiles archive is 8.9 GB.

Feature filters reduced tile generation time by more than 90%, reduced the dataset size, and lowered the cost compared to the original example. Tiles will also appear less cluttered to the user without having to get one’s hands dirty playing with style sheets.

A Note on Working without Feature Filters

We know that there are use cases with large geometries where it might be difficult to write an effective feature filter, or where filtering is undesirable. For those use cases we have launched a feature in Wherobots 1.3.1 to improve tile generation performance: an option on GenerationConfig called repartition_frequency. When features are repeatedly split as the algorithm zooms in, the child features wind up in the same partition. This can cause well-partitioned input datasets to become skewed by even a single large record. Setting a repartition frequency of 2 or 4 can help keep cluster utilization high by keeping partitions roughly uniform in size.
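
A hedged sketch of what this looks like, using the option name described above (the DataFrame and output path are placeholders; check the VTiles reference for your release):

# Hedged sketch: periodic repartitioning during tile generation (Wherobots >= 1.3.1).
from wherobots.vtiles import GenerationConfig, generate_pmtiles

generate_pmtiles(
    my_large_geometry_df,        # placeholder: a DataFrame containing large features
    my_output_pmtiles_path,      # placeholder: an S3 path for the PMTiles archive
    GenerationConfig(
        min_zoom=3,
        max_zoom=15,
        repartition_frequency=4,  # repartition every 4 zoom levels to keep partitions uniform
    )
)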

Conclusion

The VTiles tile generation functionality is a fast and cost effective way to generate tiles for global data. The Apache Spark-based runtime powered by Apache Sedona and Wherobots Cloud makes loading and transforming data for input into the system straightforward and performant even on large datasets. You can leverage feature filters to curate the contents of your tiles to your use cases and performance goals. We encourage you to try out VTiles with your own data on Wherobots Cloud.

Easily create trip insights at scale by snapping millions of GPS points to road segments using WherobotsAI Map Matching

What is Map Matching?

GPS data is inherently noisy and often lacks precision, which can make it challenging to extract accurate insights. This imprecision means that the GPS points logged may not accurately represent the actual locations where a device was. For example, GPS data from a drive around a lake may incorrectly include points that are over the water!

To address these inaccuracies, teams commonly use two approaches:

  1. Identifying and Dropping Erroneous Points: This method involves manually or algorithmically filtering out GPS points that are clearly incorrect. However, this approach can reduce analytical accuracy, be costly, and is time-intensive.
  2. Map Matching Techniques: A smarter and more effective approach involves using map matching techniques. These techniques take the noisy GPS data points and compute the most likely path taken based on known transportation segments such as roadways or trails.

WherobotsAI Map Matching offers an advanced solution to this problem. It performs map matching at scale on millions or even billions of trips with ease and performance, ensuring that the GPS data aligns accurately with the actual paths most likely taken.


An illustration of map matching. Blue dots: GPS samples, Green line: matched trajectory.

Map matching is a common solution for preparing GPS data for use in a wide range of applications including:

  • Satellite & GPS-based navigation
  • GPS tracking of freight
  • Assessing risk of driving behavior for improved insurance pricing
  • Post hoc analysis of self-driving car trips for telematics teams
  • Transportation engineering and urban planning

The objective of map matching is to accurately determine which road or path in the digital map corresponds to the observed geographic coordinates, considering factors such as the accuracy of the location data, the density and layout of the road network, and the speed and direction of travel.

Existing Solutions for Map Matching

Most map matching implementations are variants of the Hidden Markov Model (HMM)-based algorithm described by Newson and Krumm in their seminal paper, "Hidden Markov Map Matching through Noise and Sparseness." This foundational research has influenced a variety of map matching solutions available today.

However, traditional HMM-based approaches have notable downsides when working with large-scale GPS datasets:

  1. Significant Costs: Many commercially available map matching APIs charge substantial fees for large-scale usage.
  2. Performance Issues: Traditional map matching algorithms, while accurate, are often not optimized for large-scale computation. They can be prohibitively slow, especially when dealing with extensive GPS data, as the underlying computation struggles to handle the data scale efficiently.

These challenges highlight the need for more efficient and cost-effective solutions capable of handling large-scale GPS datasets without compromising on performance.

RESTful API Map Matching Options

The Mapbox Map Matching API, HERE Maps Route Matching API, and Google Roads API are powerful RESTful APIs for performing map matching. These solutions are particularly effective for small-scale applications. However, for large-scale applications, such as population-level analysis involving millions of trajectories, the costs can become prohibitively high.

For example, as of July 2024, the approximate costs for matching 1 million trips are:

  • Mapbox: $1,600
  • HERE Maps: $4,400
  • Google Maps Platform: $8,000

These prices are based on public pricing pages and do not consider any potential volume-based discounts that may be available.

While these APIs provide robust and accurate map matching capabilities, organizations seeking to perform extensive analyses often must explore more cost-effective alternatives.

Open-Source Map Matching Solutions

Open-source software such as Valhalla or GraphHopper can also be used for map matching. However, these solutions are designed to run on a single machine. If your map matching workload exceeds the capacity of that machine, it will suffer from extended processing times. Furthermore, you will eventually run out of headroom as you vertically scale up the ladder of VM sizes.

Meet WherobotsAI Map Matching

WherobotsAI Map Matching is a high performance, low cost, and planetary scale map matching capability for your telematics pipelines.

WherobotsAI provides a scalable map matching feature designed for small to very large scale trajectory datasets. It works seamlessly with other Wherobots capabilities, which means you can implement data cleaning, data transformations, and map matching in one single (serverless) data processing pipeline. We’ll see how it works in the following sections.

How it works

WherobotsAI Map Matching takes a DataFrame containing trajectories and another DataFrame containing road segments, and produces a DataFrame containing map matched results. Here is a walk-through of using WherobotsAI Map Matching to match trajectories in the VED dataset to the OpenStreetMap (OSM) road network.

1. Preparing the Trajectory Data

First, we load the trajectory data. We’ll use the preprocessed VED dataset stored as GeoParquet files for demonstration.

dfPath = sedona.read.format("geoparquet").load("s3://wherobots-benchmark-prod/data/mm/ved/VED_traj/")

The trajectory dataset should contain the following attributes:

  • A unique ID for trips. In this example the ids attribute is the unique ID of each trip.
  • A geometry attribute containing LineStrings; in this case, the geometry attribute holds each trip’s trajectory.

The rows in the trajectory DataFrame look like this:

+---+-----+----+--------------------+--------------------+
|ids|VehId|Trip|              coords|            geometry|
+---+-----+----+--------------------+--------------------+
|  0|    8| 706|[{0, 42.277558333...|LINESTRING (-83.6...|
|  1|    8| 707|[{0, 42.277681388...|LINESTRING (-83.6...|
|  2|    8| 708|[{0, 42.261997222...|LINESTRING (-83.7...|
|  3|   10|1558|[{0, 42.277065833...|LINESTRING (-83.7...|
|  4|   10|1561|[{0, 42.286599722...|LINESTRING (-83.7...|
+---+-----+----+--------------------+--------------------+
only showing top 5 rows

2. Preparing the Road Network Data

We’ll use the OpenStreetMap (OSM) data specific to the Ann Arbor, Michigan region to map match our trip data with. Wherobots provides an API for loading road network data from OSM XML files.

from wherobots import matcher
dfEdge = matcher.load_osm("s3://wherobots-examples/data/osm_AnnArbor_large.xml", "[car]")
dfEdge.show(5)

The loaded road network DataFrame looks like this:

+--------------------+----------+--------+----------+-----------+----------+-----------+
|            geometry|       src|     dst|   src_lat|    src_lon|   dst_lat|    dst_lon|
+--------------------+----------+--------+----------+-----------+----------+-----------+
|LINESTRING (-83.7...|  68133325|27254523| 42.238819|-83.7390142|42.2386159|-83.7390153|
|LINESTRING (-83.7...|9405840276|27254523|42.2386058|-83.7388915|42.2386159|-83.7390153|
|LINESTRING (-83.7...|  68133353|27254523|42.2385675|-83.7390856|42.2386159|-83.7390153|
|LINESTRING (-83.7...|2262917109|27254523|42.2384552|-83.7390313|42.2386159|-83.7390153|
|LINESTRING (-83.7...|9979197063|27489080|42.3200426|-83.7272283|42.3200887|-83.7273003|
+--------------------+----------+--------+----------+-----------+----------+-----------+
only showing top 5 rows

Users can also prepare the road network data from any data source, using any data processing procedure, as long as the schema of the road network DataFrame conforms to the requirements of the Map Matching API.
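
For example, if your road segments already live in GeoParquet, a hedged sketch of conforming to the schema shown above might look like this (the source path and input column names are hypothetical; only the output columns matter):

# Hedged sketch: build a road network DataFrame matching the schema printed above.
import pyspark.sql.functions as f

dfEdge = (
    sedona.read.format("geoparquet").load("s3://my-bucket/roads/")  # hypothetical source
        .select(
            "geometry",                         # LINESTRING for each road segment
            f.col("from_node").alias("src"),    # hypothetical input column names
            f.col("to_node").alias("dst"),
            f.col("from_lat").alias("src_lat"),
            f.col("from_lon").alias("src_lon"),
            f.col("to_lat").alias("dst_lat"),
            f.col("to_lon").alias("dst_lon"),
        )
)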

3. Run Map Matching

Once the trajectories and road network data is ready, we can run matcher.match to match trajectories to the road network.

dfMmResult = matcher.match(dfEdge, dfPath, "geometry", "geometry")

The dfMmResult contains the trajectories snapped to the roads in the matched_points attribute:

+---+--------------------+--------------------+--------------------+
|ids|     observed_points|      matched_points|       matched_nodes|
+---+--------------------+--------------------+--------------------+
|275|LINESTRING (-83.6...|LINESTRING (-83.6...|[62574078, 773611...|
|253|LINESTRING (-83.6...|LINESTRING (-83.6...|[5930199197, 6252...|
| 88|LINESTRING (-83.7...|LINESTRING (-83.7...|[4931645364, 6249...|
|561|LINESTRING (-83.6...|LINESTRING (-83.6...|[29314519, 773612...|
|154|LINESTRING (-83.7...|LINESTRING (-83.7...|[5284529433, 6252...|
+---+--------------------+--------------------+--------------------+
only showing top 5 rows

We can visualize the map matching result using SedonaKepler to see what the matched trajectories look like:

mapAll = SedonaKepler.create_map()
SedonaKepler.add_df(mapAll, dfEdge, name="Road Network")
SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("observed_points AS geometry"), name="Observed Points")
SedonaKepler.add_df(mapAll, dfMmResult.selectExpr("matched_points AS geometry"), name="Matched Points")
mapAll

The following figure shows the map matching results. The red lines are original trajectories, and the green lines are matched trajectories. We can see that the noisy original trajectories are all snapped to the road network.

map matching results example 2

Performance

We used WherobotsAI Map Matching to match 90 million trips across the entire US in just 1.5 hours on the Wherobots Tokyo runtime, which equates to approximately 1 million trips per minute. The average cost of matching 1 million trips is an order of magnitude lower than the options outlined above.

The “optimization magic” behind WherobotsAI Map Matching lies in how Wherobots intelligently and automatically co-partitions trajectory and road network datasets based on the spatial proximity of their elements, ensuring a balanced distribution of work. This partitioning strategy keeps the computational load evenly balanced, which makes map matching with Wherobots highly efficient, scalable, and affordable compared to the alternatives.

Try It Out!

You can try out WherobotsAI Map Matching by starting a notebook environment in Wherobots Cloud and running our example notebook:

notebook_example/python/wherobots-ai/mapmatching_example.ipynb

You can also check out the WherobotsAI Map Matching tutorial and reference documentation for more information!

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:

Unlock Satellite Imagery Insights with WherobotsAI Raster Inference

Recently we introduced WherobotsAI Raster Inference, which simplifies extracting insights from satellite and aerial imagery using SQL or Python and is powered by open-source machine learning models. This feature is currently in preview, and we are expanding its capabilities to support more models. Below we’ll dig into the popular computer vision tasks that Raster Inference supports, describe how it works, and show how you can use it to run batch inference to find and map electricity infrastructure.

Watch the live demo of these capabilities here.

The Power of Machine Learning with Satellite Imagery

Petabytes of satellite imagery are generated each day all over the world in a dizzying number of sensor types and image resolutions. The applications for satellite imagery and other remote sensing data sources are broad and diverse. For example, satellites with consistent, continuous orbits are ideal for monitoring forest carbon stocks to validate carbon credits or estimating agricultural yields.

However, this data has been inaccessible for most analysts and even seasoned ML practitioners because insight extraction required specialized skills. We’ve done the work to make insight extraction simple and accessible to more people. Raster Inference abstracts the complexity and scales to support planetary-scale imagery datasets, so you don’t need ML expertise to derive insights. In this blog, we explore the key features that make Raster Inference effective for land cover classification, solar farm mapping, and marine infrastructure detection. And, in the near future, you will be able to use Raster Inference with your own models!

Introduction to Popular and Supported Machine Learning Tasks

Raster Inference supports the three most common kinds of computer vision models applied to imagery: classification, object detection, and semantic segmentation. Instance segmentation (which combines object localization and semantic segmentation) is another common type of model that is not currently supported; if you need it, let us know by contacting us and we can add it to the roadmap.

Computer Vision Detection Types
Computer Vision Detection Categories from Lin et al. Microsoft COCO: Common Objects in Context

The figure above illustrates these tasks. Image classification is when an image is assigned one or more text labels. In image (a), the scene is assigned the labels “person”, “sheep”, and “dog”. Image (b) is an example of object localization (or object detection). Object localization creates bounding boxes around objects of interest and assigns labels. In this image, five sheep are localized separately along with one human and one dog. Finally, semantic segmentation is when each pixel is given a category label, as shown in image (c). Here we can see all the pixels belonging to sheep are labeled blue, the dog is labeled red, and the person is labeled teal.

While these examples highlight detection tasks on everyday imagery, the same computer vision models can be applied to raster-formatted imagery, and raster data formats are the most common formats for satellite and aerial imagery. When objects of interest in raster imagery are localized, their bounding boxes can be georeferenced, meaning each pixel is mapped to spatial coordinates such as latitude and longitude. Georeferencing is what makes object localization useful for spatial analytics.

https://wherobots.com/wp-content/uploads/2024/06/remotesensing-11-00339-g005.png

The example above shows various applications of object detection for localizing and classifying features in high resolution satellite and aerial imagery. This example comes from DOTA, a 15-class dataset of different objects in RGB and grayscale satellite imagery. Public datasets like DOTA are used to develop and benchmark machine learning models.

Many object detection models are publicly available, and the same is true of semantic segmentation models.

Semantic segmentation example, sourced from “A Scale-Aware Masked Autoencoder for Multi-scale Geospatial Representation Learning”.

No two machine learning models should be treated equally; each has its own tradeoffs. You can see the difference between the ground truth image (human-annotated buildings representing the real world) and segmentation results across two models (Scale-MAE and Vanilla MAE). These results are derived from the same image at two different resolutions (referred to as GSD, or Ground Sampling Distance).

  • Scale-MAE is a model developed to handle detection tasks at various resolutions with different sensor inputs. It uses a similar MAE model architecture as the Vanilla MAE, but is trained specifically for detection tasks on overhead imagery that span different resolutions.
  • The Vanilla MAE is not trained to handle varying resolutions in overhead imagery. Its performance suffers in the top row and especially the bottom row, where resolution is coarser, as seen by the mismatch between the Vanilla MAE output and the ground truth image: many pixels are incorrectly classified.

Satellite Analytics Before Raster Inference

Without Raster Inference, a team looking to extract insights from overhead imagery using ML would typically need to:

  1. Deploy a distributed runtime to scale out workloads such as data loading, preprocessing, and inference.
  2. Develop functionality to operate on raster metadata to easily filter it by location to run inference workloads on specific areas of interest.
  3. Optimize models to run performantly on GPUs, which can involve complex rewrites of the underlying model prediction logic.
  4. Create and manage data preprocessing pipelines to normalize, resize, and collate raster imagery into the correct data type and size required by the model.
  5. Develop the logic to run data loading, preprocessing, and model inference efficiently at scale.

Raster Inference and its SQL and Python APIs abstract this complexity so you and your team can easily perform inference on massive raster datasets.

Raster Inference APIs for SQL and Python

Raster Inference offers APIs in both SQL and Python to run inference tasks. These APIs are designed to be easy to use, even if you’re not a machine learning expert. RS_CLASSIFY can be used for scene classification, RS_BBOXES_DETECT for object detection, and RS_SEGMENT for semantic segmentation. Each function produces tabular results which can be georeferenced either for the scene, object, or segmentation depending on the function. The records can be joined or visualized with other data (geospatial or traditional) to curate enriched datasets and insights. Here are SQL and Python examples for RS_SEGMENT.

-- SQL: run semantic segmentation on each raster and return the result as a column
RS_SEGMENT('{model_id}', outdb_raster) AS segment_result

# Python: the equivalent call using the rs_segment function
df = df_raster_input.withColumn("segment_result", rs_segment(model_id, col("outdb_raster")))
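The other two functions follow the same calling pattern. Assuming they take the same model identifier and raster column arguments as RS_SEGMENT (a reasonable but unverified assumption; check the documentation for exact signatures), the SQL would look like:

-- Hypothetical calls mirroring the RS_SEGMENT pattern above
RS_CLASSIFY('{model_id}', outdb_raster) AS classification_result
RS_BBOXES_DETECT('{model_id}', outdb_raster) AS detection_result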

Example: Mapping Electricity Infrastructure

Imagine you want to optimize the location of new EV charging stations, but you want to target locations based on the availability of green energy sources, such as local solar farms. You can use Raster Inference to detect and locate solar farms and cross-reference these locations with internal data or other vector geometries that capture demand for EV charging. This use case will be demonstrated in our upcoming release webinar on July 10th.

Let’s walk through how to use Raster Inference for this use case.

First, we run predictions on rasters to find solar farms. The following code block, which calls RS_SEGMENT, shows how easy this is.

CREATE OR REPLACE TEMP VIEW segment_fields AS (
    SELECT
        outdb_raster,
        RS_SEGMENT('{model_id}', outdb_raster) AS segment_result
    FROM
        az_high_demand_with_scene
)

The confidence_array column produced by RS_SEGMENT can be assigned the same geospatial coordinates as the raster input and converted to vector geometries that can be spatially joined and processed with WherobotsDB using RS_SEGMENT_TO_GEOMS. We select a confidence threshold of 0.65 so that we only georeference high-confidence detections.

WITH t AS (
    SELECT RS_SEGMENT_TO_GEOMS(outdb_raster, confidence_array, array(1), class_map, 0.65) AS result
    FROM predictions_df
)
SELECT result.* FROM t
+----------+--------------------+--------------------+
|     class|avg_confidence_score|            geometry|
+----------+--------------------+--------------------+
|Solar Farm|  0.7205783606825462|MULTIPOLYGON (((-...|
|Solar Farm|  0.7273308333550763|MULTIPOLYGON (((-...|
|Solar Farm|  0.7301468510823231|MULTIPOLYGON (((-...|
|Solar Farm|  0.7180177244988899|MULTIPOLYGON (((-...|
|Solar Farm|   0.728077805771141|MULTIPOLYGON (((-...|
|Solar Farm|     0.7264981572898|MULTIPOLYGON (((-...|
|Solar Farm|  0.7044100126912517|MULTIPOLYGON (((-...|
|Solar Farm|  0.7137283466756343|MULTIPOLYGON (((-...|
+----------+--------------------+--------------------+

This allows us to integrate the vectorized model predictions with other spatial datasets and easily visualize the results with SedonaKepler.
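As a quick illustration, here is a minimal sketch of visualizing the vectorized predictions with SedonaKepler; predictions_polys is assumed to be the DataFrame of georeferenced detections from the previous step.

from sedona.maps.SedonaKepler import SedonaKepler

# Build an interactive map layer from the detection geometries.
solar_map = SedonaKepler.create_map(df=predictions_polys, name="Solar Farm Detections")
solar_map  # renders the map when evaluated in a notebook cell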

https://wherobots.com/wp-content/uploads/2024/06/solar_farm_detection-1-1024x398.png

Here Raster Inference runs on an 85 GiB dataset with 2,200 raster scenes for Arizona. Using a Sedona (tiny) runtime, Raster Inference completed in 430 seconds, predicting solar farms for all low cloud cover satellite images for the state of Arizona for the month of October. If we scale up to a San Francisco (small) runtime, the inference speed nearly doubles. In general, average bytes processed per second by Wherobots increases as datasets scale in size because startup costs are amortized over time. Processing speed also increases as runtimes scale in size.

Runtime Size             Inference time (seconds)
Sedona (tiny)            430
San Francisco (small)    246
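In throughput terms, that works out to roughly 85 GiB / 430 s ≈ 0.20 GiB/s on the Sedona runtime versus 85 GiB / 246 s ≈ 0.35 GiB/s on San Francisco, which is where the “nearly doubles” figure above comes from.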

We use predictions from the output of Raster Inference to derive insights about which zip codes have the most solar farms, as shown below. This statement joins predicted solar farms with zip codes by location, then ranks zip codes by the pre-computed solar farm area within each zip code. We skipped the area computation step for brevity, but you can see it and others in the notebook example.

az_solar_zip_codes = sedona.sql("""
SELECT SUM(solar_area) AS solar_area, any_value(az_zta5.geometry) AS geometry, ZCTA5CE10
FROM predictions_polys JOIN az_zta5
ON ST_Intersects(az_zta5.geometry, predictions_polys.geometry)
GROUP BY ZCTA5CE10
ORDER BY solar_area DESC
""")

https://wherobots.com/wp-content/uploads/2024/06/final_analysis.png

These predictions are made possible by SATLAS, a family of machine learning models released with Apache 2.0 licensing from Allen AI. The solar model demonstrated above was derived from the SATLAS foundational model. This foundational model can be used as a building block to create models to address specific detection challenges like solar farm detection. Additionally, there are many other open source machine learning models available for deriving insights from satellite imagery, many of which are provided by the TorchGeo project. We are just beginning to explore what these models can achieve for planetary-scale monitoring.

If you have a specific model you would like to see made available, please contact us to let us know.

For detailed instructions on using Raster Inference, please refer to our example Jupyter notebooks in the documentation.

https://wherobots.com/wp-content/uploads/2024/06/Screenshot_2024-06-08_at_2.11.07_PM-1024x683.png

Here are some links to get you started:
https://docs.wherobots.com/latest/tutorials/wherobotsai/wherobots-inference/segmentation/

https://docs.wherobots.com/latest/api/wherobots-inference/pythondoc/inference/sql_functions/

Getting Started

Getting started with WherobotsAI Raster Inference is easy. We’ve provided three models in Wherobots Cloud that can be used with our GPU optimized runtimes. Sign up for an account on Wherobots Cloud, send us a note to access the professional tier, start a GPU runtime, and you can run our example Jupyter notebooks to analyze satellite imagery in SQL or Python.

Stay tuned for updates on improvements to Raster Inference that will make it possible to run more models, including your own custom models. We’re excited to hear what models you’d like us to support, or the integrations you need to make running your own models even easier with Raster Inference. We can’t wait for your feedback and to see what you’ll create!


The Spatial SQL API brings the performance of WherobotsDB to your favorite data applications

Since its launch last fall, Wherobots has raised the bar for cloud-native geospatial data analytics, offering the first and only platform for working with vector and raster geospatial data together at a planetary scale. Wherobots delivers a significant breadth of geospatial analytics capabilities, built around a cloud-native data lakehouse architecture and query engine that delivers up to 60x better performance than incumbent solutions. Accessible through the powerful notebook experience data scientists and data engineers know and love, Wherobots Cloud is the most comprehensive, approachable, and fully-managed serverless offering for enabling spatial intelligence at scale.

Today, we’re announcing the Wherobots Spatial SQL API, powered by Apache Sedona, to bring the performance of WherobotsDB to your favorite data applications. This opens the door to a world of direct-SQL integrations with Wherobots Cloud, bringing a serverless cloud engine that’s optimized for spatial workloads at any scale into your spatial ETL pipelines and applications, and taking your users and engineers closer to your data and spatial insights.

Register for our release webinar on July 10th here: https://bit.ly/3yFlFYk

Developers love Wherobots because compute is abstracted and managed by Wherobots Cloud. Because it can run at a planetary scale, Wherobots streamlines development and reduces time to insight. It runs on a data lake architecture, so data doesn’t need to be copied into a proprietary storage system, and integrates into familiar development tools and interfaces for exploratory analytics and orchestrating production spatial ETL pipelines.

Utilize Apache Airflow or SQL IDEs with WherobotsDB via the Spatial SQL API

Wherobots Cloud and the Wherobots Spatial SQL API are powered by WherobotsDB, with Apache Sedona at its core: a distributed computation engine that can horizontally scale to handle computation and analytics on any dataset. Wherobots Cloud automatically manages the infrastructure and compute resources of WherobotsDB to serve your use case based on how much computation power you need.

Behind the scenes, your Wherobots Cloud “runtime” defines the amount of compute resources allocated and the configuration of the software environment that executes your workload (in particular for AI/ML use cases, or if your ETL or analytics workflow depends on 1st or 3rd party libraries).

Our always-free Community Edition gives access to a modest “Sedona” runtime for working with small-scale datasets. Our Professional Edition unlocks access to much larger runtimes, up to our “Tokyo” runtime capable of working on planetary-scale datasets, and GPU-accelerated options for your WherobotsAI workloads.

With the release of the Wherobots Spatial SQL API and its client SDKs, you can bring WherobotsDB, the ease-of-use, and the expressiveness of SQL to your Apache Airflow spatial ETL pipelines, your applications, and soon to tools like Tableau, Superset, and other 3rd party systems and applications that support JDBC.

Our customers love applying the performance and scalability of WherobotsDB to their data preparation workflows and their compute-intensive data processing applications.

Use cases include:

  • Preparation of nationwide and planetary-scale datasets for their users and customers
  • Processing hundreds of millions of mobility data records every day
  • Creating and analyzing spatial datasets in support of their real estate strategy and decision-making

Now customers have the option to integrate new tools with Wherobots for orchestration and development of spatial insights using the Spatial SQL API.

How to get started with the Spatial SQL API

By establishing a connection to the Wherobots Spatial SQL API, a SQL session is started, backed by your selected WherobotsDB runtime (a “Sedona” runtime by default, but you can specify larger runtimes if you need more horsepower). Queries submitted through this connection are securely executed against your runtime, with compute fully managed by Wherobots.

We provide client SDKs in Java and in Python to easily connect and interact with WherobotsDB through the Spatial SQL API, as well as an Airflow Provider to build your spatial ETL DAGs; all of which are open-source and available on package registries, as well as on Wherobots’ GitHub page.

Using the Wherobots SQL Driver in Python

Wherobots provides an open-source Python library that exposes a DB-API 2.0 compatible interface for connecting to WherobotsDB. To build a Python application around the Wherobots DB-API driver, add the wherobots-python-dbapi library to your project’s dependencies:

$ poetry add wherobots-python-dbapi

Or directly install the package on your system with pip:

$ pip install wherobots-python-dbapi

From your Python application, establish a connection with wherobots.db.connect() and use cursors to execute your SQL queries and use their results:

import logging
import sys

from wherobots.db import connect
from wherobots.db.region import Region
from wherobots.db.runtime import Runtime

# Optionally, set up logging to get information about the driver's activity.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s.%(msecs)03d %(levelname)s %(name)20s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Get your API key, or securely read it from a local file.
api_key = '...'

with connect(
    host="api.cloud.wherobots.com",
    api_key=api_key,
    runtime=Runtime.SEDONA,
    region=Region.AWS_US_WEST_2,
) as conn:
    cur = conn.cursor()
    sql = """
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        SORT BY population DESC
        LIMIT 10
    """
    cur.execute(sql)
    results = cur.fetchall()
    print(results)

For more information and future releases, see https://github.com/wherobots/wherobots-python-dbapi-driver on GitHub.

Using the Apache Airflow provider

Wherobots provides an open-source provider for Apache Airflow, defining an Airflow operator for executing SQL queries directly on WherobotsDB. With this new capability, you can integrate your spatial analytics queries, data preparation or data processing steps into new or existing Airflow workflow DAGs.

To build or extend your Airflow DAG using the WherobotsSqlOperator, add the airflow-providers-wherobots dependency to your project:

$ poetry add airflow-providers-wherobots

Define your connection to Wherobots; by default the Wherobots operators use the wherobots_default connection ID:

$ airflow connections add "wherobots_default" \
    --conn-type "wherobots" \
    --conn-host "api.cloud.wherobots.com" \
    --conn-password "$(< api.key)"

Instantiate the WherobotsSqlOperator with your choice of runtime and your SQL query, and integrate it into your Airflow DAG definition:

from wherobots.db.runtime import Runtime
from airflow_providers_wherobots.operators.sql import WherobotsSqlOperator

...

select = WherobotsSqlOperator(
    # Every Airflow operator needs a task_id; the name here is illustrative.
    task_id="select_top_countries",
    runtime=Runtime.SEDONA,
    sql="""
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        SORT BY population DESC
        LIMIT 10
    """,
)
# Integrate the operator into your Airflow DAG definition.

For more information and future releases, see https://github.com/wherobots/airflow-providers-wherobots on GitHub.

Using the Wherobots SQL Driver in Java

Wherobots provides an open-source Java library that implements a JDBC (Type 4) driver for connecting to WherobotsDB. To start building Java applications around the Wherobots JDBC driver, add the following line to your build.gradle file’s dependency section:

implementation "com.wherobots:wherobots-jdbc-driver"

In your application, you only need to work with Java’s JDBC APIs from the java.sql package:

import com.wherobots.db.Region;
import com.wherobots.db.Runtime;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

// Get your API key, or securely read it from a local file.
String apiKey = "...";

Properties props = new Properties();
props.setProperty("apiKey", apiKey);
// setProperty expects string values, so pass the enum names.
props.setProperty("runtime", Runtime.SEDONA.name());
props.setProperty("region", Region.AWS_US_WEST_2.name());

try (Connection conn = DriverManager.getConnection("jdbc:wherobots://api.cloud.wherobots.com", props)) {
    String sql = """
        SELECT
            id,
            names['primary'] AS name,
            geometry,
            population
        FROM
            wherobots_open_data.overture_2024_02_15.admins_locality
        WHERE localityType = 'country'
        SORT BY population DESC
        LIMIT 10
    """;
    Statement stmt = conn.createStatement();
    try (ResultSet rs = stmt.executeQuery(sql)) {
        while (rs.next()) {
            System.out.printf("%s: %s %f %s%n",
                rs.getString("id"),
                rs.getString("name"),
                rs.getDouble("population"),
                rs.getString("geometry"));
        }
    }
}

For more information and future releases, see https://github.com/wherobots/wherobots-jdbc-driver on GitHub.

Conclusion

The Wherobots Spatial SQL API takes Wherobots’ vision of hassle-free, scalable geospatial data analytics & AI one step further by making it the easiest way to run your Spatial SQL queries in the cloud. Paired with Wherobots and Apache Sedona’s comprehensive support for working with all geospatial data at any scale and in any format, and with WherobotsAI’s inference features available directly from SQL, the Wherobots Spatial SQL API is also the most flexible and capable platform for getting the most out of your data.

Wherobots vision

We exist because creating spatial intelligence at scale is hard. Our contributions to Apache Sedona, leadership in the open geospatial domain, and investments in Wherobots Cloud have made it easier, and will continue to. Users of Apache Sedona, Wherobots customers, and ultimately any AI application will be enabled to make better decisions about our physical and virtual worlds. They will be able to create solutions to improve these worlds that were otherwise infeasible or too costly to build. And the solutions developed will have a positive impact on society, business, and earth, at a planetary scale.



Unlocking the Spatial Frontier: The Evolution and Potential of Spatial Technology in Apple Vision Pro and Augmented Reality Apps

The evolution of Augmented Reality (AR) from the realm of science fiction to tangible, practical applications like Augmented Driving, Pokemon Go, and Meta Quest marked a significant shift in how we interact with technology and perceive our surroundings. The recent introduction of Apple Vision Pro underscores this transition, bringing AR closer to mainstream adoption. While the ultimate fate of devices like Apple Vision Pro or Meta Quest remains uncertain, their technological capabilities are undeniably impressive.

One of the key components of Apple Vision Pro is what Apple refers to as "Spatial Computing." While the term itself isn’t novel, with decades of research exploring the utilization of spatial context in computing, Apple’s interpretation focuses primarily on integrating spatial environments into virtual computing environments and vice versa. This approach builds upon established research in spatial object localization, representation, and spatial query processing. Moreover, it opens doors to leveraging spatial analytics, potentially offering insights and functionalities previously unimaginable.

Despite its roots in earlier research and literature like "Spatial Computing" by Shashi Shekhar and Pamela Vold, Apple’s redefinition underscores a shift in focus towards immersive spatial experiences within computing environments. By leveraging advancements in technology and innovative approaches, Apple and other companies are pushing the boundaries of what’s possible in AR, paving the way for exciting new applications and experiences. This article highlights the technical challenges Apple had to overcome to achieve such a milestone, and also lays the groundwork for future improvements.

Spatial object localization and presentation in Apple Vision Pro

To work properly, devices like Apple Vision Pro first had to solve challenges in object localization, requiring systems to not only determine the user’s location but also locate objects within the camera’s line of sight. Existing outdoor and indoor localization technologies provide a foundation, but traditional methods face limitations in augmented reality contexts. Apple Vision Pro solved challenges such as varying object positions due to camera angle and real-time localization for moving objects. It also did a great job integrating advanced technologies including image processing, artificial intelligence, and deep learning. Promising research directions involve leveraging semantic embeddings, depth cameras, and trajectory-based map matching algorithms to make sure these devices are usable in outdoor environments. By combining these approaches, the aim is to achieve real-time, high-accuracy object localization across different environments while minimizing device power consumption.

Apple Vision Pro does a fantastic job presenting virtual data alongside real-world objects captured by the device’s camera. Unlike traditional user interfaces, augmented reality interfaces must carefully integrate augmented data to avoid distorting the user’s view and causing potential safety hazards (while driving or crossing the street). Apple Vision Pro has not completely solved this problem yet, and there is room for improvement. I believe a big next step for these devices to succeed is to address the challenge of maintaining the visual clarity and relevance of augmented data, drawing from existing research in virtual reality applications and location-aware recommendation techniques. For example, one direction may explore the potential of presenting augmented reality spatial objects as audio messages to users. This alternative modality offers advantages in scenarios where visual attention is already heavily taxed, such as driving. However, an essential aspect of this approach is the ranking of augmented spatial objects to determine their size and prominence, ensuring optimal user engagement while minimizing distractions.

The role of spatial query processing in Apple Vision Pro

Similar to the iPhone, Apple Vision Pro also comes equipped with a range of apps designed to leverage its capabilities. These apps utilize the mixed reality environment by issuing queries to retrieve spatial objects and presenting them within the immersive experience facilitated by Vision Pro. For example, a navigation app using Apple Vision Pro might issue queries to fetch spatial objects such as points of interest, landmarks, or navigation markers. These objects would then be presented within the user’s field of view, overlaying relevant information onto the physical world through the device’s display. Similarly, an education app could retrieve spatial objects representing interactive learning materials or virtual models, enriching the user’s understanding of their surroundings.

To achieve this, the apps would communicate with the mixed reality environment, likely through APIs or SDKs provided by Apple’s developer tools. These interfaces would allow the apps to issue spatial queries to the environment, specifying parameters such as location, distance, and relevance criteria. The mixed reality environment would then return relevant spatial objects based on these queries, which the apps can seamlessly integrate into the user’s immersive experience. By leveraging the capabilities of Apple Vision Pro and interacting with the mixed reality environment, these apps can provide users with rich, context-aware experiences that enhance their understanding and interaction with the world around them. Whether for navigation, education, gaming, or other purposes, the ability to issue queries and retrieve spatial objects is fundamental to unlocking the full potential of Vision Pro’s immersive capabilities.

However, the classic rectangular or circular range query processing techniques may need to be redefined to accommodate the camera range and line of sight. While the camera view can still be formulated using a rectangular range query, this approach may not be very efficient, as not every spatial object within the camera range needs to be retrieved. This inefficiency arises because the more augmented spatial objects stitched onto the camera scene, the more distorted the user’s view of the physical world becomes. Furthermore, as the camera’s line of sight changes, the system issues a new range query to the database, which may make it harder to meet the real-time constraints imposed by Apple Vision Pro applications.

To optimize the performance of Apple Vision Pro applications, it’s essential to redefine the spatial range query to accurately account for the camera range and line of sight. This could involve implementing algorithms that dynamically adjust the spatial query based on the camera’s current view and line of sight. By doing so, only the relevant augmented spatial objects within the camera’s field of view need to be retrieved, minimizing distortion and ensuring real-time performance for Apple Vision Pro applications.
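To make the idea concrete, here is a toy sketch (in no way Vision Pro’s actual implementation) of such a field-of-view filter: a coarse rectangular range query fetches candidates, and only objects within the camera’s viewing distance and angular field of view around its line of sight are kept. All names and values are illustrative.

import math

def in_field_of_view(camera_xy, heading_deg, fov_deg, max_range, obj_xy):
    """Return True if obj_xy lies inside the camera's viewing sector."""
    dx, dy = obj_xy[0] - camera_xy[0], obj_xy[1] - camera_xy[1]
    if math.hypot(dx, dy) > max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest absolute angle between the object's bearing and the heading.
    delta = abs((bearing - heading_deg + 180) % 360 - 180)
    return delta <= fov_deg / 2

# Candidates returned by a coarse rectangular range query around the camera.
candidates = [(3.0, 1.0), (0.5, 4.0), (-2.0, 0.0)]
visible = [p for p in candidates
           if in_field_of_view((0.0, 0.0), heading_deg=0.0, fov_deg=90.0,
                               max_range=5.0, obj_xy=p)]
print(visible)  # only (3.0, 1.0) falls inside the 90-degree sector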

The role of spatial data analytics in Apple Vision Pro

With the proliferation of applications for Apple Vision Pro, there will be a surge in the accumulation of spatial data by these apps. This data will encapsulate users’ engagements within both the physical and virtual environments. By processing and analyzing this data, a deeper comprehension of user behavior can be attained, thereby facilitating the optimization of applications to better serve their user base. For instance, consider an Apple Vision Pro app for sightseeing. Here’s how the spatial analytics process might work:

  • Data Collection: The sightseeing app collects spatial data from users as they navigate through the city using Apple Vision Pro. This data could include GPS coordinates, timestamps, images, and possibly other contextual information.
  • Data Processing: The collected spatial data is processed to extract relevant information such as user trajectories, points of interest visited, time spent at each location, and any interactions within the virtual environment overlaid on the physical world.
  • Analysis: Once the data is processed, various analytical techniques can be applied to gain insights. This might involve clustering similar user trajectories to identify popular routes, analyzing dwell times to determine the most engaging attractions, or detecting patterns in user interactions within virtual environments.
  • Insights Generation: Based on the analysis, insights are generated about user behavior and preferences. For example, the app developers might discover that a certain landmark is highly popular among users, or that users tend to spend more time in areas with interactive virtual elements.
  • Application Enhancement: Finally, these insights are used to enhance the sightseeing app. This could involve improving recommendations to users based on their preferences and behavior, optimizing the layout of virtual overlays to increase engagement, or developing new features to better cater to user needs.

By continuously collecting, processing, and analyzing spatial data, the sightseeing app can iteratively improve and evolve, ultimately providing a more personalized and engaging experience for its users. Additionally, users may benefit from discovering new attractions and experiences tailored to their interests, while also contributing to the collective knowledge base that fuels these improvements.
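As a toy illustration of the analysis step above, the sketch below ranks points of interest by total dwell time from processed visit records; the column names and data are invented for the example.

import pandas as pd

# Hypothetical processed visit records extracted from user traces.
visits = pd.DataFrame({
    "user_id":       [1, 1, 2, 2, 3],
    "poi":           ["museum", "bridge", "museum", "park", "museum"],
    "dwell_minutes": [45, 10, 30, 25, 50],
})

# Rank points of interest by total dwell time and visit count.
popularity = (visits.groupby("poi")["dwell_minutes"]
              .agg(total_dwell="sum", visits="count")
              .sort_values("total_dwell", ascending=False))
print(popularity)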

A hiking app on Apple Vision Pro could collect spatial data representing users’ interactions with the physical environment while hiking, such as the trails they take, points of interest they stop at, and the duration of their hikes. Additionally, it could also capture interactions with virtual elements overlaid on the real-world environment, such as augmented reality trail markers or informational overlays.

By processing and analyzing this data, the hiking app can gain valuable insights into user behavior. For example, it could identify popular hiking routes, points of interest along those routes, and common user preferences or patterns. This information can then be used to improve the app’s functionality and tailor it to better serve its user base.

For instance, the app could suggest personalized hiking routes based on a user’s past behavior and preferences. It could also provide real-time notifications about points of interest or hazards along the trail, based on data collected from previous users’ interactions. Additionally, the app could use machine learning algorithms to predict future user behavior and offer proactive suggestions or recommendations.

To enable apps to leverage spatial analytics effectively, they require a scalable and user-friendly spatial data analytics platform. This platform should be capable of handling the massive and intricate spatial data collected from AR devices, allowing users to execute spatial analytics queries efficiently without the need to optimize compute resources for such workloads. This aligns perfectly with our mission at Wherobots. We envision every Apple Vision Pro app utilizing Wherobots as their all-in-one cloud platform for running spatial data processing and analytics tasks. By fully leveraging spatial analytics, Apple Vision Pro and its app ecosystem could unlock a host of new possibilities for augmented reality experiences:

  • Personalized Recommendations: Spatial analytics could enable Apple Vision Pro to analyze users’ past interactions and preferences to offer highly personalized recommendations. For example, the device could suggest nearby attractions based on a user’s interests or recommend routes tailored to their preferences.
  • Predictive Capabilities: By analyzing spatial data in real-time, Apple Vision Pro could anticipate users’ needs and actions, providing proactive assistance and guidance. For instance, the device could predict congestion or obstacles along a chosen route and suggest alternative paths to optimize the user’s journey.
  • Enhanced Immersion: Spatial analytics could enrich augmented reality experiences by dynamically adapting virtual content based on the user’s environment and behavior. This could include adjusting the placement of virtual objects to align with real-world features or modifying virtual interactions to better suit the user’s context.
  • Insightful Analytics: Spatial analytics could provide valuable insights into user behavior and spatial patterns, enabling developers to optimize their applications and experiences. For example, developers could analyze heatmaps of user activity to identify popular areas or assess the effectiveness of virtual overlays in guiding users.
  • Advanced Navigation: Spatial analytics could power advanced navigation features, such as indoor positioning and navigation assistance. Apple Vision Pro could leverage spatial data to provide precise directions within complex indoor environments, helping users navigate malls, airports, and other large venues with ease.

By harnessing the power of spatial analytics, Apple Vision Pro has the potential to redefine how we interact with augmented reality and transform numerous industries, from retail and tourism to education and healthcare. As the technology continues to evolve, we can expect even more innovative applications and experiences to emerge, further blurring the lines between the physical and virtual worlds.

To wrap up:

Overall, Apple Vision Pro represents a significant advancement in the field of spatial computing, leveraging decades of research and development to seamlessly integrate virtual and physical environments. As the technology continues to evolve and mature, it holds the promise of revolutionizing various industries and everyday experiences, from gaming and entertainment to navigation and productivity. We will also see advancements in GPUs (Graphics Processing Units) play a crucial role in running spatial computing / AI tasks efficiently with reduced energy consumption. While Apple Vision Pro has yet to fully leverage spatial analytics, it holds significant potential for analyzing spatial data collected during user interactions. Spatial analytics involves extracting meaningful insights and patterns from spatial data, such as user trajectories, spatial relationships between objects, and spatial distributions of activity. By applying spatial analytics, Apple could further enhance the functionality and intelligence of its augmented reality experiences, enabling personalized recommendations, predictive capabilities, and more immersive interactions.