Unlocking the Spatial Frontier: The Evolution and Potential of Spatial Technology in Apple Vision Pro and Augmented Reality Apps

The evolution of Augmented Reality (AR) from the realm of science fiction to tangible, practical applications like Augmented Driving, Pokemon Go, and Meta Quest marked a significant shift in how we interact with technology and perceive our surroundings. The recent introduction of Apple Vision Pro underscores this transition, bringing AR closer to mainstream adoption. While the ultimate fate of devices like Apple Vision Pro or Meta Quest remains uncertain, their technological capabilities are undeniably impressive.

One of the key components of Apple Vision Pro is what Apple refers to as "Spatial Computing." While the term itself isn’t novel, with decades of research exploring the utilization of spatial context in computing, Apple’s interpretation focuses primarily on integrating spatial environments into virtual computing environments and vice versa. This approach builds upon established research in spatial object localization, representation, and spatial query processing. Moreover, it opens doors to leveraging spatial analytics, potentially offering insights and functionalities previously unimaginable.

Despite its roots in earlier research and literature like "Spatial Computing" by Shashi Shekhar and Pamela Vold, Apple’s redefinition underscores a shift in focus towards immersive spatial experiences within computing environments. By leveraging advancements in technology and innovative approaches, Apple and other companies are pushing the boundaries of what’s possible in AR, paving the way for exciting new applications and experiences. This article highlights the technical challenges Apple had to overcome to achieve such a milestone and also lays the groundwork for future improvements.

Spatial object localization and presentation in Apple Vision Pro

To work properly, devices like Apple Vision Pro first had to solve challenges in object localization, requiring systems to not only determine the user’s location but also locate objects within the camera’s line of sight. Existing outdoor and indoor localization technologies provide a foundation, but traditional methods face limitations in augmented reality contexts. Apple Vision Pro solved challenges such as varying object positions due to camera angle and real-time localization for moving objects. It also did a great job integrating advanced technologies, including image processing, artificial intelligence, and deep learning. Promising research directions involve leveraging semantic embeddings, depth cameras, and trajectory-based map-matching algorithms to make these devices usable in outdoor environments. By combining these approaches, the aim is to achieve real-time, high-accuracy object localization across different environments while minimizing device power consumption.

Apple Vision Pro does a fantastic job presenting virtual data alongside real-world objects captured by the device’s camera. Unlike traditional user interfaces, augmented reality interfaces must carefully integrate augmented data to avoid distorting the user’s view and causing potential safety hazards (e.g., while driving or crossing the street). Apple Vision Pro does not completely solve this problem yet, and there is room for improvement. I believe a big next step for these devices to succeed is to address the challenge of maintaining the visual clarity and relevance of augmented data, drawing on existing research in virtual reality applications and location-aware recommendation techniques. For example, one direction may explore presenting augmented reality spatial objects as audio messages to users. This alternative modality offers advantages in scenarios where visual attention is already heavily taxed, such as driving. However, an essential aspect of this approach is the ranking of augmented spatial objects to determine their size and prominence, ensuring optimal user engagement while minimizing distractions.

The role of spatial query processing in Apple Vision Pro

Similar to the iPhone, Apple Vision Pro also comes equipped with a range of apps designed to leverage its capabilities. These apps utilize the mixed reality environment by issuing queries to retrieve spatial objects and presenting them within the immersive experience facilitated by Vision Pro. For example, a navigation app using Apple Vision Pro might issue queries to fetch spatial objects such as points of interest, landmarks, or navigation markers. These objects would then be presented within the user’s field of view, overlaying relevant information onto the physical world through the device’s display. Similarly, an education app could retrieve spatial objects representing interactive learning materials or virtual models, enriching the user’s understanding of their surroundings.

To achieve this, the apps would communicate with the mixed reality environment, likely through APIs or SDKs provided by Apple’s developer tools. These interfaces would allow the apps to issue spatial queries to the environment, specifying parameters such as location, distance, and relevance criteria. The mixed reality environment would then return relevant spatial objects based on these queries, which the apps can seamlessly integrate into the user’s immersive experience. By leveraging the capabilities of Apple Vision Pro and interacting with the mixed reality environment, these apps can provide users with rich, context-aware experiences that enhance their understanding and interaction with the world around them. Whether for navigation, education, gaming, or other purposes, the ability to issue queries and retrieve spatial objects is fundamental to unlocking the full potential of Vision Pro’s immersive capabilities.
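To make this concrete, here is a minimal sketch of the kind of spatial query such an app might issue, written in the Apache Sedona Spatial SQL style used later in this document. The points_of_interest table, its columns, and the user position are hypothetical:

user_pos = "POINT (-73.96969 40.749244)"  # hypothetical current position
nearby = sedona.sql(f"""
SELECT name, geom,
       ST_DistanceSphere(geom, ST_GeomFromText('{user_pos}')) AS meters
FROM points_of_interest
WHERE ST_DistanceSphere(geom, ST_GeomFromText('{user_pos}')) < 500
ORDER BY meters
LIMIT 10
""")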

However, the classic rectangular or circular range query processing techniques may need to be redefined to accommodate the camera range and line of sight. While the camera view can still be formulated as a rectangular range query, this approach may not be very efficient, as not every spatial object within the camera range needs to be retrieved. This inefficiency matters because the more augmented spatial objects stitched onto the camera scene, the more distorted the user’s view of the physical world becomes. Furthermore, every time the camera’s line of sight changes, the system issues a new range query to the database, which may make it difficult to meet the real-time constraints imposed by Apple Vision Pro applications.

To optimize the performance of Apple Vision Pro applications, it’s essential to redefine the spatial range query to accurately account for the camera range and line of sight. This could involve implementing algorithms that dynamically adjust the spatial query based on the camera’s current view and line of sight. By doing so, only the relevant augmented spatial objects within the camera’s field of view need to be retrieved, minimizing distortion and ensuring real-time performance for Apple Vision Pro applications.
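As an illustration, here is a minimal sketch of one such approach: approximate the camera’s horizontal field of view as a 2D view cone and retrieve only the objects intersecting it, rather than everything in a bounding rectangle around the user. The spatial_objects table and the camera pose values are hypothetical:

import math

def view_cone_wkt(x, y, heading_deg, fov_deg=60.0, max_range=0.002):
    # Approximate the camera's horizontal field of view as a triangle:
    # the camera position plus two points at the edges of the view cone.
    # max_range is in degrees (roughly 200 m here), kept simple on purpose.
    left = math.radians(heading_deg + fov_deg / 2)
    right = math.radians(heading_deg - fov_deg / 2)
    p1 = (x + max_range * math.cos(left), y + max_range * math.sin(left))
    p2 = (x + max_range * math.cos(right), y + max_range * math.sin(right))
    return f"POLYGON (({x} {y}, {p1[0]} {p1[1]}, {p2[0]} {p2[1]}, {x} {y}))"

# Re-issue the query whenever the camera pose changes; only objects
# inside the current view cone are fetched and rendered.
cone = view_cone_wkt(x=-73.96969, y=40.749244, heading_deg=45.0)
visible = sedona.sql(f"""
SELECT * FROM spatial_objects
WHERE ST_Intersects(geom, ST_GeomFromText('{cone}'))
""")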

The role of spatial data analytics in Apple Vision Pro

With the proliferation of applications for Apple Vision Pro, there will be a surge in the accumulation of spatial data by these apps. This data will encapsulate users’ engagements within both the physical and virtual environments. By processing and analyzing this data, a deeper comprehension of user behavior can be attained, thereby facilitating the optimization of applications to better serve their user base. For instance, consider an Apple Vision Pro app for sightseeing. Here’s how the spatial analytics process might work:

  • Data Collection: The sightseeing app collects spatial data from users as they navigate through the city using Apple Vision Pro. This data could include GPS coordinates, timestamps, images, and possibly other contextual information.
  • Data Processing: The collected spatial data is processed to extract relevant information such as user trajectories, points of interest visited, time spent at each location, and any interactions within the virtual environment overlaid on the physical world.
  • Analysis: Once the data is processed, various analytical techniques can be applied to gain insights. This might involve clustering similar user trajectories to identify popular routes, analyzing dwell times to determine the most engaging attractions, or detecting patterns in user interactions within virtual environments.
  • Insights Generation: Based on the analysis, insights are generated about user behavior and preferences. For example, the app developers might discover that a certain landmark is highly popular among users, or that users tend to spend more time in areas with interactive virtual elements.
  • Application Enhancement: Finally, these insights are used to enhance the sightseeing app. This could involve improving recommendations to users based on their preferences and behavior, optimizing the layout of virtual overlays to increase engagement, or developing new features to better cater to user needs.

By continuously collecting, processing, and analyzing spatial data, the sightseeing app can iteratively improve and evolve, ultimately providing a more personalized and engaging experience for its users. Additionally, users may benefit from discovering new attractions and experiences tailored to their interests, while also contributing to the collective knowledge base that fuels these improvements.
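As a sketch of the Data Processing and Analysis steps above, the following query ranks attractions by how long users linger near them. The visits and pois tables, their schemas, and the sampling model (one location sample per tick) are all hypothetical:

# Hypothetical schema: visits(user_id, location GEOMETRY, ts TIMESTAMP)
# and pois(name, geom GEOMETRY). Each visit row is one periodic location
# sample, so the sample count per POI is a rough proxy for dwell time.
dwell = sedona.sql("""
SELECT p.name,
       COUNT(DISTINCT v.user_id) AS visitors,
       COUNT(*) AS samples
FROM visits v JOIN pois p
  ON ST_Contains(p.geom, v.location)
GROUP BY p.name
ORDER BY samples DESC
""")
dwell.show(10)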

A hiking app on Apple Vision Pro could collect spatial data representing users’ interactions with the physical environment while hiking, such as the trails they take, points of interest they stop at, and the duration of their hikes. Additionally, it could also capture interactions with virtual elements overlaid on the real-world environment, such as augmented reality trail markers or informational overlays.

By processing and analyzing this data, the hiking app can gain valuable insights into user behavior. For example, it could identify popular hiking routes, points of interest along those routes, and common user preferences or patterns. This information can then be used to improve the app’s functionality and tailor it to better serve its user base.

For instance, the app could suggest personalized hiking routes based on a user’s past behavior and preferences. It could also provide real-time notifications about points of interest or hazards along the trail, based on data collected from previous users’ interactions. Additionally, the app could use machine learning algorithms to predict future user behavior and offer proactive suggestions or recommendations.

To enable apps to leverage spatial analytics effectively, they require a scalable and user-friendly spatial data analytics platform. This platform should be capable of handling the massive and intricate spatial data collected from AR devices, allowing users to execute spatial analytics queries efficiently without the need to optimize compute resources for such workloads. This aligns perfectly with our mission at Wherobots. We envision every Apple Vision Pro app utilizing Wherobots as their all-in-one cloud platform for running spatial data processing and analytics tasks. By fully leveraging spatial analytics, Apple Vision Pro and its app ecosystem could unlock a host of new possibilities for augmented reality experiences:

  • Personalized Recommendations: Spatial analytics could enable Apple Vision Pro to analyze users’ past interactions and preferences to offer highly personalized recommendations. For example, the device could suggest nearby attractions based on a user’s interests or recommend routes tailored to their preferences.
  • Predictive Capabilities: By analyzing spatial data in real-time, Apple Vision Pro could anticipate users’ needs and actions, providing proactive assistance and guidance. For instance, the device could predict congestion or obstacles along a chosen route and suggest alternative paths to optimize the user’s journey.
  • Enhanced Immersion: Spatial analytics could enrich augmented reality experiences by dynamically adapting virtual content based on the user’s environment and behavior. This could include adjusting the placement of virtual objects to align with real-world features or modifying virtual interactions to better suit the user’s context.
  • Insightful Analytics: Spatial analytics could provide valuable insights into user behavior and spatial patterns, enabling developers to optimize their applications and experiences. For example, developers could analyze heatmaps of user activity to identify popular areas or assess the effectiveness of virtual overlays in guiding users.
  • Advanced Navigation: Spatial analytics could power advanced navigation features, such as indoor positioning and navigation assistance. Apple Vision Pro could leverage spatial data to provide precise directions within complex indoor environments, helping users navigate malls, airports, and other large venues with ease.

By harnessing the power of spatial analytics, Apple Vision Pro has the potential to redefine how we interact with augmented reality and transform numerous industries, from retail and tourism to education and healthcare. As the technology continues to evolve, we can expect even more innovative applications and experiences to emerge, further blurring the lines between the physical and virtual worlds.

To wrap up

Overall, Apple Vision Pro represents a significant advancement in the field of spatial computing, leveraging decades of research and development to seamlessly integrate virtual and physical environments. As the technology continues to evolve and mature, it holds the promise of revolutionizing various industries and everyday experiences, from gaming and entertainment to navigation and productivity. We will also see advancements in GPUs (Graphics Processing Units) playing a crucial role in running spatial computing and AI tasks efficiently with reduced energy consumption. While Apple Vision Pro has yet to fully leverage spatial analytics, it holds significant potential for analyzing spatial data collected during user interactions. Spatial analytics involves extracting meaningful insights and patterns from spatial data, such as user trajectories, spatial relationships between objects, and spatial distributions of activity. By applying spatial analytics, Apple could further enhance the functionality and intelligence of its augmented reality experiences, enabling personalized recommendations, predictive capabilities, and more immersive interactions.

Wherobots: 2023 Year in Review!

2023 has been an exceptional year for Wherobots, marked by a series of significant milestones. In Q3, we proudly announced a $5.5 million seed funding round, setting a strong financial foundation for our future endeavors. Our team experienced a remarkable 300% growth compared to the previous year, a testament to our expanding capabilities and reach. In Q4, we achieved a major milestone with the launch of our commercial product, marking our entry into a new phase of business operations. Additionally, the Apache Sedona project, a cornerstone of our technology, saw an impressive 130% YoY growth in the number of downloads, reflecting the vibrant and growing community engagement around our platform. Below is a summary of what we achieved on each front:

Growing the Team

The most remarkable achievement for Wherobots in 2023 was its extraordinary team expansion, which saw an impressive 300% growth rate. This expansion significantly enhanced various key areas, including the executive leadership, engineering, marketing, and developer relations teams. To highlight a few notable instances:

Maxime Petazzoni has been a standout addition to the executive leadership team, assuming the role of Head of Engineering. Prior to joining Wherobots, Maxime made significant contributions at SignalFx and Splunk, where he led multiple engineering organizations with notable success. William Lyon is another high-caliber professional who recently joined Wherobots, spearheading our developer relations efforts. William’s experience at Neo4j in a similar role was instrumental in fostering the growth and widespread adoption of Neo4j’s technologies.

We have strategically expanded our engineering team by bringing on board specialists in key areas such as spatial technology, database engineering, cloud data infrastructure, and artificial intelligence. Our team members boast impressive pedigrees, having previously been part of renowned organizations like Amazon, Google, Microsoft, Apple, Splunk, LinkedIn, Airbnb, Grab, and Tencent, among others. This diverse array of top-tier industry experience positions us uniquely in the tech landscape and underscores our commitment to excellence in our field.

Please watch the following video to learn about our team and culture:
https://youtu.be/sSR-jheAPwM?si=_GYHvRmH4tpR_MFY

Commercial Product Launch

In the final quarter of 2023, Wherobots achieved a significant milestone with the launch of its commercial offering: a cutting-edge cloud data infrastructure. This innovative platform empowers organizations to leverage Spatial Analytics and AI for extracting valuable insights from their data. The centerpiece of this product is Wherobots Cloud, featuring SedonaDB as a fully-managed cloud service, grounded in the robust architecture of Apache Sedona. SedonaDB is designed for high scalability and cost efficiency, enabling users to conduct advanced spatial analytics on enterprise data of any scale and regardless of its location – be it in a data lake, data warehouse, or a conventional database.

Since its launch just a few weeks ago, Wherobots has witnessed a remarkable uptake of its product, evidenced by many users signing up. This early success is further underscored by the processing of more than 3 billion spatial records, a testament to the platform’s robust capability. Additionally, the platform has been used to run more than 2,000 spatial analytics tasks, indicating a strong and growing demand for its innovative solutions. That is only scratching the surface, since Wherobots is designed to handle hundreds of petabytes of data and millions of spatial analytics tasks on a regular basis.

Check the product launch details here: https://wherobots.com/sedonadb-the-cloud-native-spatial-analytics-database-platform/

Raising $5.5M in Seed

In a significant financial milestone achieved in 2023, Wherobots proudly disclosed the successful acquisition of $5.5 million in funding through an initial seed investment round. This round saw active participation from influential industry figures, including Chris Rust and Peter Wagner, founding partners of Clear Ventures and Wing VC respectively. Both Rust and Wagner have taken on roles within the Wherobots board, marking a notable collaboration.

Check the funding announcement article on TechCrunch here: https://techcrunch.com/2023/06/13/wherobots-is-building-a-data-platform-to-treat-spatial-data-as-a-first-class-citizen/

Apache Sedona Growth

Over the past year, Apache Sedona has experienced remarkable growth. As of today, it boasts support for over a hundred spatial data processing functions through its distributed and optimized spatial SQL engine. This advancement has paralleled significant growth in the Wherobots community in 2023. A notable indicator of this growth is the surge in Sedona downloads, which recorded a 130% year-over-year increase compared to 2022. This surge has brought the total number of Apache Sedona downloads to an impressive 23 million, according to PyPI and Maven Central statistics. Additionally, the Apache Sedona contributor base expanded to 111, marking approximately 20% growth from the previous year. A striking testament to this vibrant community’s engagement is the nearly 1 million lines of code contributed to Sedona, a staggering 1500% increase in year-over-year growth compared to 2022.

Check Apache Sedona Github Repo Here: https://github.com/apache/sedona

Finally

As we reflect on the tremendous growth we experienced in 2023, we’re filled with enthusiasm for the year ahead. In 2024, we are not only expecting to see further team expansion and product adoption but also setting our sights on several ambitious corporate goals. We aim to enhance our technological capabilities, foster innovation in spatial analytics, and strengthen our market position through strategic partnerships and collaborations. Additionally, we’re committed to delivering even greater value to our customers, achieving new milestones in user satisfaction, and driving sustainable growth. On behalf of the entire Wherobots team, we extend our warmest wishes for a joyful holiday season and a prosperous New Year. Here’s to a 2024 filled with achievement, innovation, and success!

Havasu: A Table Format for Spatial Attributes in a Data Lake Architecture

Introduction

In the past decade, many organizations have been using BLOB storage (e.g., AWS S3) as a primary storage platform. These organizations collect tons of data and ingest it as files into S3 for its scalability, cost efficiency, and reliability. However, there has since been a need to interact with such data using SQL, which is familiar to developers, much as they would query a relational database. That led to the invention of open table formats such as Apache Iceberg, which enable users to perform wide-ranging database operations, including concurrent data updates, table metadata handling, time travel, and versioning, on files stored in object stores such as S3 without the need to load those files into a relational database.

However, when dealing with spatial data, existing open table formats such as Apache Iceberg, Hudi, and Delta Lake do not provide native support for spatial attributes. This lack of native spatial support forces users to handle the spatial aspect of data at the application layer. This approach often cannot handle the intricacies and scale of spatial analytics and falls short of meeting the demands of analytical workloads leveraging spatial data. The significance of spatial analytics underscores the pressing need for its efficient management within the enterprise data stack, which includes managing spatial attributes in open table formats.

To remedy this, we built Havasu, an open table format that extends Apache Iceberg to support spatial data. Havasu introduces a range of pivotal features, including native support for manipulating and storing geometry and raster objects directly in data lake tables, and enables seamless querying and processing of spatial tables using Spatial SQL.

Key features of Havasu

ACID on Spatial Data

Havasu stands out for ensuring the ACID (Atomicity, Consistency, Isolation, and Durability) properties in spatial data transactions, a critical feature for reliable data management. This guarantees that each transaction is processed completely or not at all, maintaining data integrity even in complex environments. Furthermore, Havasu supports schema evolution, allowing for adaptable data structures without compromising existing data. This flexibility is key for efficiently managing and evolving spatial data over time, catering to the dynamic needs of spatial databases.

Native Spatial Support

Havasu supports geometry/geography and raster as primary data types, and seamlessly integrates with the compute layer. Users can easily read and write spatial data using Havasu, and process spatial data using any computation engine (e.g., WherobotsDB) as long as that computation engine implements a reader/writer for Havasu.

Efficiency

Computation engines (like WherobotsDB) that incorporate Havasu can benefit from its spatial filter pushdown support, which significantly accelerates spatial range queries. Havasu allows storing spatial data in the object storage of the customer’s choice, and it decouples storage from computation, which makes storing vast amounts of spatial data very cost effective. Havasu also comes equipped with an efficient storage mode for raster data in Parquet files, namely out-db storage, which enables high-throughput reads of large amounts of raster data.

Openness

Havasu is a table format with an open specification. It is based on Apache Iceberg, which has an open specification and is widely adopted across the big data ecosystem. The extension to the Apache Iceberg specification is also clearly specified, so any implementation adopting this extended specification is able to read and write Havasu tables. Customers can store Havasu tables in the storage of their choice, without being tightly coupled to one specific vendor or implementation.

Havasu in a Nutshell: Key Technical Insights

The open-source Havasu specification can be found in the Wherobots Havasu documentation. The Havasu table format extends the Iceberg table spec to support managing large spatial datasets as tables, in the following ways:

  • Primitive spatial data types and storage: the Havasu specification extends the Iceberg specification to support spatial data types.
  • Spatial statistics: extending the Iceberg manifest files to support spatial statistics.
  • Spatial filter push down and indexing: extending the Iceberg specification to support spatial filter pushdown and spatial indexing, which greatly reduces data retrieval overhead.

All other aspects of the Iceberg spec are unchanged. For example, the Havasu specification does not change the fundamental organization of table files.

Primitive spatial data types and storage

Geometry

Geometry values consist of an optional spatial reference ID (abbreviated as SRID) and a geometry shape. The SRID is a 32-bit integer that identifies the coordinate system that the geometry shape is using. The interpretation of the SRID is implementation dependent. For example, the SRID could be an EPSG code, or a code defined by a proprietary coordinate system. The geometry shape is one of the types defined by the OGC Simple Features for SQL specification. The geometry shape can be stored in one of the following formats in the underlying Parquet files.

Encoding | Parquet physical type | Logical type | Description
ewkb     | BINARY                |              | Extended Well-known binary (EWKB)
wkb      | BINARY                |              | Well-known binary (WKB)
wkt      | BINARY                | UTF8         | Well-known text (WKT)
geojson  | BINARY                | UTF8         | GeoJSON (https://datatracker.ietf.org/doc/html/rfc7946)

When the geometry column is at the root of the schema, and the geometry encoding is either wkb or ewkb, the application can optionally write GeoParquet metadata to the Parquet files. The GeoParquet metadata is defined by the GeoParquet specification.
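To illustrate the SRID half of a geometry value, the following Spatial SQL snippet (a sketch, assuming a running sedona session as in the examples later in this document) attaches an EPSG code to a shape and reads it back:

sedona.sql("""
SELECT ST_SRID(ST_SetSRID(ST_GeomFromWKT('POINT (-73.96969 40.749244)'), 4326)) AS srid
""").show()
# +----+
# |srid|
# +----+
# |4326|
# +----+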

Raster

A raster is one or more grids of cells. All the grids should have height rows and width columns. The grid cells are represented by the band field. The grids are geo-referenced using an affine transformation that maps the grid coordinates to world coordinates. The coordinate reference system (CRS) of the world coordinates is specified by the crs field. The CRS is serialized as a WKT string when stored in data files.

Havasu supports persisting raster band values in two different ways:

  • in-db: The band values are stored in the same data file as the geo-referencing information. The band values are stored in the bands field of the raster value.
  • out-db: The band values are stored in files external to Havasu tables. The raster value stored in a Havasu data file contains the geo-referencing information and the URIs of the external raster files. These URIs are stored in the bands field of the raster value.

Spatial statistics

Havasu collects and records the spatial statistics of data files when writing data to the table. These statistics include the minimum bounding rectangle (MBR) of the geometries in the data file.

Geometry bounds

The bounds of geometry values should be derived using their minimum bounding rectangles (MBRs). The MBR of a geometry value is defined as the smallest rectangle that contains the geometry value. The SRIDs of geometry values are ignored when computing the MBRs. The MBRs of all geometry values in a data file should be unioned together into a single MBR, which is the MBR of the data file.
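Here is a minimal sketch of this rule using Shapely; the helper and sample data are illustrative and not part of the Havasu specification:

from shapely import wkt
from shapely.geometry import box

def file_mbr(geometries_wkt):
    # The file's MBR is the union of every geometry's MBR (SRIDs ignored).
    xmin = ymin = float("inf")
    xmax = ymax = float("-inf")
    for text in geometries_wkt:
        gx0, gy0, gx1, gy1 = wkt.loads(text).bounds
        xmin, ymin = min(xmin, gx0), min(ymin, gy0)
        xmax, ymax = max(xmax, gx1), max(ymax, gy1)
    return box(xmin, ymin, xmax, ymax)

print(file_mbr(["POINT (1 2)", "LINESTRING (0 0, 3 1)"]).bounds)
# (0.0, 0.0, 3.0, 2.0)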

Raster bounds

Raster bounds are MBRs of rasters in WGS84 coordinate system. They are computed by transforming the envelope of the raster in its native coordinate system to WGS84. Raster bounds have a special rule for handling MBRs crossing the anti-meridian. Implementations of the Havasu specification should be able to handle MBRs crossing the anti-meridian correctly, otherwise spatial query optimizations will derive incomplete query results.
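One common convention, shown here purely as an illustration (the Havasu specification defines its own encoding), is to represent an anti-meridian-crossing MBR with xmin > xmax and split it into two boxes when testing intersection:

from shapely.geometry import box

def mbr_intersects(xmin, ymin, xmax, ymax, window):
    # xmin > xmax signals an MBR that wraps across the anti-meridian.
    if xmin <= xmax:
        return box(xmin, ymin, xmax, ymax).intersects(window)
    west = box(xmin, ymin, 180.0, ymax)   # from xmin east to 180°
    east = box(-180.0, ymin, xmax, ymax)  # from -180° east to xmax
    return west.intersects(window) or east.intersects(window)

# A raster spanning 170°E to 170°W intersects a window near the 180° line:
print(mbr_intersects(170, -10, -170, 10, box(175, -5, 179, 5)))  # True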

Spatial filter push down and indexing

Database engines can take advantage of the spatial statistics of data files to optimize the query execution plan. For example, if the query predicate is a spatial range query, the engine can use the spatial statistics to prune the data files that do not contain any data satisfying the query predicate. This process is called spatial filter pushdown. How spatial query optimization is implemented in scan planning is implementation dependent. For example, in WherobotsDB, for a spatial range query ST_Within(geom, Q), where geom is the geometry field in a Havasu table and Q is a constant geometry serving as the query window, WherobotsDB converts the spatial query predicate to an inclusive projection ST_Intersects(MBR[geom], Q), where MBR[geom] is the minimum bounding box of all values of geom in a data file. Then Sedona evaluates the projection using the field statistics maintained in manifest files.
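Conceptually, the pruning step looks like the following sketch; the DataFile records and the in-memory manifest are simplified stand-ins for Havasu’s actual metadata:

from dataclasses import dataclass
from shapely.geometry import box, Polygon

@dataclass
class DataFile:
    path: str
    mbr: Polygon  # spatial statistics recorded in the manifest

files = [
    DataFile("part-0.parquet", box(-74.05, 40.70, -73.90, 40.80)),  # NYC
    DataFile("part-1.parquet", box(-118.5, 33.9, -118.1, 34.2)),    # LA
]
query_window = box(-74.0, 40.74, -73.95, 40.76)

# Spatial filter pushdown: scan only files whose MBR can intersect Q.
scanned = [f.path for f in files if f.mbr.intersects(query_window)]
print(scanned)  # ['part-0.parquet']; the LA file is skipped entirely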

Data skipping

Spatial filter pushdown works best when spatial data that are near each other are stored in the same file. Havasu provides a CREATE SPATIAL INDEX syntax for rewriting the table to sort the records by a geometry column.

CREATE SPATIAL INDEX FOR <table_name> USING hilbert(<geometry_column>, <precision>) [ WHERE <condition> ] [ OPTIONS <options> ];

This statement will rewrite the data files of the table and cluster the data by the geometry column. This feature is very useful when the table contains a large amount of data and the spatial filter is very selective. For example, if the table contains 1 TB of data and the spatial filter selects only 1% of it, ideally Havasu will read only ~10 GB of data to answer the query.

Navigating Spatial Data with Havasu-Powered WherobotsDB Tables

WherobotsDB implements a reader/writer for the Havasu spatial table format. Users can perform many interesting spatial database operations on Havasu tables using WherobotsDB in Wherobots Cloud. Here we explore some common operations using WherobotsDB. For details, please read the Wherobots documentation. To follow along, create a free Wherobots Cloud account.

Create a new table in Wherobots table catalog

First, let’s create a Havasu table in the wherobots table catalog. This catalog by default is configured to use your Wherobots Cloud S3 storage, but another storage location can also be specified. We’ll use a dataset of taxi rides as our example.

CREATE TABLE wherobots.test_db.taxi (
  pickup GEOMETRY,
  Trip_Pickup_DateTime STRING,
  Payment_Type STRING,
  Fare_Amt DECIMAL(10,0))
USING havasu.iceberg
-- By default this table will be stored in your Wherobots Cloud S3 account
-- Optionally specify other location
-- LOCATION 's3://path/to/warehouse/test_db/taxi'

List tables in Wherobots table catalog

We can view all tables within the wherobots table catalog using SHOW TABLES:

SHOW TABLES IN wherobots.test_db

We can see the wherobots.test_db.taxi table that we just created:

+---------+-----------+-----------+
|namespace|  tableName|isTemporary|
+---------+-----------+-----------+
|  test_db|       taxi|      false|
+---------+-----------+-----------+

Describe a table in Wherobots table catalog

To view the columns and datatypes of each column we can describe the table:

DESCRIBE TABLE wherobots.test_db.taxi

Note here that our pickup column is of type geometry:

+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
|              pickup| geometry|   null|
|Trip_Pickup_DateTime|   string|   null|
|        Payment_Type|   string|   null|
|            Fare_Amt|   string|   null|
+--------------------+---------+-------+

Insert geometry data

We can insert data into our table using SQL. Here we specify the geometry value using the ST_GeomFromText function which takes a WKT string, in this case to describe the point geometry that represents the pickup location.

sedona.sql("""
INSERT INTO wherobots.test_db.taxi
VALUES(ST_GeomFromText('POINT (-73.96969 40.749244)'), '10/16/09 22:35', 'Credit', 42)
""")

We can also write spatial DataFrames to Havasu tables. Here we load a NYC taxi dataset into a Sedona DataFrame, then append the data to our wherobots.test_db.taxi Havasu table:

from pyspark.sql.functions import col

taxidf = sedona.read.format('csv').option("header","true").option("delimiter", ",").load("s3a://wherobots-examples-prod/data/nyc-taxi-data.csv")
taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'CAST(Fare_Amt AS DECIMAL)')
taxidf = taxidf.filter(col("pickup").isNotNull())

taxidf.writeTo("wherobots.test_db.taxi").append()

Create spatial index

Creating a spatial index will rewrite the table to sort records by the geometry column. Havasu supports the hilbert index strategy which will sort the data based on the Hilbert space filling curve which is very efficient at sorting geospatial data based on proximity. We can configure the precision by specifying a value for the precision parameter, which is the number of bits used to represent the Hilbert index.

sedona.sql("CREATE SPATIAL INDEX FOR wherobots.test_db.taxi USING hilbert(pickup, 10)")

Read data from Havasu table

We can query our Havasu tables using familiar SQL; however, when using WherobotsDB we have the advantage of spatial queries using Spatial SQL functions. Here we search for all taxi pickups that occurred within a certain area around a given point:

sedona.sql("""
SELECT * FROM wherobots.test_db.taxi 
WHERE ST_Intersects(pickup, ST_Buffer(ST_GeomFromText('POINT (-73.96969 40.749244)'), 0.001))
""").show(truncate=False)
+----------------------------+--------------------+------------+--------+
|pickup                      |Trip_Pickup_DateTime|Payment_Type|Fare_Amt|
+----------------------------+--------------------+------------+--------+
|POINT (-73.96969 40.749244) |1/5/09 16:29        |Credit      |9       |
|POINT (-73.969387 40.749159)|1/20/09 14:38       |CASH        |7       |
|POINT (-73.969308 40.75001) |1/8/09 17:48        |CASH        |11      |
|POINT (-73.969355 40.749315)|1/7/09 16:52        |CASH        |10      |
|POINT (-73.970238 40.749497)|1/19/09 2:42        |Credit      |45      |
|POINT (-73.969492 40.749103)|1/21/09 19:34       |Credit      |15      |
|POINT (-73.970158 40.749055)|1/15/09 14:34       |CASH        |8       |
|POINT (-73.969638 40.748663)|1/27/09 17:46       |CASH        |9       |
|POINT (-73.970167 40.749033)|1/2/09 18:49        |CASH        |8       |
|POINT (-73.97059 40.749077) |1/18/09 20:39       |Credit      |10      |
|POINT (-73.970105 40.748985)|1/12/09 11:39       |CASH        |9       |
|POINT (-73.970228 40.749027)|1/8/09 16:07        |CASH        |5       |
|POINT (-73.9697 40.748737)  |1/5/09 18:04        |Credit      |6       |
|POINT (-73.970628 40.749132)|1/27/09 18:11       |CASH        |10      |
|POINT (-73.969573 40.748677)|1/29/09 19:35       |CASH        |5       |
|POINT (-73.969783 40.749163)|1/6/09 19:48        |Cash        |8       |
|POINT (-73.969522 40.748948)|1/4/09 16:24        |CASH        |5       |
|POINT (-73.969529 40.749625)|1/7/09 23:38        |CASH        |7       |
|POINT (-73.969473 40.749072)|1/29/09 18:04       |CASH        |16      |
|POINT (-73.970575 40.749395)|1/7/09 19:36        |CASH        |8       |
+----------------------------+--------------------+------------+--------+
only showing top 20 rows

Insert out-db raster data

We can also work with raster data in Havasu tables. Here we insert raster data into a Havasu table using the out-db option. You can read more about working with raster data in Havasu tables in the documentation.

sedona.sql("SELECT RS_FromPath('s3a://XXX.tif') as rast"). \\
    writeTo("wherobots.test_db.test_table").append()

Havasu-Powered Wherobots open data catalog

Wherobots collects open datasets from various data sources, then cleans and transforms them to Havasu format to enable linking enterprise data to the real physical world. All datasets are provided for free (except the AWS data transfer fee). Certain datasets are only accessible by our Pro Edition users. To learn more, please read Wherobots Open Data.

Dataset name | Availability in Wherobots | Type | Count | Description
Overture Maps buildings/building | Community edition | Polygon | 785 million | Any human-made structures with roofs or interior spaces
Overture Maps places/place | Community edition | Point | 59 million | Any business or point of interest within the world
Overture Maps admins/administrativeBoundary | Community edition | LineString | 96 thousand | Any officially defined border between two Administrative Localities
Overture Maps admins/locality | Community edition | Point | 2948 | Countries and hierarchical subdivisions of countries
Overture Maps transportation/connector | Community edition | Point | 330 million | Points of physical connection between two or more segments
Overture Maps transportation/segment | Community edition | LineString | 294 million | Center-line of a path which may be traveled
Google & Microsoft open buildings | Professional edition | Polygon | 2.5 billion | Google & Microsoft Open Buildings, combined by VIDA
LandSAT surface temperature | Professional edition | Raster (GeoTiff) | 166K images, 10 TB | The temperature of the Earth’s surface in Kelvin, from Aug 2023 to Oct 2023
US Census ZCTA codes | Professional edition | Polygon | 33144 | ZIP Code Tabulation Areas defined in 2018
NYC TLC taxi trip records | Professional edition | Point | 200 million | NYC TLC taxi trip pickup and dropoff records per trip
Open Street Maps all nodes | Professional edition | Point | 8 billion | All the nodes of the OpenStreetMap Planet dataset
Open Street Maps postal codes | Professional edition | Polygon | 154 thousand | Boundaries of postal code areas as defined in OpenStreetMap
Weather events | Professional edition | Point | 8.6 million | Events such as rain, snow, storm, from 2016 – 2022
Wild fires | Professional edition | Point | 1.8 million | Wildfires that occurred in the United States from 1992 to 2015

The Wherobots open data catalog can be extremely useful when tables are combined, typically using spatial joins, to address real world business use cases like risk analysis, site selection, fleet vehicle optimization and answering other business intelligence questions.

Spatial join query to find zones prone to wild fires

Let’s see how we can make use of the Wherobots open data catalog using Havasu tables to perform a spatial join operation to find US zipcode regions that experience the most wild fires. To do this we will use the wherobots_open_data.us_census.zipcode Havasu table, which contains the polygon geometries of US zipcodes, and wherobots_open_data.weather.wild_fires, which contains point geometries of wild fire events.

We perform a spatial join operation using the ST_Intersects spatial SQL function to define a predicate that will join fires that occur within their respective zipcodes.

fire_zone = sedona.sql(
    """
    SELECT
        z.geometry as zipgeom,
        z.ZCTA5CE10 as zipcode,
        f.FIRE_NAME
    FROM
        wherobots_open_data.us_census.zipcode z,
        wherobots_open_data.weather.wild_fires f
    WHERE
        ST_Intersects(z.geometry, f.geometry)
    """
)

We can then group this data by zipcode to find the count of fires that occur in each zip code and visualize the results. This type of analysis can be useful for risk analysis and insurance premium pricing.
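A short sketch of that grouping step, continuing from the fire_zone DataFrame above (the visualization itself is omitted):

# Count fires per zipcode and list the most fire-prone areas first.
fire_counts = fire_zone.groupBy("zipcode").count().orderBy("count", ascending=False)
fire_counts.show(10)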

[Figure: wildfire risk by county]

Resources

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter.