Analyzing The Overture Maps Places Dataset Using SedonaDB, Wherobots Cloud, & GeoParquet

Introduction

Overture Maps, supported by the Overture Maps Foundation (OMF), offers a comprehensive geospatial dataset, now available in GeoParquet format, organized into themes such as places of interest, buildings, transportation networks, and administrative boundaries. GeoParquet extends the standard Apache Parquet columnar format with a standardized way to store geometry columns, encoded as Well-Known Binary (WKB), along with metadata such as each geometry column's coordinate reference system and bounding box. This makes it well suited to geospatial analytics: spatial data can be exchanged and queried efficiently with the same tooling used for ordinary Parquet files.

This article aims to showcase the practical applications and benefits of Overture Maps data available in the Wherobots Open Data Catalog. By delving into real-world use cases, we demonstrate how the Overture Maps dataset enables deeper and faster insights into urban dynamics and broadens the scope for advanced geospatial analysis.

To follow along, first create a free account in Wherobots Cloud.

Data Schema for Places Theme in Overture Maps

The Places theme in Overture Maps represents point locations of various facilities, services, or amenities. Key schema design choices include:

  • Extensible Attributes: Basic common attributes such as phone, email, website, and brand are included. Additional attributes not currently in the official release are allowed with an "ext" prefix. Attributes specific to certain types of places are planned for future inclusion.
  • Controlled Categories: A hierarchical categorization system (taxonomy) allows for the transformation of various categorization systems to the Overture framework. This taxonomy is intended to be comprehensive and will be fine-tuned over time.
Schema Representation
root
|-- id: string (nullable = true)
|-- updatetime: string (nullable = true)
|-- version: integer (nullable = true)
|-- names: map (nullable = true)
|    |-- key: string
|    |-- value: array (valueContainsNull = true)
|    |    |-- element: map (containsNull = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)
|-- categories: struct (nullable = true)
|    |-- main: string (nullable = true)
|    |-- alternate: array (nullable = true)
|    |    |-- element: string (containsNull = true)
|-- confidence: double (nullable = true)
|-- websites: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- socials: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- emails: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- phones: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- brand: struct (nullable = true)
|    |-- names: map (nullable = true)
|    |    |-- key: string
|    |    |-- value: array (valueContainsNull = true)
|    |    |    |-- element: map (containsNull = true)
|    |    |    |    |-- key: string
|    |    |    |    |-- value: string (valueContainsNull = true)
|    |-- wikidata: string (nullable = true)
|-- addresses: array (nullable = true)
|    |-- element: map (containsNull = true)
|    |    |-- key: string
|    |    |-- value: string (valueContainsNull = true)
|-- sources: array (nullable = true)
|    |-- element: map (containsNull = true)
|    |    |-- key: string
|    |    |-- value: string (valueContainsNull = true)
|-- bbox: struct (nullable = true)
|    |-- minx: double (nullable = true)
|    |-- maxx: double (nullable = true)
|    |-- miny: double (nullable = true)
|    |-- maxy: double (nullable = true)
|-- geometry: geometry (nullable = true)
|-- geohash: string (nullable = true)

Accessing Overture Maps Places Dataset

To analyze the data from Overture Maps, we first create a SedonaContext connected to the Wherobots Open Data Catalog:

from sedona.spark import *

# Configure an Apache Iceberg catalog named "wherobots_examples" that points at
# the Wherobots Open Data Catalog warehouse in S3
config = SedonaContext.builder() \
    .config("spark.sql.catalog.wherobots_examples.type", "hadoop") \
    .config("spark.sql.catalog.wherobots_examples", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.wherobots_examples.warehouse", "s3://wherobots-examples-prod/havasu/warehouse") \
    .config("spark.sql.catalog.wherobots_examples.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .getOrCreate()

# Register Sedona's spatial types and SQL functions with the session
sedona = SedonaContext.create(config)

Next, we load the Places theme of Overture Maps as a DataFrame:

places_df = sedona.table("wherobots_examples.overture.places_place")

Spatial Filtering for NYC Metropolitan Area

For all use cases in this article, we focus on the New York City (NYC) metropolitan area. We apply spatial filtering to limit our dataset to this specific area, using the bounding box coordinates of New York City.

spatial_filter = "ST_Within(geometry, ST_PolygonFromEnvelope(-74.25909, 40.477399, -73.700181, 40.917577))"
places_df = places_df.where(spatial_filter)
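ST_PolygonFromEnvelope takes its arguments as (min x, min y, max x, max y), which for longitude/latitude data means (min longitude, min latitude, max longitude, max latitude). For point geometries, ST_Within against such an axis-aligned box reduces to a coordinate comparison. The plain-Python sketch below mirrors that logic for illustration only; it is not how Sedona evaluates the predicate internally, and the two sample points are hypothetical.

```python
from typing import NamedTuple


class Envelope(NamedTuple):
    min_x: float  # minimum longitude
    min_y: float  # minimum latitude
    max_x: float  # maximum longitude
    max_y: float  # maximum latitude


# Same bounding box as the ST_PolygonFromEnvelope call above
NYC_ENVELOPE = Envelope(-74.25909, 40.477399, -73.700181, 40.917577)


def point_within_envelope(lon: float, lat: float, env: Envelope) -> bool:
    """Return True if the point lies strictly inside the envelope.

    ST_Within requires the point to be in the polygon's interior, so points
    exactly on the boundary are excluded (hence the strict inequalities).
    """
    return env.min_x < lon < env.max_x and env.min_y < lat < env.max_y


# Hypothetical sample points: one in Midtown Manhattan, one in Philadelphia
print(point_within_envelope(-73.9857, 40.7484, NYC_ENVELOPE))  # True
print(point_within_envelope(-75.1652, 39.9526, NYC_ENVELOPE))  # False
```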

To illustrate the comprehensive coverage of the dataset, the following map showcases 278,998 points of interest just in the New York City area.

Visualizing places from the Overture Maps dataset

Elevate key nested fields to top-level columns

To facilitate easier aggregation and analysis, it’s important to transform certain nested fields into top-level columns. In our dataset, we focus on the ‘main’ and ‘alternate’ subcategories within the ‘categories’ column of the places dataset.

First, we create a new column ‘category’ that directly holds the values from ‘categories.main’:

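from pyspark.sql.functions import col, count, countDistinct, explode, rank, array_contains
from pyspark.sql.window import Window

# Promote the main category out of the nested `categories` struct
places_df = places_df.withColumn("category", col("categories.main"))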

Next, we use the explode function to transform the ‘alternate’ subcategories. The explode function is used to expand an array or map column into multiple rows. When applied to the ‘categories.alternate’ array, each element in the array is turned into a separate row, effectively creating a new row for each alternate category associated with the same place.

places_df_exploded = places_df.withColumn("alternate_category", explode("categories.alternate")) 

Here’s what the explode transformation looks like:

Before applying transformation:

+--------------------+--------------------------------------------+
|id                  |categories                                  |
+--------------------+--------------------------------------------+
|tmp_F36B3571B3E58...|{hvac_services, [industrial_equipment]}     |
|tmp_240555DC4354D...|{elementary_school, [school, public_school]}|
+--------------------+--------------------------------------------+

After applying transformation:

+------------------------------------+-----------------+--------------------+
|id                                  |category         |alternate_category  |
+------------------------------------+-----------------+--------------------+
|tmp_F36B3571B3E583C482BD02CAC65657B6|hvac_services    |industrial_equipment|
|tmp_240555DC4354D0975F72960E276D481C|elementary_school|school              |
|tmp_240555DC4354D0975F72960E276D481C|elementary_school|public_school       |
+------------------------------------+-----------------+--------------------+
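Conceptually, explode emits one output row per array element and copies the other columns onto each new row. The plain-Python sketch below reproduces that on the two example rows; the third row is hypothetical and shows an edge case: Spark's explode drops rows whose array is null or empty (explode_outer would keep them with a null alternate_category), and this sketch mimics explode.

```python
from typing import Iterator, Optional

# Rows shaped like (id, category, alternate categories)
rows = [
    ("tmp_F36B3571B3E583C482BD02CAC65657B6", "hvac_services", ["industrial_equipment"]),
    ("tmp_240555DC4354D0975F72960E276D481C", "elementary_school", ["school", "public_school"]),
    ("tmp_hypothetical_park", "park", None),  # hypothetical row with no alternates
]


def explode_alternates(
    rows: list[tuple[str, str, Optional[list[str]]]],
) -> Iterator[tuple[str, str, str]]:
    """Yield one (id, category, alternate_category) row per array element.

    Like Spark's explode, rows with a null or empty array produce no output.
    """
    for place_id, category, alternates in rows:
        for alt in alternates or []:
            yield (place_id, category, alt)


for row in explode_alternates(rows):
    print(row)
```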

Explore categories

Group the data by 'category' and count the occurrences to understand the distribution of categories. After the GroupBy, the categories are ranked by number of occurrences. This tells us the most common business categories in NYC.

categories_df = places_df.groupBy("category").agg(count("*").alias("count"))
categories_df = categories_df.orderBy("count", ascending=False)
windowSpec = Window.orderBy(col("count").desc())
categories_df = categories_df.withColumn("overall_rank", rank().over(windowSpec))
categories_df.show(10, truncate=False)
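rank() over a window ordered by count descending gives tied counts the same rank and then skips ahead, so ties produce gaps (dense_rank() would not leave gaps). The plain-Python sketch below reproduces that numbering on a hypothetical list of counts that includes a tie. Also note that in Spark a window with no partitionBy moves all rows into a single partition; that is fine for a few hundred categories but would not scale to very large tables.

```python
def sql_rank(counts: list[int]) -> list[int]:
    """Mimic SQL RANK() over counts ordered descending.

    Returns ranks aligned with the input order. Ties share a rank and the
    next distinct value skips ahead (1, 2, 2, 4), as in SQL RANK().
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    ranks = [0] * len(counts)
    prev_count = None
    prev_rank = 0
    for position, i in enumerate(order, start=1):
        if counts[i] != prev_count:
            prev_rank = position  # new value: rank equals its 1-based position
            prev_count = counts[i]
        ranks[i] = prev_rank
    return ranks


# Hypothetical category counts with a tie
print(sql_rank([10919, 5963, 5963, 5507]))  # [1, 2, 2, 4]
```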

This gives us the following output:

+--------------------------------+-----+------------+
|category                        |count|overall_rank|
+--------------------------------+-----+------------+
|beauty_salon                    |10919|1           |
|community_services_non_profits  |5963 |2           |
|church_cathedral                |5507 |3           |
|professional_services           |4675 |4           |
|landmark_and_historical_building|4436 |5           |
|hospital                        |4035 |6           |
|dentist                         |3538 |7           |
|real_estate                     |3330 |8           |
|park                            |3171 |9           |
|school                          |3016 |10          |
+--------------------------------+-----+------------+

Explore Coffee Shops

Let's explore the coffee shops category a bit more.

coffee_df = places_df.filter(places_df.category == "coffee_shop")
coffee_alt_cats = places_df_exploded.filter(places_df_exploded.category == "coffee_shop").groupBy("alternate_category").agg(count("*").alias("count"))
coffee_alt_cats = coffee_alt_cats.orderBy("count", ascending = False)
coffee_alt_cats.show(11, truncate=False)

We group the coffee shop rows by 'alternate_category', count the occurrences, and order the result to show the most common alternate categories among coffee shops.

The bar chart below shows the relative frequency of each alternate category as a percentage of all coffee shops.

Visualizing alternate categories
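The percentages in a chart like this divide each alternate category's count by the total number of coffee shops. Because a single shop can carry several alternate categories, the percentages can add up to more than 100%. A minimal plain-Python sketch of the calculation, using hypothetical shops rather than the real data:

```python
from collections import Counter

# Hypothetical coffee shops and their alternate categories
coffee_shops = {
    "shop_a": ["cafe", "bagel_shop"],
    "shop_b": ["cafe"],
    "shop_c": ["bakery", "cafe"],
    "shop_d": [],
}


def alternate_category_share(shops: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of shops listing each alternate category, most common first.

    The denominator is the total number of shops (including shops with no
    alternates), so the percentages need not sum to 100.
    """
    total = len(shops)
    # dict.fromkeys de-duplicates a shop's categories while keeping their order
    counts = Counter(cat for cats in shops.values() for cat in dict.fromkeys(cats))
    return {cat: 100.0 * n / total for cat, n in counts.most_common()}


print(alternate_category_share(coffee_shops))
# {'cafe': 75.0, 'bagel_shop': 25.0, 'bakery': 25.0}
```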

Now, let's filter coffee_df to shops that also list 'bagel_shop' as an alternate category, because you may want to grab coffee and a bagel on your way to work without standing in line at two different shops.

coffee_bagel_df = coffee_df.filter(array_contains(coffee_df.categories.alternate,"bagel_shop"))
coffee_bagel_df = coffee_bagel_df.select(coffee_bagel_df.id, coffee_bagel_df.names, coffee_bagel_df.geometry)
coffee_bagel_df = coffee_bagel_df.withColumn("name", col("names.common")[0]["value"]).drop("names")
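The expression col("names.common")[0]["value"] walks the nested names map: it looks up the "common" key, takes the first array element, and reads that element's "value" entry. In Spark with ANSI mode disabled (the default in Spark 3.x), a missing key or out-of-range index evaluates to null rather than raising an error. The plain-Python equivalent below makes those steps explicit; the function name and the sample record are hypothetical.

```python
from typing import Any, Optional


def first_common_name(names: Optional[dict[str, Any]]) -> Optional[str]:
    """Return names["common"][0]["value"], or None if any step is missing."""
    if not names:
        return None
    common = names.get("common") or []
    if not common:
        return None
    return common[0].get("value")


# Hypothetical record shaped like the Places `names` column
record = {"common": [{"value": "Example Bagel Cafe"}]}
print(first_common_name(record))  # Example Bagel Cafe
print(first_common_name({}))      # None
```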

Visualizing coffee shops that make bagels in NYC using SedonaKepler:

Visualizing points of interest from Overture Maps

Exploring the Stadium Arena Category

Let’s imagine we want to analyze places where we might see a show or sports event, such as stadiums and arenas, and understand the types of businesses located within walking distance. This analysis can provide insights into the commercial ecosystem surrounding entertainment venues and help us understand the urban dynamics in these areas.

We begin by filtering places to the stadium_arena category and then registering temporary views for places and arenas. In PySpark, to run SQL against a DataFrame you first need to register it as a temporary view or table.

arena_df = places_df.filter(places_df.category == "stadium_arena")
arena_df.createOrReplaceTempView("Arenas")
places_df.createOrReplaceTempView("Places")

To identify businesses close to stadium arenas, we perform a spatial intersection.

The following SQL query performs a spatial intersection to find businesses within 0.002 units of a stadium arena. It uses ST_Buffer to expand each arena geometry by 0.002 units into a search area, and ST_Intersects to check which places fall inside that area. We assume 0.002 units is roughly one block, a walkable distance.
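Because the Places geometries use longitude/latitude coordinates, a buffer of 0.002 is measured in degrees, and its ground distance depends on direction. The sketch below gives a rough conversion under a spherical approximation (about 111,320 meters per degree of latitude, with longitude scaled by the cosine of the latitude), assuming a latitude of about 40.7°N for NYC.

```python
import math

METERS_PER_DEGREE = 111_320  # approximate length of one degree on a spherical Earth


def buffer_in_meters(buffer_deg: float, latitude_deg: float) -> tuple[float, float]:
    """Approximate (north-south, east-west) ground distance of a degree buffer."""
    north_south = buffer_deg * METERS_PER_DEGREE
    # Meridians converge toward the poles, so a degree of longitude shrinks by cos(lat)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = buffer_in_meters(0.002, 40.7)
print(f"north-south ≈ {ns:.0f} m, east-west ≈ {ew:.0f} m")
# north-south ≈ 223 m, east-west ≈ 169 m
```

So the 0.002 buffer reaches roughly 170 to 220 meters, and "about one block" is a loose approximation. For a buffer specified in meters, the geometries would first need to be transformed to a projected coordinate system (for example with Sedona's ST_Transform).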

arena_places = sedona.sql('''
    SELECT
        Places.id AS places_id,
        Places.geometry AS places_geometry,
        Places.category AS places_category,
        Arenas.id AS arena_id,
        Arenas.geometry AS arena_geometry,
        Arenas.names.common[0].value AS arena_name
    FROM
        Places, Arenas
    WHERE
        ST_Intersects(Places.geometry, ST_Buffer(Arenas.geometry, 0.002))
    ''')

To get a better picture, here’s a rendered map of the arena_places DataFrame using SedonaKepler. The red dots are the Stadium Arenas while the blue dots are the businesses in the arena’s vicinity.

Visualizing Overture Maps points of interest

Next, we group the nearby businesses by 'places_category' and count them to understand their distribution. After the GroupBy operation, the categories are ranked by number of occurrences. This tells us the most common business categories in proximity to stadium arenas.

arena_places_count = arena_places.groupBy("places_category").agg(countDistinct("places_id").alias("count"))
arena_places_count = arena_places_count.orderBy("count", ascending=False)
windowSpec = Window.orderBy(col("count").desc())
arena_places_count = arena_places_count.withColumn("arena_rank", rank().over(windowSpec))
arena_places_count.show(15, truncate=False)
arena_places_count.count()
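The spatial join above returns one row per (place, arena) pair, so a business that falls inside the buffers of two nearby arenas appears twice. count("*") would count it twice, while countDistinct("places_id") counts it once. The plain-Python sketch below contrasts the two on hypothetical join rows.

```python
from collections import defaultdict

# Hypothetical (places_id, places_category, arena_id) rows from the spatial join
join_rows = [
    ("p1", "hotel", "arena_1"),
    ("p1", "hotel", "arena_2"),  # same hotel is also near a second arena
    ("p2", "hotel", "arena_1"),
    ("p3", "theatre", "arena_2"),
]


def count_rows(rows):
    """Equivalent of count("*") grouped by category."""
    counts = defaultdict(int)
    for _, category, _ in rows:
        counts[category] += 1
    return dict(counts)


def count_distinct_places(rows):
    """Equivalent of countDistinct("places_id") grouped by category."""
    ids = defaultdict(set)
    for place_id, category, _ in rows:
        ids[category].add(place_id)
    return {category: len(place_ids) for category, place_ids in ids.items()}


print(count_rows(join_rows))             # {'hotel': 3, 'theatre': 1}
print(count_distinct_places(join_rows))  # {'hotel': 2, 'theatre': 1}
```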

To highlight which types of businesses are more common near stadium arenas than citywide, we compare each category's overall rank with its rank near stadium arenas. We do this by:

  • Joining the overall category counts (categories_count) with the arena category counts (arena_places_count)
  • Calculating the rank differences
  • Ordering the result by rank difference

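# Register the DataFrames as temporary views so the SQL query can reference them
categories_df.createOrReplaceTempView("categories_count")
arena_places_count.createOrReplaceTempView("arena_places_count")

sedona.sql('''
    SELECT
        cc.category,
        cc.count AS overall_count,
        apc.count AS arena_count,
        cc.overall_rank,
        apc.arena_rank,
        cc.overall_rank - apc.arena_rank AS rank_difference
    FROM categories_count AS cc
    LEFT JOIN arena_places_count AS apc
        ON cc.category = apc.places_category
    WHERE apc.arena_rank <= 50
    ORDER BY rank_difference desc nulls last
''').show(12, truncate=False)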
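rank_difference is overall_rank minus arena_rank, so a positive value means the category ranks higher (closer to 1) near arenas than it does citywide. Note also that the filter apc.arena_rank <= 50 discards rows where the LEFT JOIN found no match, so in practice the query behaves like an inner join. The plain-Python sketch below applies the same calculation to three rows taken from the query output.

```python
# (category, overall_rank, arena_rank) taken from the query output
rows = [
    ("advertising_agency", 95, 47),
    ("theatre", 56, 18),
    ("hotel", 35, 16),
]


def rank_differences(rows):
    """Return (category, overall_rank - arena_rank), largest difference first."""
    diffs = [(category, overall - arena) for category, overall, arena in rows]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


for category, diff in rank_differences(rows):
    print(category, diff)
# advertising_agency 48
# theatre 38
# hotel 19
```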

This produces the following table:

+---------------------------------+-------------+-----------+------------+----------+---------------+
|category                         |overall_count|arena_count|overall_rank|arena_rank|rank_difference|
+---------------------------------+-------------+-----------+------------+----------+---------------+
|advertising_agency               |628          |73         |95          |47        |48             |
|broadcasting_media_production    |873          |102        |66          |27        |39             |
|theatre                          |1046         |129        |56          |18        |38             |
|travel_services                  |710          |76         |80          |42        |38             |
|college_university               |1503         |188        |38          |10        |28             |
|jewelry_store                    |1661         |227        |33          |7         |26             |
|arts_and_entertainment           |1154         |107        |48          |23        |25             |
|counseling_and_mental_health     |1023         |82         |61          |39        |22             |
|topic_concert_venue              |1056         |89         |54          |33        |21             |
|hotel                            |1626         |151        |35          |16        |19             |
|event_planning                   |1082         |84         |52          |37        |15             |
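|financial_service                |1892         |168        |26          |12        |14             |
+---------------------------------+-------------+-----------+------------+----------+---------------+
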
Insights
  • High Concentration of Relevant Services: Categories like 'advertising_agency', 'broadcasting_media_production', and 'theatre' rank much higher near stadium arenas than they do citywide, suggesting a synergy with sports and entertainment venues.
  • University-Affiliated Stadiums: The proximity of ‘college_university’ to stadium arenas might suggest that some of these stadiums are located within or near university campuses, serving as venues for college sports events, which are often significant in the United States.
  • Accommodation for Visitors: The high ranking of ‘hotel’ near stadium arenas indicates a demand for accommodation by visitors who may be traveling to NYC for games or events. This is consistent with the transient nature of sports and entertainment events, which often draw fans and participants from outside the local area.
  • Luxury and Leisure: 'jewelry_store' has a rank difference of 26, suggesting demand for luxury shopping around stadium arenas, potentially linked to the high-profile nature of events held at these locations.
  • Entertainment and Event Planning: Categories like ‘arts_and_entertainment’, ‘event_planning’, and ‘topic_concert_venue’ have higher rankings near arenas, reflecting the role of these venues as hubs for events and cultural activities.

Conclusion

The Overture Maps data in the Wherobots Open Data Catalog enables efficient spatial analytics. This synergy between advanced data formats and powerful analytics tools opens up new possibilities for geospatial analysis and insights. The analyses presented here are just the beginning and rest on several assumptions, such as treating a 0.002-unit buffer as walking distance. However, with Wherobots and Overture Maps data, the possibilities for uncovering new insights and informing data-driven decisions are virtually limitless.

You can follow along with the code from this blog post by creating a free account in Wherobots Cloud.

Want to keep up with the latest developer news from the Wherobots and Apache Sedona community? Sign up for the This Month In Wherobots Newsletter:


The Biggest Apache Sedona Release Ever, Wherobots Cloud Launch, Building Maps With Felt & Self Service Geospatial Analytics – This Month In Wherobots

Welcome to This Month In Wherobots where we highlight the latest goings on from the Wherobots & Apache Sedona community. In this edition we’re taking a look at the largest release ever in the history of Apache Sedona, the latest SedonaDB and Wherobots Cloud launch, self-service geospatial analytics, building maps with Felt, and a look at the Wherobots Online Community.

Featured Community Members: Nara Khou & Cort Lunke

Nara Khou and Cort Lunke

Each month we highlight special members of the community who contribute their expertise and passion to the Wherobots and Apache Sedona community. This month's featured community members are Nara Khou, Lead Data Engineer at Land O'Lakes, and Cort Lunke, Data & Analytics Lead at Land O'Lakes. Earlier this year Nara and Cort presented "Self-Service Geospatial Analytics Built On Databricks, Apache Sedona and R" at Databricks' Data+AI Summit. Thanks so much, Nara and Cort, for sharing your success story using Apache Sedona with the community!

Self-Service Geospatial Analytics Built On Databricks, Apache Sedona And R

data pipeline process

In this presentation, Nara and Cort discuss some of the challenges of working with spatial data and how Apache Sedona can address them in an enterprise data environment like Databricks. They share why they chose Apache Sedona to analyze watershed and cropland data at scale, walk through the data processing pipeline used for the project, and demo the end-to-end pipeline: data collection, processing, and analysis using Apache Sedona, and visualization using RStudio, all within Databricks.

Watch the recording of Nara & Cort’s Data+AI Summit Presentation

SedonaDB & Wherobots Cloud Launch

The Wherobots team was excited to reveal SedonaDB, the cloud-native spatial analytics database platform, at the FOSS4G North America conference. SedonaDB builds on the scalability and stability of the Apache Sedona project, bringing large-scale geospatial analytics capabilities to enterprises looking for a cloud-native solution. SedonaDB also introduces the Havasu open table format, which enables efficient querying and updating of geometry and raster columns in Parquet files stored in cloud object stores such as AWS S3.

Get started with SedonaDB on Wherobots Cloud Free tier today.

Apache Sedona 1.5 Release

The most recent Apache Sedona release, v1.5.0, was the biggest release in the history of Apache Sedona. This release includes native support for Uber H3 hexagon functions, comprehensive raster ETL and analytics support, more ST functions to enable new geospatial workloads, XYZM support, and visualization with SedonaKepler and SedonaPyDeck. This version is also available in the official Apache Sedona Docker image. You can find more about this release in the Apache Sedona GitHub repository and read more about the v1.5.0 Apache Sedona release here.

Learn more about the latest Apache Sedona release

Building Maps With Felt

An important requirement for data infrastructure tools like SedonaDB and Wherobots Cloud is that they integrate well with the technology ecosystems around them. In the world of spatial databases, this includes geospatial visualization tooling like the web-based mapping tool Felt. This blog post shows how to integrate the Felt API with Wherobots Cloud so we can use SedonaDB's geospatial analysis capabilities through Spatial SQL, then publish the results of our analysis to Felt's beautiful web-based mapping tooling. The example uses data from BirdBuddy, which publishes bird sightings from its smart bird feeders, to find the ranges of some of our favorite bird species.

Read the blog post "Creating Collaborative Web Maps With The Felt API And Wherobots Cloud".

The Wherobots Online Community Launch

Wherobots online community

The Wherobots Online Community is the forum for community members to come together, ask questions, and share their expertise and excitement about spatial analytics. This site was launched earlier this month and we’re excited to have a home for the community. Please feel free to join the community, introduce yourself, and share what you’re working on and why you’re excited about spatial analytics! We’ve also launched the Wherobots YouTube Channel as a way to share educational content about spatial analytics – please check it out and subscribe.

Join The Wherobots Online Community

Upcoming Events

  • GeoParquet Community Day (San Francisco – January 30th, 2024) – Join us for GeoParquet Community Day to highlight the usage of spatial data in Parquet, open table formats, and cloud-native spatial analytics.
  • Analyzing Real Estate Data Using SedonaDB & Wherobots Cloud (Online Livestream – December 12th, 2023) – In this livestream we’ll take a look at analyzing US real estate data at the county level, using data from Zillow and Natural Earth. We’ll introduce the Wherobots Cloud platform and SedonaDB using Python and Spatial SQL. Be sure to subscribe to the Wherobots YouTube channel to keep up to date with more Wherobots livestreams and videos!

Want to receive this monthly update in your inbox? Sign up for the This Month In Wherobots Newsletter:


Creating Collaborative Web Maps With The Felt API And Wherobots Cloud

An important requirement for data infrastructure tools like SedonaDB and Wherobots Cloud is that they integrate well with the technology ecosystems around them. In the world of spatial databases, this includes geospatial visualization tooling. Being able to create maps with data from SedonaDB is an important use case for Wherobots Cloud, so in this blog post I wanted to explore how to create collaborative web maps with Felt, providing data using the Felt API.

For this map I wanted to integrate the Felt API with Wherobots Cloud so I could do some geospatial analysis using Spatial SQL and SedonaDB then publish the results of my analysis to Felt’s beautiful web-based mapping tooling.

I decided to use data from BirdBuddy, which publishes bird sightings from its smart bird feeders, to find the ranges of some of my favorite bird species.

North American bird ranges

You can follow along by creating a free account on Wherobots Cloud.

Collaborative Web Maps With Felt

Felt is a web-based tool for creating collaborative maps. Felt is bringing collaborative real-time editing functionality similar to what you've seen in Google Docs or Notion to the world of mapmaking. You can annotate, comment, and draw on the map, then share it with anyone on the web using a single link to start collaborating.

Felt has also invested in supporting a wide range of data formats for adding data to maps with their Upload Anything tool, a simple drag-and-drop interface that supports formats including Shapefile, GeoJSON, GeoTIFF, CSV, JPEG, and more. Felt also built a QGIS plugin, so if you're used to working with desktop GIS tooling you can easily export your project and layers to Felt's web-based tooling via QGIS.

Felt enables programmatically creating maps and layers as well as adding data to the map via the Felt API. We’ll be using the Felt API to upload the results of our analysis using SedonaDB to create and publish a map.

Wherobots Cloud File Management

Our data comes from Bird Buddy, which makes a smart bird feeder that can identify bird species and (optionally) report their location.

BirdBuddy screenshot

Bird Buddy publishes its data as CSV files, so we'll download the latest data and then upload the file to our Wherobots Cloud instance via the "Files" tab. The free tier of Wherobots Cloud includes free data storage in AWS S3, which we can access within the Wherobots notebook environment using the S3 URL of the file.

Wherobots Cloud file management

Once you’ve uploaded a file you can click the copy file icon to copy the file’s S3 path to access the file in the Wherobots notebook environment. Note that these files are private to your Wherobots organization, so the S3 URL below won’t be accessible to anyone outside my organization.

S3_URL = "s3://<YOUR_S3_URL_HERE>/birdbuddy/"

Now we’ll load the BirdBuddy CSV data and convert it to a SedonaDB DataFrame so we can use Spatial SQL to find the range of each species.

bb_df = sedona.read.format('csv').option('header','true').option('delimiter', ',').load(S3_URL)
bb_df.show(5)

Looking at the first few rows of the DataFrame, we can see latitude and longitude stored as separate fields, along with information about the bird species.

+-------------------+--------------------+----------+-----------------+----------------+
|anonymized_latitude|anonymized_longitude| timestamp|      common_name| scientific_name|
+-------------------+--------------------+----------+-----------------+----------------+
|          45.441235|          -122.51253|2023-09...|  dark eyed junco|      junco h...|
|           41.75291|            -83.6242|2023-09...|northern cardinal|cardinalis ca...|
|            43.8762|            -78.9261|2023-09...|northern cardinal|cardinalis ca...|
|            33.7657|            -84.2951|2023-09...|northern cardinal|cardinalis ca...|
|            30.4805|            -84.2243|2023-09...|northern cardinal|cardinalis ca...|
+-------------------+--------------------+----------+-----------------+----------------+
only showing top 5 rows

Spatial SQL With SedonaDB

Now we’re ready to use the power of Spatial SQL to analyze our Bird Buddy data. We want to find the range of each species, but first let’s explore the data.

First we’ll convert our latitude and longitude fields into Point geometries using the ST_Point SQL function.

bb_df = bb_df.selectExpr('ST_Point(CAST(anonymized_longitude AS Decimal(24,20)), CAST(anonymized_latitude AS Decimal(24,20))) AS location', 'timestamp', 'common_name', 'scientific_name')
bb_df.createOrReplaceTempView('bb')
bb_df.show(5)

Now the location field is a proper geometry type that our Spatial DataFrame can take advantage of.

+--------------------+--------------------+-----------------+--------------------+
|            location|           timestamp|      common_name|     scientific_name|
+--------------------+--------------------+-----------------+--------------------+
|POINT (-122.51253...|2023-09-01 00:00:...|  dark eyed junco|      junco hyemalis|
|POINT (-83.6242 4...|2023-09-01 00:00:...|northern cardinal|cardinalis cardin...|
|POINT (-78.9261 4...|2023-09-01 00:00:...|northern cardinal|cardinalis cardin...|
|POINT (-84.2951 3...|2023-09-01 00:00:...|northern cardinal|cardinalis cardin...|
|POINT (-84.2243 3...|2023-09-01 00:00:...|northern cardinal|cardinalis cardin...|
+--------------------+--------------------+-----------------+--------------------+
only showing top 5 rows
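Two details in the ST_Point call are easy to get wrong. First, reading a CSV with only the header option (no inferSchema) loads every column as a string, which is why the coordinates are CAST before use. Second, ST_Point takes x first and then y, so longitude must come before latitude even though the CSV lists latitude first. The plain-Python sketch below parses one row into an (x, y) tuple; the range check is an assumption added for illustration, not part of the original pipeline.

```python
def to_xy(lat_str: str, lon_str: str) -> tuple[float, float]:
    """Parse CSV latitude/longitude strings into an (x, y) = (lon, lat) tuple.

    Raises ValueError if a value is outside the valid WGS84 range, which
    usually signals swapped or malformed columns.
    """
    lat, lon = float(lat_str), float(lon_str)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return (lon, lat)  # x = longitude, y = latitude


# First row of the BirdBuddy sample above
print(to_xy("45.441235", "-122.51253"))  # (-122.51253, 45.441235)
```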

We have just under 14 million bird observations in our DataFrame.

bb_df.count()
------------
13972003

To find all observations of Juncos in the data, we can write a SQL query to filter the results, then visualize the observations on a map using SedonaKepler, the SedonaDB integration for Kepler.gl.

junco_df = sedona.sql("SELECT * FROM bb WHERE common_name LIKE '%junco' ")
junco_df.show(5)

We used the SQL LIKE string comparison operator to find all observations relating to Juncos, then stored the results in a new DataFrame junco_df.

+--------------------+--------------------+---------------+---------------+
|            location|           timestamp|    common_name|scientific_name|
+--------------------+--------------------+---------------+---------------+
|POINT (-122.51253...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
|POINT (-94.5916 3...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
|POINT (-85.643 31...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
|POINT (-87.7645 3...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
|POINT (-122.16346...|2023-09-01 00:00:...|dark eyed junco| junco hyemalis|
+--------------------+--------------------+---------------+---------------+
only showing top 5 rows
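In a LIKE pattern, % matches any sequence of characters (including none) and _ matches exactly one character, so '%junco' matches any common name that ends in "junco". LIKE is case-sensitive in Spark SQL. A minimal plain-Python translation of LIKE patterns to regular expressions is sketched below; it ignores escape characters, and the helper names are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regular expression.

    % matches any run of characters, _ matches exactly one character, and
    everything else is matched literally. Escape characters are not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern (case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(sql_like("dark eyed junco", "%junco"))  # True
print(sql_like("junco hyemalis", "%junco"))   # False
print(sql_like("Dark Eyed Junco", "%junco"))  # False (case-sensitive)
```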

Now we’ll visualize the contents of our new junco_df DataFrame using SedonaKepler.

SedonaKepler.create_map(df=junco_df, name='Juncos')

Juncos observation map

Based on the map above, Juncos appear to have quite a large range throughout North America.

Next, we’ll filter the overall dataset to a few of my favorite bird species, then use the power of Spatial SQL with a GROUP BY operation to create convex hulls (polygon geometries) from the individual observations (point geometries) of each species.

By creating a convex hull around all point observations grouped by species we will create a new geometry that represents the observed range of each species in our dataset.

range_df = sedona.sql("""
    SELECT common_name, COUNT(*) AS num, ST_ConvexHull(ST_Union_aggr(location)) AS geometry 
    FROM bb 
    WHERE common_name IN ('california towhee', 'steller’s jay', 'mountain chickadee', 'eastern bluebird') 
    GROUP BY common_name 
    ORDER BY num DESC
""")
range_df.show()

Note our use of the following Spatial SQL functions:

  • ST_ConvexHull – given a geometry (here, the collection of all observation points for a species), returns the smallest convex polygon that contains it
  • ST_Union_aggr – an aggregate function that returns the union of all geometries in a group, in this case used alongside a GROUP BY

+------------------+-----+--------------------+
|       common_name|  num|            geometry|
+------------------+-----+--------------------+
|  eastern bluebird|65971|POLYGON ((-80.345...|
|     steller’s jay|37864|POLYGON ((-110.26...|
| california towhee|22007|POLYGON ((-117.05...|
|mountain chickadee| 4102|POLYGON ((-110.99...|
+------------------+-----+--------------------+
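To illustrate what a convex hull computation does, the sketch below implements Andrew's monotone chain algorithm, a standard convex hull algorithm, in plain Python. It is not necessarily the algorithm Sedona uses internally (Sedona delegates geometry operations to the JTS library), and the sample coordinates are hypothetical.

```python
Point = tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain).

    Collinear points on the hull boundary are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so drop it
    return lower[:-1] + upper[:-1]


# Hypothetical (longitude, latitude) observations; (-90.0, 30.0) is interior
observations = [(-80.0, 25.0), (-98.0, 26.0), (-90.0, 40.0), (-90.0, 30.0), (-75.0, 38.0)]
print(convex_hull(observations))
# [(-98.0, 26.0), (-80.0, 25.0), (-75.0, 38.0), (-90.0, 40.0)]
```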

Now we have a new DataFrame range_df with 4 rows, one for each of the species we indicated in the query above. But now the geometry field is a polygon that represents the observed range of that species in our dataset. Pretty neat – let’s visualize these species ranges using Felt.

The Felt API supports file uploads in a variety of formats, but we’ll use GeoJSON. We’ll convert our SedonaDB DataFrame into a GeoPandas GeoDataFrame and then export to a GeoJSON file so we can upload it to the Felt API.

import geopandas

# Collect the small result to the driver and wrap it in a GeoDataFrame
range_gdf = geopandas.GeoDataFrame(range_df.toPandas(), geometry="geometry")
range_gdf.to_file('birdbuddy_range.geojson', driver='GeoJSON')

We’ve now created a GeoJSON file birdbuddy_range.geojson that looks a bit like this (we’ve omitted some lines):

{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "common_name": "eastern bluebird",
                "num": 65971
            },
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [
                            -80.3452,
                            25.6062
                        ],
                        [
                            -98.2271,
                            26.2516
                        ],
                        ...
                        [
                            -80.3452,
                            25.6062
                        ]
                    ]
                ]
            }
        },
        ...
    ]
}
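Before uploading, it can be worth sanity-checking the exported file. The sketch below uses only the standard library json module to confirm the file is a FeatureCollection of Polygon features with closed exterior rings (the GeoJSON spec, RFC 7946, requires the first and last positions of a ring to be identical, as in the example above), and lists each feature's common_name, num, and vertex count. The function name summarize_ranges is hypothetical, not part of GeoPandas or Felt.

```python
import json


def summarize_ranges(path):
    """Return (common_name, num, vertex_count) for each Polygon feature in a GeoJSON file."""
    with open(path) as f:
        fc = json.load(f)
    if fc.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    summary = []
    for feature in fc["features"]:
        geom = feature["geometry"]
        if geom["type"] != "Polygon":
            raise ValueError(f"unexpected geometry type: {geom['type']}")
        ring = geom["coordinates"][0]  # exterior ring
        # A valid linear ring repeats its first position at the end
        if ring[0] != ring[-1]:
            raise ValueError("exterior ring is not closed")
        props = feature["properties"]
        # Subtract 1 so the repeated closing position is not counted twice
        summary.append((props.get("common_name"), props.get("num"), len(ring) - 1))
    return summary


# Usage (assumes the file written above exists):
# print(summarize_ranges("birdbuddy_range.geojson"))
```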

Felt Maps API

If you haven’t already, create a free Felt account, then generate a new access token in your account settings so that you can create maps and upload data via the Felt API.

Creating a Felt API token

FELT_TOKEN = '<YOUR_TOKEN_HERE>'
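Alternatively, to avoid pasting the token into a notebook you might share, you can read it from an environment variable. This is a minimal sketch; the variable name FELT_TOKEN and the helper load_felt_token are assumptions for illustration, not something Felt requires.

```python
import os


def load_felt_token(env_var="FELT_TOKEN"):
    """Read the Felt API token from an environment variable (hypothetical helper)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Felt API token")
    return token


# Usage:
# FELT_TOKEN = load_felt_token()
```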

To create a new map and upload data, we need to make a few requests to the Felt API:

  1. /maps to create a new map. This endpoint will return the id and url of the new map.
  2. /maps/{map_id}/layers to create a new layer in our new map. Note we need to use the map_id from the previous request. This endpoint will return a presigned upload URL that will allow us to upload our GeoJSON file.
  3. /maps/{map_id}/layers/{layer_id}/finish_upload to indicate we have finished uploading our data using the presigned upload URL.
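The key detail in these three steps is that each request depends on IDs returned by the previous ones. The sketch below isolates that dependency by building each endpoint URL from the response JSON, using the same response fields (data.id and data.attributes.layer_id) that create_felt_map reads below. The helper names and the sample responses are hypothetical.

```python
FELT_API_BASE = "https://felt.com/api/v1"


def maps_endpoint():
    # Step 1: create a map; no IDs are needed yet
    return f"{FELT_API_BASE}/maps"


def layers_endpoint(create_map_json):
    # Step 2: needs the map id returned by step 1
    map_id = create_map_json["data"]["id"]
    return f"{FELT_API_BASE}/maps/{map_id}/layers"


def finish_upload_endpoint(create_map_json, create_layer_json):
    # Step 3: needs the map id (step 1) and the layer id (step 2)
    map_id = create_map_json["data"]["id"]
    layer_id = create_layer_json["data"]["attributes"]["layer_id"]
    return f"{FELT_API_BASE}/maps/{map_id}/layers/{layer_id}/finish_upload"


# Hypothetical, heavily trimmed responses for illustration
map_json = {"data": {"id": "MAP_ID"}}
layer_json = {"data": {"attributes": {"layer_id": "LAYER_ID"}}}
print(finish_upload_endpoint(map_json, layer_json))
```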

The function below will create a new map in Felt, then create a new layer and upload our GeoJSON file to this layer. See the Felt API docs for more examples of what’s possible with the Felt API.

import requests

FELT_API_BASE = "https://felt.com/api/v1"


def create_felt_map(access_token, filename, map_title, layer_name):
    """Create a Felt map, upload a GeoJSON file as a new layer, and return the map URL."""
    headers = {
        "authorization": f"Bearer {access_token}",
        "content-type": "application/json",
    }

    # 1. Create a new map using the /maps endpoint
    create_map_response = requests.post(
        f"{FELT_API_BASE}/maps",
        headers=headers,
        json={"title": map_title},
    )
    create_map_response.raise_for_status()
    create_map_data = create_map_response.json()
    map_id = create_map_data["data"]["id"]
    map_url = create_map_data["data"]["attributes"]["url"]
    print(f"Created map: {map_url}")

    # 2. Create a new layer; the response includes a presigned URL for uploading our GeoJSON file
    layer_response = requests.post(
        f"{FELT_API_BASE}/maps/{map_id}/layers",
        headers=headers,
        json={"file_names": [filename], "name": layer_name},
    )
    layer_response.raise_for_status()
    presigned_upload = layer_response.json()
    upload_url = presigned_upload["data"]["attributes"]["url"]
    presigned_attributes = presigned_upload["data"]["attributes"]["presigned_attributes"]
    layer_id = presigned_upload["data"]["attributes"]["layer_id"]

    # Upload the file to the presigned URL; a 204 response indicates success.
    # Order is important: the file must come after the presigned form fields.
    with open(filename, "rb") as file_obj:
        upload_response = requests.post(
            upload_url,
            files={**presigned_attributes, "file": file_obj},
        )
    upload_response.raise_for_status()

    # 3. Call /maps/{map_id}/layers/{layer_id}/finish_upload to complete the process
    finish_upload = requests.post(
        f"{FELT_API_BASE}/maps/{map_id}/layers/{layer_id}/finish_upload",
        headers=headers,
        json={"filename": filename, "name": layer_name},
    )
    finish_upload.raise_for_status()
    print(finish_upload.json())

    return map_url

To create the Felt map, we call this function with our API token, the name of our GeoJSON file, and the titles we’d like for the new map and its data layer.

create_felt_map(FELT_TOKEN, "birdbuddy_range.geojson", "North American Bird Ranges", "My Favorite Birds")

We’ve now created a new map in Felt and uploaded our GeoJSON data as a new layer. We can share the URL with anyone on the web to view or collaborate on our map!

North American bird ranges

We can also embed the map in our Jupyter notebook:

from IPython.display import HTML

# Embed the Felt map in the notebook output with an iframe
HTML('<iframe width="1600" height="600" frameborder="0" title="My Favorite Bird Ranges" src="https://felt.com/embed/map/North-American-Bird-Ranges-a4c5cOCaRMiL64KK5N27TA"></iframe>')
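If you embed several maps, building the iframe tag with a small helper avoids hand-quoting attribute values. This sketch uses the standard library html.escape to escape the title and URL; the helper name felt_embed_iframe and its default size are assumptions, and the embed URL still has to come from Felt.

```python
from html import escape


def felt_embed_iframe(embed_url, title, width=1600, height=600):
    """Build an <iframe> tag for a Felt embed URL, escaping attribute values (hypothetical helper)."""
    return (
        f'<iframe width="{int(width)}" height="{int(height)}" frameborder="0" '
        f'title="{escape(title, quote=True)}" src="{escape(embed_url, quote=True)}"></iframe>'
    )


# Usage in Jupyter:
# from IPython.display import HTML
# HTML(felt_embed_iframe(
#     "https://felt.com/embed/map/North-American-Bird-Ranges-a4c5cOCaRMiL64KK5N27TA",
#     "My Favorite Bird Ranges",
# ))
```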

The 30 Day Map Challenge

Every year, the geospatial data and mapping community comes together for the "30 Day Map Challenge", a fun and informal challenge to create a new map each day for one month and share it on social media.


This BirdBuddy map was my entry for Day 3 (Polygons) of the 30 Day Map Challenge. The full Jupyter Notebook with all of the code is available on GitHub, along with some of my other 30 Day Map Challenge maps in the same repository. If you’d like to follow along with my attempt at the rest of the challenge, feel free to connect with me on Twitter or LinkedIn.
