TABLE OF CONTENTS

    Contributors

    • William Lyon

      Developer Relations At Wherobots

    In this post we take a look at combining aircraft flight data with weather data to identify potentially impacted flights using public ADS-B data and weather data from NOAA. Along the way we introduce the types of spatial queries available with WherobotsDB including spatial range queries, spatial kNN, and spatial joins using Spatial SQL, geospatial visualization options, working with various file formats including GeoParquet, GeoTiff, Shapefiles, and more.

    We’ll be running this example in Wherobots Cloud, where you can create a free account to follow along. See the Getting Started With Wherobots Cloud series for an overview of Wherobots Cloud.

    Flight Trace Data

    The ADS-B flight observation data we’ll be using comes from ADSB.lol which provides publicly accessible daily updates of crowd-sourced flight tracking data. I adapted some great examples for downloading, parsing and enriching this dataset from Mark Litwintschik’s blog post and extracted 1 month of data which resulted in just over 1.1 billion observations. I extended Mark’s approach a bit and saved this data as GeoParquet, partitioned by S2 id. This will enable more efficient spatial queries on the dataset.

    First, we’ll import dependencies and configure WherobotsDB to access an S3 bucket with our example data.

    from sedona.spark import *
    
    config = SedonaContext.builder().config("spark.hadoop.fs.s3a.bucket.wherobots-examples.aws.credentials.provider","org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider").getOrCreate()
    sedona = SedonaContext.create(config)

    We load one month of world-wide aircraft trace data which is just over 1.1 billion point observations.

    # Load flight traces via GeoParquet
    traces_df = sedona.read.format("geoparquet").load("s3a://wherobots-examples/data/examples/flights/2024_s2.parquet")
    traces_df.createOrReplaceTempView("traces")
    
    traces_df.count()
    -------------------
    1103371367

    Exploratory Data Analysis

    I followed Mark’s approach of extracting individual trace observations into their own row and we store detailed information about the trace such as altitude, squawk code, heading, etc in a struct. You can find his code in this post.

    Let’s view the schema of our Spatial DataFrame that we loaded via GeoParquet.

    traces_df.printSchema()
    root
     |-- dbFlags: long (nullable = true)
     |-- desc: string (nullable = true)
     |-- icao: string (nullable = true)
     |-- ownOp: string (nullable = true)
     |-- r: string (nullable = true)
     |-- reg_details: struct (nullable = true)
     |    |-- description: string (nullable = true)
     |    |-- iso2: string (nullable = true)
     |    |-- iso3: string (nullable = true)
     |    |-- nation: string (nullable = true)
     |-- t: string (nullable = true)
     |-- timestamp: string (nullable = true)
     |-- trace: struct (nullable = true)
     |    |-- aircraft: struct (nullable = true)
     |    |    |-- alert: long (nullable = true)
     |    |    |-- alt_geom: long (nullable = true)
     |    |    |-- baro_rate: long (nullable = true)
     |    |    |-- category: string (nullable = true)
     |    |    |-- emergency: string (nullable = true)
     |    |    |-- flight: string (nullable = true)
     |    |    |-- geom_rate: long (nullable = true)
     |    |    |-- gva: long (nullable = true)
     |    |    |-- ias: long (nullable = true)
     |    |    |-- mach: double (nullable = true)
     |    |    |-- mag_heading: double (nullable = true)
     |    |    |-- nac_p: long (nullable = true)
     |    |    |-- nac_v: long (nullable = true)
     |    |    |-- nav_altitude_fms: long (nullable = true)
     |    |    |-- nav_altitude_mcp: long (nullable = true)
     |    |    |-- nav_heading: double (nullable = true)
     |    |    |-- nav_modes: array (nullable = true)
     |    |    |    |-- element: string (containsNull = true)
     |    |    |-- nav_qnh: double (nullable = true)
     |    |    |-- nic: long (nullable = true)
     |    |    |-- nic_baro: long (nullable = true)
     |    |    |-- oat: long (nullable = true)
     |    |    |-- rc: long (nullable = true)
     |    |    |-- roll: double (nullable = true)
     |    |    |-- sda: long (nullable = true)
     |    |    |-- sil: long (nullable = true)
     |    |    |-- sil_type: string (nullable = true)
     |    |    |-- spi: long (nullable = true)
     |    |    |-- squawk: string (nullable = true)
     |    |    |-- tas: long (nullable = true)
     |    |    |-- tat: long (nullable = true)
     |    |    |-- track: double (nullable = true)
     |    |    |-- track_rate: double (nullable = true)
     |    |    |-- true_heading: double (nullable = true)
     |    |    |-- type: string (nullable = true)
     |    |    |-- version: long (nullable = true)
     |    |    |-- wd: long (nullable = true)
     |    |    |-- ws: long (nullable = true)
     |    |-- altitude: long (nullable = true)
     |    |-- flags: long (nullable = true)
     |    |-- geometric_altitude: long (nullable = true)
     |    |-- geometric_vertical_rate: long (nullable = true)
     |    |-- ground_speed: double (nullable = true)
     |    |-- h3_5: string (nullable = true)
     |    |-- indicated_airspeed: long (nullable = true)
     |    |-- lat: double (nullable = true)
     |    |-- lon: double (nullable = true)
     |    |-- roll_angle: double (nullable = true)
     |    |-- source: string (nullable = true)
     |    |-- timestamp: string (nullable = true)
     |    |-- track_degrees: double (nullable = true)
     |    |-- vertical_rate: long (nullable = true)
     |-- year: string (nullable = true)
     |-- geometry: geometry (nullable = true)
     |-- date: date (nullable = true)
     |-- s2: long (nullable = true)
    

    We can also inspect the first few rows of data.

    traces_df.show(5)
    +-------+--------------------+------+-----+------+--------------------+----+-------------------+--------------------+----+--------------------+----------+-------------------+
    |dbFlags|                desc|  icao|ownOp|     r|         reg_details|   t|          timestamp|               trace|year|            geometry|      date|                 s2|
    +-------+--------------------+------+-----+------+--------------------+----+-------------------+--------------------+----+--------------------+----------+-------------------+
    |      0|PIPER PA-28R-180/...|c81c51| null|ZK-RTE|{general, NZ, NZL...|P28R|2024-01-20 00:00:00|{{null, null, nul...|null|POINT (174.722786...|2024-01-20|7854277750134145024|
    |      0|        AIRBUS A-320|c81e2c| null|ZK-OJS|{general, NZ, NZL...|A320|2024-01-15 00:00:00|{{0, 36400, 0, A3...|null|POINT (174.549826...|2024-01-15|7854277750134145024|
    |      0|PIPER PA-28R-180/...|c81c51| null|ZK-RTE|{general, NZ, NZL...|P28R|2024-01-20 00:00:00|{{null, null, nul...|null|POINT (174.722764...|2024-01-20|7854277750134145024|
    |      0|        AIRBUS A-320|c81e2c| null|ZK-OJS|{general, NZ, NZL...|A320|2024-01-15 00:00:00|{{null, null, nul...|null|POINT (174.557251...|2024-01-15|7854277750134145024|
    |      0|PIPER PA-28R-180/...|c81c51| null|ZK-RTE|{general, NZ, NZL...|P28R|2024-01-20 00:00:00|{{0, 1600, null, ...|null|POINT (174.722589...|2024-01-20|7854277750134145024|
    +-------+--------------------+------+-----+------+--------------------+----+-------------------+--------------------+----+--------------------+----------+-------------------+
    only showing top 5 rows
    

    We can query our aircraft traces Spatial DataFrame using Spatial SQL. To start off let’s just demonstrate selecting fields and viewing tabular results. Note that the geometry column here is a spatial type. The data we loaded was GeoParquet formatted so we don’t need to parse or cast the field to a geometry type.

    sedona.sql("""
    SELECT desc, ownOp, geometry, trace.ground_speed, trace.altitude, trace.aircraft.flight
    FROM traces
    WHERE trace.altitude IS NOT NULL AND OwnOp IS NOT NULL AND desc IS NOT NULL AND trace.aircraft.flight IS NOT NULL
    LIMIT 10
    """).show(truncate=False)
    +----------------+-------------------+-----------------------------+------------+--------+--------+
    |desc            |ownOp              |geometry                     |ground_speed|altitude|flight  |
    +----------------+-------------------+-----------------------------+------------+--------+--------+
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.792839 -37.010239)|122.7       |400     |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.782449 -37.013079)|161.6       |350     |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.779046 -37.01401) |168.9       |475     |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.771463 -37.016058)|165.7       |825     |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.765248 -37.017746)|162.5       |1175    |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.757827 -37.01976) |159.4       |1550    |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.748356 -37.022296)|161.3       |1950    |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.738951 -37.024796)|163.2       |2325    |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.725607 -37.028395)|165.7       |2725    |DAL64   |
    |AIRBUS A-350-900|UMB BANK NA TRUSTEE|POINT (174.713367 -37.03184) |167.3       |3125    |DAL64   |
    +----------------+-------------------+-----------------------------+------------+--------+--------+

    Let’s see where the aircraft activity is in the ADS-B flight observation dataset. Where are we seeing the most aircraft activity? We can use H3 hexagons to aggregate the individual traces and visualize the results using the SedonaKepler visualization integration. Here we count the number of aircraft trace observations within each level 5 H3 hexagon and create a choropleth map to visualize the spatial distribution.

    h3_traces_df = sedona.sql("""
    SELECT
        COUNT(*) AS num,
        ST_H3ToGeom(ST_H3CellIDs(geometry, 5, false)) AS geometry,
        ST_H3CellIDs(geometry, 5, false) AS h3
    FROM traces
    GROUP BY h3
    """)
    
    SedonaKepler.create_map(h3_traces_df, name="Distribution of aircraft traces")

    Aggregated aircraft traces

    Since our data source relies on crowdsourced data retrieved from ground stations, the data coverage is not evenly distributed. We are notably missing observations over the oceans and other areas that lack crowd-sourced ground sensors. This isn’t an issue for this analysis because flight disruption is most likely to occur around airports.

    We can filter for aircraft belonging to a specific airline using the ownOp column and also filter for traces that were observed while the aircraft was part of a named flight. This can help give us some insight into the routes flown by this airline.

    jbu_h3_traces_df = sedona.sql("""
    SELECT
        COUNT(*) AS num,
        ST_H3ToGeom(ST_H3CellIDs(geometry, 5, false)) AS geometry,
        ST_H3CellIDs(geometry, 5, false) AS h3
    FROM traces
    WHERE ownOp = "JETBLUE AIRWAYS CORP" AND trace.aircraft.flight LIKE "JBU%"
    GROUP BY h3
    """)
    
    SedonaKepler.create_map(jbu_h3_traces_df, name="Distribution of aircraft traces")
    

    Aggregated Flight Traces

    Flight Traces – Spatial Queries And Performance

    WherobotsDB supports optimizations for various types of spatial queries including:

    • Spatial range query
    • Spatial kNN query, and
    • Spatial joins

    Let’s demonstrate these types of queries and compare the performance with this dataset.

    Spatial Range Query

    Let’s say we wanted to find all aircraft traces within the bounds of a specific area, perhaps the Gulf of Mexico region. We can do this using the ST_Within Spatial SQL predicate.

    Spatial filter polygon

    Here we apply this spatial predicate to filter our 1.1 billion observations to only those within the bounds of a polygon representing a specific area. This takes just over 4 seconds.

    gulf_filter = 'ST_Within(geometry, ST_GeomFromWKT("POLYGON ((-84.294662 29.840644, -88.952866 30.297018, -89.831772 28.767659, -94.050522 29.61167, -97.038803 27.839076, -97.917709 21.453069, -94.489975 18.479609, -86.843491 21.616579, -80.779037 24.926295, -84.294662 29.840644))"))'
    gulf_traces = traces_df.where(gulf_filter)
    
    %%time
    gulf_traces.count()
    --------------------
    7539347
    
    Wall time: 4.37 s
    Spatial kNN Query

    Another type of spatial query that might be relevant for working with this dataset is a spatial kNN query, which will efficiently find the “k” nearest geometries where the value of k is specified by the user. For example, let’s find the 5 nearest aircraft traces to the Jackson Hole Airport.

    jackson_hole_traces = sedona.sql("""
    SELECT 
        desc, 
        ownOp,
        trace.aircraft.flight, 
        ST_DistanceSphere(ST_Point(-110.7376, 43.6088), geometry) AS distance
    FROM traces
    ORDER BY distance ASC
    LIMIT 5
    """)
    
    %%time
    
    jackson_hole_traces.show()
    ----------------------------
    Wall time: 11.1 s
    +--------------------+--------------------+--------+------------------+
    |                desc|               ownOp|  flight|          distance|
    +--------------------+--------------------+--------+------------------+
    |BOMBARDIER BD-700...|         FLEXJET LLC|    null| 76.92424103100763|
    |        AIRBUS A-320| UNITED AIRLINES INC|    null|161.98830220247405|
    |CESSNA 700 CITATI...|AH CAPITAL MANAGE...|    null|180.05761405860687|
    |CESSNA 750 CITATI...|           2FLIU LLC|    null|187.01797656145476|
    |DASSAULT FALCON 2000|NEXTERA ENERGY EQ...|N58MW   |204.06240031528034|
    +--------------------+--------------------+--------+------------------+

    In this case our query takes 11 seconds against a 1.1 billion row dataset.

    Historical Impact of Severe Weather Events On Flights

    An important analysis related to flight routes is the impact of severe weather events. To enable this type of analysis Wherobots makes public a dataset of severe weather events as part of the Wherobots Open Data Catalog. The dataset has over 8 million weather events reported by NOAA in the US from 2016-2022.

    weather = sedona.table("wherobots_pro_data.weather.weather_events")
    weather.show()
    +--------+-------------+--------+-------------------+-------------------+-----------------+----------+-----------+-----------+-----------+----+--------------+-----+-------+--------------------+
    | EventId|         Type|Severity|     StartTime(UTC)|       EndTime(UTC)|Precipitation(in)|  TimeZone|AirportCode|LocationLat|LocationLng|City|        County|State|ZipCode|            geometry|
    +--------+-------------+--------+-------------------+-------------------+-----------------+----------+-----------+-----------+-----------+----+--------------+-----+-------+--------------------+
    |W-237263|         Cold|  Severe|2016-01-01 08:57:00|2016-01-04 15:32:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237264|         Rain|   Light|2016-01-04 22:49:00|2016-01-04 23:30:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237265|         Cold|  Severe|2016-01-05 00:35:00|2016-01-05 15:32:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237266|         Rain|   Light|2016-01-05 15:32:00|2016-01-05 17:57:00|              0.2|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237267|         Rain|Moderate|2016-01-05 17:57:00|2016-01-05 18:09:00|             0.14|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237268|         Rain|   Light|2016-01-05 18:09:00|2016-01-05 18:23:00|             0.07|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237269|         Rain|Moderate|2016-01-05 18:23:00|2016-01-05 18:57:00|             0.35|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237270|Precipitation|     UNK|2016-01-05 18:57:00|2016-01-05 19:41:00|             0.22|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237271|         Cold|  Severe|2016-01-06 00:57:00|2016-01-06 15:31:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237272|         Rain|   Light|2016-01-06 16:57:00|2016-01-06 18:33:00|             0.16|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237273|Precipitation|     UNK|2016-01-06 18:57:00|2016-01-06 19:10:00|             0.23|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237274|         Rain|Moderate|2016-01-06 19:10:00|2016-01-06 19:57:00|             0.16|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237275|         Rain|   Light|2016-01-06 20:57:00|2016-01-06 21:27:00|             0.03|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237276|         Rain|Moderate|2016-01-06 21:27:00|2016-01-06 22:57:00|             0.26|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237277|         Cold|  Severe|2016-01-07 00:57:00|2016-01-07 15:35:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237278|         Rain|   Light|2016-01-07 18:10:00|2016-01-07 18:57:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237279|         Rain|   Light|2016-01-07 23:57:00|2016-01-08 00:57:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237280|         Cold|  Severe|2016-01-08 00:57:00|2016-01-08 15:29:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237281|         Cold|  Severe|2016-01-09 00:57:00|2016-01-11 14:32:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    |W-237282|         Cold|  Severe|2016-01-12 00:57:00|2016-01-12 15:57:00|              0.0|US/Pacific|       KNSI|    33.2338|  -119.4559|null|Ventura County|   CA|   null|POINT (-119.4559 ...|
    +--------+-------------+--------+-------------------+-------------------+-----------------+----------+-----------+-----------+-----------+----+--------------+-----+-------+--------------------+
    only showing top 20 rows
    Severe Weather Events

    Let’s start our weather event analysis by finding areas that are commonly susceptible to severe weather events. We’ll again use H3 hexagons to aggregate the data, this time counting the number of severe weather events that occur within each hexagon. We’ll visualize the results as a choropleth to see areas where weather events frequently occur.

    weather_h3_df = sedona.sql("""
    SELECT
        COUNT(*) AS num,
        ST_H3ToGeom(ST_H3CellIDs(ST_Buffer(geometry, 0.1), 5, false)) AS geometry,
        ST_H3CellIDs(ST_Buffer(geometry, 0.1), 5, false) AS h3
    FROM wherobots_pro_data.weather.weather_events
    WHERE Severity = "Severe"
    GROUP BY h3
    """)
    
    weather_h3_df.createOrReplaceTempView("weather_h3")
    SedonaKepler.create_map(weather_h3_df.cache(), name="Severe weather events")
    

    Aggregated weather events

    Historical Severe Weather Aircraft Trace Analysis

    Now that we’ve determined which areas are susceptible to frequent severe weather events, we can analyze flight trace observations in these areas to determine flights that may be susceptible to frequent severe weather events using a spatial join. Note that we’re joining a table with 1.1 billion observations against our weather event data, which takes just under 13 seconds in Wherobots Cloud.

    severe_weather_flight_traces_df= sedona.sql("""
    SELECT traces.geometry AS geometry, desc, ownOp, trace.aircraft.flight AS flight, num AS eventCount
    FROM traces
    JOIN weather_h3
    WHERE ST_Contains(weather_h3.geometry, traces.geometry) AND trace.aircraft.flight IS NOT null
    """)
    
    %%time
    severe_weather_flight_traces_df.count()
    ----------------------------
    Wall time: 12.7 s
    
    53788395
    

    Using the flight number we can calculate an “intensity” metric that attempts to measure the potential impact of a flight to be disrupted by a severe weather event using the historical weather data. Here we calculate this intensity metric across all traces to find the most potentially impacted flights.

    most_impacted_flights = sedona.sql("""
    SELECT sum(eventCount) AS intensity, ST_Collect(collect_list(geometry)) AS geometry, flight
    FROM severe_weather_traces
    WHERE NOT flight = "        " AND NOT flight = "00000000"
    GROUP BY flight
    ORDER BY intensity DESC
    LIMIT 100
    """)

    Most impacted flights

    Flight Routes

    So far we’ve been working with individual aircraft traces (point observations), but it’s much more useful to analyze flights as routes. We can construct the individual flight routes from these aircraft traces, for a specific airline by filtering on the ownOp and trace.aircraft.flight fields.

    First we collect all aircraft traces that match our filter by airline, grouping by flight number. Then we can convert this collection of point geometries into a LineString geometry that represents the path the aircraft took during this flight using the ST_MakeLine Spatial SQL function.

    jbu_flights_df = sedona.sql("""
    WITH jbu_traces AS (
        SELECT 
            any_value(trace.aircraft.flight) AS flight, 
            collect_list(ST_Point(trace.lon, trace.lat)) AS points, 
            any_value(CAST(date AS string)) AS date
        FROM traces
        WHERE ownOp = 'JETBLUE AIRWAYS CORP' AND trace.aircraft.flight IS NOT null
        GROUP BY trace.aircraft.flight, date
    )
    SELECT flight, ST_MakeLine(points) AS geometry, date 
    FROM jbu_traces
    WHERE size(points) > 100
    """)
    
    jbu_flights_df.show()
    
    +--------+--------------------+----------+
    |  flight|            geometry|      date|
    +--------+--------------------+----------+
    |JBU100  |LINESTRING (-112....|2024-01-24|
    |JBU101  |LINESTRING (-80.1...|2024-01-21|
    |JBU1016 |LINESTRING (-77.3...|2024-01-20|
    |JBU1044 |LINESTRING (-80.1...|2024-01-10|
    |JBU1053 |LINESTRING (-73.7...|2024-01-03|
    |JBU1055 |LINESTRING (-71.0...|2024-01-03|
    |JBU1060 |LINESTRING (-77.4...|2024-01-12|
    |JBU1060 |LINESTRING (-77.4...|2024-01-26|
    |JBU1067 |LINESTRING (-73.7...|2024-01-17|
    |JBU1068 |LINESTRING (-80.0...|2024-01-24|
    |JBU1114 |LINESTRING (-112....|2024-01-17|
    |JBU1131 |LINESTRING (-80.2...|2024-01-22|
    |JBU1136 |LINESTRING (-83.3...|2024-01-15|
    |JBU1142 |LINESTRING (-80.0...|2024-01-27|
    |JBU1196 |LINESTRING (-80.1...|2024-01-03|
    |JBU1196 |LINESTRING (-80.1...|2024-01-27|
    |JBU1217 |LINESTRING (-81.6...|2024-01-15|
    |JBU1230 |LINESTRING (-86.8...|2024-01-22|
    |JBU1238 |LINESTRING (-97.6...|2024-01-28|
    |JBU1251 |LINESTRING (-73.7...|2024-01-10|
    +--------+--------------------+----------+
    only showing top 20 rows
    

    Flight routes

    Identify Potential For Weather Impacted Flights

    Now that we’ve extracted flight routes from the individual aircraft trace point observations we can begin to determine the potential for severe weather events to impact these flights based on our historical weather event data.

    Creating Uniform Flight Route Segments

    To analyze which flights have the most potential to be impacted based on historical weather events we first create 100km segments of each flight which will allow us to identify not only entire flight routes that are potentially impacted by weather events, but specific flight segments with the most risk of disruption.

    To do this we’ll make use of the ability to create a user defined function in WherobotsDB using the Shapely library. We’ll use Shapely’s line interpolate functionality to create equal length segments of each flight route.

    from shapely.geometry import LineString, MultiLineString
    from sedona.sql.types import GeometryType
    
    # Define a Shapely UDF to create a series of consistent length line segments for each flight
    def interpolate_lines(line: GeometryType()):
        try:
            distance_delta = 1.0 # ~100km
            points_count = int(line.length // distance_delta) + 1
            distances = (distance_delta * i for i in range(points_count))
            points = [line.interpolate(distance) for distance in distances] + [line.boundary[1]]
            line_strings = []
            for i in range(len(points) - 1):
                line_strings.append(LineString([points[i], points[i+1]]))
            return MultiLineString(line_strings)
        except:
            return None
    
    sedona.udf.register("interpolate_lines", interpolate_lines, GeometryType())
    

    We can now call our user defined function in a spatial SQL statement and leverage WherobotsDB’s parallel execution to run this UDF.

    split_df = sedona.sql("""
    SELECT flight, date, explode(ST_Dump(interpolate_lines(geometry))) AS geometry
    FROM jbu_flights
    """)
    
    split_df.createOrReplaceTempView("split_flights")
    split_df.show()
    
    +--------+----------+--------------------+
    |  flight|      date|            geometry|
    +--------+----------+--------------------+
    |JBU100  |2024-01-24|LINESTRING (-118....|
    |JBU100  |2024-01-24|LINESTRING (-117....|
    |JBU100  |2024-01-24|LINESTRING (-116....|
    |JBU100  |2024-01-24|LINESTRING (-115....|
    |JBU100  |2024-01-24|LINESTRING (-114....|
    |JBU100  |2024-01-24|LINESTRING (-113....|
    |JBU100  |2024-01-24|LINESTRING (-112....|
    |JBU100  |2024-01-24|LINESTRING (-111....|
    |JBU100  |2024-01-24|LINESTRING (-110....|
    |JBU100  |2024-01-24|LINESTRING (-109....|
    |JBU100  |2024-01-24|LINESTRING (-108....|
    |JBU100  |2024-01-24|LINESTRING (-107....|
    |JBU100  |2024-01-24|LINESTRING (-106....|
    |JBU100  |2024-01-24|LINESTRING (-105....|
    |JBU100  |2024-01-24|LINESTRING (-104....|
    |JBU100  |2024-01-24|LINESTRING (-103....|
    |JBU100  |2024-01-24|LINESTRING (-102....|
    |JBU100  |2024-01-24|LINESTRING (-102....|
    |JBU100  |2024-01-24|LINESTRING (-101....|
    |JBU100  |2024-01-24|LINESTRING (-100....|
    +--------+----------+--------------------+
    only showing top 20 rows
    
    Joining Flight Route Segments With Weather Events

    Now we perform a spatial join to identify flight segments that pass through areas that historically have a high frequency of severe weather events.

    severe_weather_split_df = sedona.sql("""
    SELECT flight, date, split_flights.geometry AS geometry, num 
    FROM split_flights
    JOIN weather_h3
    WHERE ST_Intersects(weather_h3.geometry, split_flights.geometry)
    """)
    
    severe_weather_split_df.show()
    

    This gives us a measure of which individual flight segments have the highest potential for weather impact.

    +--------+----------+--------------------+----+
    |  flight|      date|            geometry| num|
    +--------+----------+--------------------+----+
    |JBU2288 |2024-01-02|LINESTRING (-69.3...|1221|
    |JBU2288 |2024-01-02|LINESTRING (-70.3...|1774|
    |JBU2288 |2024-01-02|LINESTRING (-71.2...|  42|
    |JBU2288 |2024-01-02|LINESTRING (-71.2...|1192|
    |JBU2288 |2024-01-02|LINESTRING (-71.2...|1231|
    |JBU2288 |2024-01-02|LINESTRING (-72.2...| 672|
    |JBU2288 |2024-01-02|LINESTRING (-72.2...| 959|
    |JBU2288 |2024-01-02|LINESTRING (-73.2...| 652|
    |JBU2288 |2024-01-02|LINESTRING (-73.2...| 672|
    |JBU1053 |2024-01-03|LINESTRING (-73.7...| 416|
    |JBU2424 |2024-01-07|LINESTRING (-81.2...| 217|
    |JBU2424 |2024-01-07|LINESTRING (-81.2...| 467|
    |JBU2424 |2024-01-07|LINESTRING (-81.2...| 487|
    |JBU2424 |2024-01-07|LINESTRING (-81.2...| 611|
    |JBU2424 |2024-01-07|LINESTRING (-81.3...| 885|
    |JBU2424 |2024-01-07|LINESTRING (-81.3...| 722|
    |JBU2424 |2024-01-07|LINESTRING (-81.3...|2237|
    |JBU2424 |2024-01-07|LINESTRING (-81.3...| 722|
    |JBU2424 |2024-01-07|LINESTRING (-81.3...| 618|
    |JBU2424 |2024-01-07|LINESTRING (-81.3...| 607|
    +--------+----------+--------------------+----+
    only showing top 20 rows

    Impacted flight segments

    Flights With The Highest Potential For Weather Impact

    Now we can join the flight segments and their corresponding weather intensity impact scores with the full flight routes to visualize the flights with the highest potential for weather impact based on the historical weather events.

    full_flight_intensity_df = sedona.sql("""
    SELECT jbu_flights.flight, jbu_flights.geometry, jbu_flights.date, intensity
    FROM jbu_flights
    JOIN jbu_intensity ON jbu_flights.flight = jbu_intensity.flight AND jbu_flights.date = jbu_intensity.date
    """)
    
    map_view = SedonaKepler.create_map(full_flight_intensity_df.orderBy("intensity", ascending=False).limit(10), name="Flights")
    SedonaKepler.add_df(map_view, df=weather_h3_df, name="Weather")
    map_view

    Impacted flight routes

    Doppler Weather Raster Data

    So far we’ve been working with historical weather event data, lets now look at a simple example using more real-time weather radar data. Specifically, we’ll use NOAA’s NEXRAD Doppler weather radar surface reflectivity product to assess the impact of high precipitation at airports.

    We’ll load a GeoTiff raster image of the continental US to identify weather patterns, then join this raster data with the location of US airports.

    First, we load the raster image from S3 using the RS_FromPath function which will create an out-db raster in WherobotsDB.

    nexrad_raster_url = "s3://wherobots-examples/data/noaa/nexrad/n0q_202309231625.tif"
    nexrad_df = sedona.sql(f"SELECT RS_FromPath('{nexrad_raster_url}') AS raster")
    nexrad_df.createOrReplaceTempView("nexrad")
    nexrad_df.printSchema()
    root
     |-- raster: raster (nullable = true)

    Inspecting the meta data for this raster image we can see the spatial extent, spatial resolution, the CRS, and number of bands.

    sedona.sql("SELECT RS_MetaData(raster) from nexrad").show(truncate=False)
    
    +---------------------------------------------------------------------------+
    |rs_metadata(raster)                                                        |
    +---------------------------------------------------------------------------+
    |[-126.0025, 50.0025, 12200.0, 5400.0, 0.005, -0.005, 0.0, 0.0, 4326.0, 1.0]|
    +---------------------------------------------------------------------------+

    We can visualize the raster using RS_AsImage.

    htmlDf = sedona.sql("SELECT RS_AsImage(raster, 1200) AS raster_image FROM nexrad")
    SedonaUtils.display_image(htmlDf)

    Doppler weather radar

    Next, we’ll load the point geometries of airports from a Shapefile.

    spatialRdd = ShapefileReader.readToGeometryRDD(sedona.sparkContext, "s3a://wherobots-examples/data/ne_50m_airports")
    poi_df = Adapter.toDf(spatialRdd, sedona)
    
    poi_df = poi_df
    poi_df.show(5)
    poi_df.createOrReplaceTempView("airports")
    +--------------------+---------+----------+-----+----------------+------+--------+--------+---------+--------------------+---------+
    |            geometry|scalerank|featurecla| type|            name|abbrev|location|gps_code|iata_code|           wikipedia|natlscale|
    +--------------------+---------+----------+-----+----------------+------+--------+--------+---------+--------------------+---------+
    |POINT (113.935016...|        2|   Airport|major| Hong Kong Int'l|   HKG|terminal|    VHHH|      HKG|http://en.wikiped...|  150.000|
    |POINT (121.231370...|        2|   Airport|major|         Taoyuan|   TPE|terminal|    RCTP|      TPE|http://en.wikiped...|  150.000|
    |POINT (4.76437693...|        2|   Airport|major|        Schiphol|   AMS|terminal|    EHAM|      AMS|http://en.wikiped...|  150.000|
    |POINT (103.986413...|        2|   Airport|major|Singapore Changi|   SIN|terminal|    WSSS|      SIN|http://en.wikiped...|  150.000|
    |POINT (-0.4531566...|        2|   Airport|major| London Heathrow|   LHR| parking|    EGLL|      LHR|http://en.wikiped...|  150.000|
    +--------------------+---------+----------+-----+----------------+------+--------+--------+---------+--------------------+---------+
    only showing top 5 rows
    

    And now we’re ready to join the raster radar data to our vector airport locations. This operation retrieves the pixel value (in this case a precipitation index value) from the raster at the location of each airport. We’ll filter for any airports where this index value is above 100, indicating significant precipitation.

    
    airport_precip_df = sedona.sql("""
    SELECT name, iata_code, type, RS_Value(raster, geometry, 1) AS precip
    FROM airports
    JOIN nexrad 
    """)
    
    airport_precip_df.where("precip > 100").show(truncate=False)
    
    +-------------------+---------+-----+------+
    |name               |iata_code|type |precip|
    +-------------------+---------+-----+------+
    |Gen E L Logan Int'l|BOS      |major|108.0 |
    |Durham Int'l       |RDU      |major|138.0 |
    +-------------------+---------+-----+------+

    Based on this analysis we can see that Logan International Airport and Raleigh-Durham Airport are experiencing high precipitation events, which may potentially impact ground operations.

    Resources

    This blog post looked at combining aircraft trace observations from public ADS-B sensors with weather event and radar data to analyze the potential impact of severe weather events on flights.

    You can follow along with these examples by creating a free account on Wherobots Cloud and find the code for this example on GitHub.


    Learn more about this use case and Wherobots Cloud in a live demo Flight Analysis With Wherobots Cloud on June 20th at 8am Pacific time.

    Register now.

    Contributors

    • William Lyon

      Developer Relations At Wherobots