Your AI can now contextualize physical world data using Wherobots Spatial AI Coding Tools Learn More

Exploring the Foursquare Open Places Data with Wherobots

foursquare open places tutorial
foursquare open places tutorial

Introduction

At Wherobots, we’re excited to offer access to the Foursquare Open Places dataset through the Wherobots Global Hub. We maintain a pipeline to keep this dataset updated along with the Foursquare releases. In this tutorial, we’ll show you how to work with Foursquare’s Open Places dataset using Wherobots. We’ll demonstrate how to access, query, and visualize Points of Interest (POI) data, culminating in aggregating and visualizing the data as a choropleth map.

The overall process we will show is applicable to many different use cases; for this tutorial we will use it to analyze coffee shops across San Francisco and see how they are distributed by neighborhood.

If you would like to follow along, you can spin up a free instance here and access this notebook for yourself in the Wherobots Jupyter Notebook environment under the examples folder.

Let’s dive in.

What is the Foursquare Places Dataset?

Foursquare’s Places dataset is a comprehensive global collection of points of interest (POIs) containing over 100 million locations worldwide. What makes this dataset particularly valuable is that it’s continuously updated and verified through Foursquare’s Placemaker Tools, which enable community contributions to maintain accuracy.

The dataset includes details such as:

  • Business names and locations
  • Categories and classifications
  • Address information
  • Social media links
  • Opening dates and closure information

Tutorial

To follow along with this tutorial, you’ll need a Wherobots account. You can sign up for free at Wherobots.com.

Setting Up the Environment

First, let’s set up our Sedona context in Wherobots. This will start up our spark-based compute environment and give us access to the geospatial functions we need to work with the Foursquare dataset.

from sedona.spark import *

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

Exploring the Foursquare Data in Wherobots

The Foursquare data is already accessible in the Wherobots Open Data Catalog, so there’s no need to download or import it separately. Let’s check what tables are available:

sedona.sql("SHOW tables IN wherobots_open_data.foursquare").show(truncate=False)
+----------+----------+-----------+
|namespace |tableName |isTemporary|
+----------+----------+-----------+
|foursquare|categories|false      |
|foursquare|places    |false      |
+----------+----------+-----------+

There are two tables: places which contain the actual locations, and categories which provide the classification system. Let’s look at the schema of the places table:

sedona.table("wherobots_open_data.foursquare.places").printSchema()

You’ll get a result that looks something like this:

root
 |-- fsq_place_id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)
 |-- address: string (nullable = true)
 |-- locality: string (nullable = true)
 |-- region: string (nullable = true)
 |-- postcode: string (nullable = true)
 |-- admin_region: string (nullable = true)
 |-- post_town: string (nullable = true)
 |-- po_box: string (nullable = true)
 |-- country: string (nullable = true)
 |-- date_created: string (nullable = true)
 |-- date_refreshed: string (nullable = true)
 |-- date_closed: string (nullable = true)
 |-- tel: string (nullable = true)
 |-- website: string (nullable = true)
 |-- email: string (nullable = true)
 |-- facebook_id: long (nullable = true)
 |-- instagram: string (nullable = true)
 |-- twitter: string (nullable = true)
 |-- fsq_category_ids: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- fsq_category_labels: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- placemaker_url: string (nullable = true)
 |-- geometry: geometry (nullable = true)
 |-- bbox: struct (nullable = true)
 |    |-- xmin: double (nullable = true)
 |    |-- ymin: double (nullable = true)
 |    |-- xmax: double (nullable = true)
 |    |-- ymax: double (nullable = true)

The schema includes fields like fsq_place_id, name, latitude, longitude, address, country, fsq_category_labels, and more. Importantly, it also includes a geometry field that we can use for spatial operations.

Working with Versions

Wherobots provides access to multiple versions of the Foursquare data. You can select a specific version using the VERSION AS OF clause:

sedona.sql("SELECT * FROM wherobots_open_data.foursquare.places VERSION AS OF 'dt=2025-02-06'").show(5, truncate=True)

If you don’t specify a version, you’ll get the latest one by default.

Basic Analysis of the Dataset

Let’s start by selecting the columns we’re interested in:

places_df = sedona.sql("""
    SELECT 
        fsq_place_id,
        name,
        geometry,
        fsq_category_labels,
        country,
        date_refreshed,
        date_created
    FROM wherobots_open_data.foursquare.places
    WHERE 
        date_closed IS NULL 
        AND name IS NOT NULL
        AND geometry IS NOT NULL
        AND country IS NOT NULL
""")

places_df.createOrReplaceTempView("places")

Now, let’s run a quick query to see what we’re working with and get some basic stats:

sedona.sql("""
    SELECT 
        COUNT(*) as total_places,
        COUNT(CASE WHEN name IS NOT NULL THEN 1 END) as has_name,
        COUNT(CASE WHEN address IS NOT NULL THEN 1 END) as has_address,
        COUNT(CASE WHEN fsq_category_labels IS NOT NULL THEN 1 END) as has_categories,
        COUNT(CASE WHEN date_closed IS NULL THEN 1 END) as still_open
    FROM places
""").show()

When you run the above query, you’ll get a result like this:

+------------+---------+-----------+--------------+----------+
|total_places| has_name|has_address|has_categories|still_open|
+------------+---------+-----------+--------------+----------+
|   104635092|104635092|   67380672|      92959858|  98436217|
+------------+---------+-----------+--------------+----------+

That’s over 104 million places in the dataset. About 67 million have addresses, and nearly 93 million have category labels. That’s an impressive dataset.

Filtering by Region

Let’s narrow our focus to a specific region. We can first use standard SQL to filter to the United States:

us_df = sedona.sql("""
    SELECT *
    FROM places
    WHERE country = 'US'
""")

This approach using standard SQL works well for cases where the region you want to filter for is already defined in the dataset. For example, the dataset has a country column, so we can filter by country.

But what if we wanted to filter by a geographic region that isn’t already defined in the dataset? For example, maybe we want to filter by census block groups, cities, or internal company polygons that we have defined for trade areas.

That’s where the power of spatial joins in Wherobots comes in. We can define any arbitrary polygon and use it to filter the data.

Note that the actual polygon is a very long WKT string, so we truncate it here in the example code. But the full WKT polygon is defined in the notebook itself for you to access. As an alternative you could use other open boundary datasets, such as within the Overture Divisions dataset and join the Foursquare places against those divisions.

# Define a polygon for San Francisco
SF = "MULTIPOLYGON (((-122.4773830027518 37.8110279985324, ...))"

# Filter to places within San Francisco
sf_df = sedona.sql(f"""
    SELECT * FROM places 
    WHERE ST_Contains(
        ST_GeomFromWKT('{SF}'), 
        geometry)
""")

Now we have just the places located within San Francisco.

I know that filtering the data for a single polygon may not be a huge task by itself, but where you really start to see the power of Wherobots is when this scales up. We could just as easily spatially join all 104 million places in the dataset into a million different polygons (buildings, cities, counties, senate districts, zip codes, etc…) based on which polygon a point falls within–and we could use that exact same ST_Contains function to do so.

Visualizing the Data

Wherobots includes built-in visualization capabilities through SedonaKepler. Let’s create a simple map of our San Francisco places:

sf_map = SedonaKepler.create_map(sf_df, "Places")
San Francisco places SedonaKepler

We can see from the image that there are POIs clear across San Francisco, with a dense concentration of places in the Financial District downtown.

Searching for Specific Places

Now say we want to search for specific places such as a brand or retail chain. For example, perhaps we want to find all Starbucks locations in San Francisco. To do so we can use the following query:

sbux_df = sedona.sql("""
    SELECT *
    FROM sf_places
    WHERE LOWER(name) = 'starbucks'
""")

sbux_map = SedonaKepler.create_map(sbux_df, "Starbucks in San Francisco")
Map of Starbucks in San Francisco

Exploring Categories

The Foursquare data uses a hierarchical category system. Let’s see what categories are most common in our San Francisco dataset:

sedona.sql("""
    SELECT 
        fsq_category_labels[0] as primary_category,
        COUNT(*) as count
    FROM sf_places
    GROUP BY fsq_category_labels[0]
    ORDER BY count DESC
    LIMIT 20
""").show(truncate=False)

Here are the top categories in San Francisco by count:

Primary CategoryCount
NULL8597
Business and Professional Services > Office6069
Business and Professional Services > Office > Tech Startup3615
Community and Government > Residential Building > Apartment or Condo2505
Health and Medicine > Physician > Doctor’s Office2439
Landmarks and Outdoors > Structure2048
Business and Professional Services > Health and Beauty Service > Hair Salon1492
Arts and Entertainment > Art Gallery1474
Travel and Transportation > Transportation Service > Public Transportation > Bus Line1238
Dining and Drinking > Restaurant1238

We can also filter to places that fall into a specific category. For example, instead of searching for Starbucks locations by name, let’s select all Points of Interest that are in the Coffee Shop category:

category_places = sedona.sql("""
    SELECT * FROM sf_places
    WHERE ARRAY_CONTAINS(fsq_category_labels, 'Dining and Drinking > Cafe, Coffee, and Tea House > Coffee Shop')
""")
Image of Starbucks in San Francisco

Here’s a quick look at the table that we get after running that query.

+--------------------+--------------------+--------------------+--------------------+-------+--------------+------------+
|        fsq_place_id|                name|            geometry| fsq_category_labels|country|date_refreshed|date_created|
+--------------------+--------------------+--------------------+--------------------+-------+--------------+------------+
|5622e819498ecbeed...|Four Barrel at TI...|POINT (-122.37331...|[Dining and Drink...|     US|    2024-10-26|  2015-10-18|
|49d6619ff964a520b...|        Cafe deStijl|POINT (-122.40036...|[Dining and Drink...|     US|    2022-07-29|  2009-04-03|
|49dbc22af964a520f...|Battery Street Co...|POINT (-122.40133...|[Dining and Drink...|     US|    2024-05-06|  2009-04-07|
|5b1599d604d1ae002...|           Good Mojo|POINT (-122.40049...|[Dining and Drink...|     US|    2024-10-26|  2018-06-04|
|4b02f2f9f964a5205...|           Starbucks|POINT (-122.40102...|[Dining and Drink...|     US|    2024-11-05|  2009-11-17|
|49fa3e4ff964a520d...|  Jackson Place Cafe|POINT (-122.40135...|[Dining and Drink...|     US|    2024-07-17|  2009-05-01|
|4d065736a26854819...|Réveille Coffee C...|POINT (-122.40035...|[Dining and Drink...|     US|    2024-10-20|  2010-12-13|
|4bafe772f964a5205...|            om bucks|POINT (-122.40090...|[Dining and Drink...|     US|    2023-04-04|  2010-03-28|
|4ab317e7f964a5207...|       Peet's Coffee|POINT (-122.40115...|[Dining and Drink...|     US|    2023-09-20|  2009-09-18|
|2af323a0d2d542feb...|  Aroma Espresso Bar|POINT (-122.40102...|[Dining and Drink...|     US|    2012-08-27|  2012-08-27|
+--------------------+--------------------+--------------------+--------------------+-------+--------------+------------+

Creating a Choropleth Map: Coffee Shops by Neighborhood

For a more advanced visualization, let’s create a choropleth map showing the number of coffee shops in each San Francisco neighborhood.

First we will need a dataset of polygons for the SF neighborhoods. Luckily, we have one from the SF open data portal. We’ll load these from a CSV file:

SF_NEIGHBORHOODS_URL = "s3://wherobots-examples/data/sf_neighborhoods.csv"

neighborhoods = (sedona.read.format('csv')
    .option('header', 'true')
    .option('delimiter', ',')
    .option('inferSchema', 'true')
    .load(SF_NEIGHBORHOODS_URL)
    )

For context, here’s the table that results after we load those neighborhood polygon boundaries.

+--------------------+--------------------+
|            the_geom|          neighborho|
+--------------------+--------------------+
|MULTIPOLYGON (((-...|            Seacliff|
|MULTIPOLYGON (((-...|      Haight Ashbury|
|MULTIPOLYGON (((-...|       Outer Mission|
|MULTIPOLYGON (((-...|        Inner Sunset|
|MULTIPOLYGON (((-...|Downtown/Civic Ce...|
|MULTIPOLYGON (((-...|     Diamond Heights|
|MULTIPOLYGON (((-...|           Lakeshore|
|MULTIPOLYGON (((-...|        Russian Hill|
|MULTIPOLYGON (((-...|          Noe Valley|
|MULTIPOLYGON (((-...| Treasure Island/YBI|
+--------------------+--------------------+

As you can see, there are two columns, the geometry column that contains the multi-polygon that defines the boundaries of the neighborhood and the neighborhood column that contains the name.

If you’ve spent any time in SF you’ll certainly recognize some of these names.

Now that we have the boundaries we’ll use a spatial join to count coffee shops in each neighborhood:

neighborhood_agg = sedona.sql("""
    SELECT 
        ST_GeomFromWKT(n.the_geom) AS geometry,
        n.neighborho AS neighborhood,
        COUNT(*) as location_count,
        collect_list(p.name) AS coffee_shops
    FROM category_places p
    JOIN neighborhoods n
    ON ST_CONTAINS(ST_GeomFromWKT(n.the_geom), p.geometry)
    GROUP BY n.the_geom, n.neighborho
    """)

This gives us the following table where we can see the neighborhood name, its polygon geometry, the count of coffee shops contained in that neighborhood, and an array of the names of the coffee shops.

+--------------------+--------------------+--------------+--------------------+
|            geometry|        neighborhood|location_count|        coffee_shops|
+--------------------+--------------------+--------------+--------------------+
|MULTIPOLYGON (((-...| Treasure Island/YBI|             1|[Four Barrel at T...|
|MULTIPOLYGON (((-...|        Potrero Hill|            25|[Starbucks, WFM C...|
|MULTIPOLYGON (((-...|     South of Market|           119|[W6 Coffee Bar, S...|
|MULTIPOLYGON (((-...|             Bayview|            20|[The Happy Vegan,...|
|MULTIPOLYGON (((-...|  Financial District|           186|[Starbucks, Jacks...|
|MULTIPOLYGON (((-...|   Visitacion Valley|             4|[Joe Leland, Miss...|
|MULTIPOLYGON (((-...|           Chinatown|            11|[Cafe Vivo, Latte...|
|MULTIPOLYGON (((-...|Downtown/Civic Ce...|            98|[Chai Bar, Dignit...|
|MULTIPOLYGON (((-...|         North Beach|            30|[Cafe deStijl, Ba...|
|MULTIPOLYGON (((-...|            Nob Hill|            22|[Cafe Mozart, Gal...|
+--------------------+--------------------+--------------+--------------------+

Finally we have what we need to create our visualization. Time to create our choropleth map.

Note here that I am using a custom map config to set the color scale, style, and other properties.

map = SedonaKepler.create_map(df=neighborhood_agg, name="Coffee Shop Count", config=map_config)
foursquare open places tutorial

The resulting visualization reveals interesting patterns. Downtown San Francisco (the Financial District) has the highest concentration with 186 coffee shops, while areas like the Presidio are relatively sparse with only around 10 options. The Inner Richmond neighborhood shows about 37 coffee shops.

With these live Sedona Kepler maps, you can hover over the neighborhood you’re interested in to see the full list of coffee shops in that area.

That wraps up our tutorial on working with Foursquare Places data in Wherobots. I hope you enjoyed learning about these powerful geospatial capabilities.

Before we wrap up, let’s take a quick look at what makes this dataset from Foursquare unique.

What Makes Foursquare’s Places Data Unique?

Unlike some other POI datasets, Foursquare’s Places data is continually updated through their Placemaker Tools. These web-based tools allow community members to:

  • Add new venues
  • Update existing information
  • Validate place data

This community-driven approach helps ensure the data remains current and accurately reflects real-world changes.

Foursquare Placemaker Tools

Conclusion

The Foursquare Places dataset combined with Wherobots’ powerful geospatial capabilities offers an excellent foundation for location-based analytics. Whether you’re studying urban patterns, analyzing business distributions, or planning a coffee crawl through San Francisco, these tools make it easy to extract meaningful insights from spatial data.

Ready to explore the Foursquare Places data for yourself? Sign up for a free Wherobots account and try out this notebook!

Bonus: Video Walk Through

By the way, a couple months ago I released this video where I walk through this exact use case in one of our Wherobots demo notebooks. Feel free to check it out.

Get started for free