Rafael Monteiro/

Geospatial Intelligence for the best Coffee Shop location


Where to open a new coffee shop?

📖 Background

You are helping a client who owns coffee shops in Colorado. The company's coffee shops serve high-quality and responsibly sourced coffee, pastries, and sandwiches. They operate three locations in Fort Collins and want to expand into Denver.

Your client believes that the ideal location for a new store is close to affluent households, and that the store appeals to the 20-35 year old demographic.

Your team collected geographical and demographic information about Denver's neighborhoods to assist the search. They also collected data for Starbucks stores in Denver. Starbucks and the new coffee shops do not compete for the same clients; the team included their location as a reference.

💾 The data

You have assembled information from three different sources (locations, neighborhoods, demographics):

Starbucks locations in Denver, Colorado
  • "StoreNumber" - Store Number as assigned by Starbucks
  • "Name" - Name identifier for the store
  • "PhoneNumber" - Phone number for the store
  • "Street 1, 2, and 3" - Address for the store
  • "PostalCode" - Zip code of the store
  • "Longitude, Latitude" - Coordinates of the store
Neighborhoods' geographical information
  • "NBHD_ID" - Neighborhood ID (matches the census information)
  • "NBHD_NAME" - Name of the statistical neighborhood
  • "Geometry" - Polygon that defines the neighborhood
Demographic information
  • "NBHD_ID" - Neighborhood ID (matches the geographical information)
  • "NBHD_NAME" - Neighborhood name
  • "POPULATION_2010" - Population in 2010
  • "AGE_ " - Number of people in each age bracket (< 18, 18-34, 35-65, and > 65)
  • "NUM_HOUSEHOLDS" - Number of households in the neighborhood
  • "FAMILIES" - Number of families in the neighborhood
  • "NUM_HHLD_100K+" - Number of households with income above 100 thousand USD per year

Starbucks locations were scraped from the Starbucks store locator webpage by Chris Meller.
Statistical Neighborhood information from the City of Denver Open Data Catalog, CC BY 3.0 license.
Census information from the United States Census Bureau. Publicly available information.

1 - Imports

!pip install geopandas
import pandas as pd
import numpy as np
import geopandas as gpd # '0.11.1'
import matplotlib.pyplot as plt
import seaborn as sns
denver = pd.read_csv('./data/denver.csv')

# Turn the Denver dataframe into a GeoDataFrame by combining the coordinates (WGS 84, EPSG:4326)
gdf = gpd.GeoDataFrame(denver, geometry = gpd.points_from_xy(denver.Longitude, denver.Latitude), crs = 4326)

# Read the neighborhood polygons and label them as EPSG:4326
# Note: set_crs only relabels the CRS; it does not reproject the coordinates
neighborhoods = gpd.read_file('./data/neighborhoods.shp')
neighborhoods = neighborhoods.set_crs(epsg = 4326, allow_override = True)


2 - Exploratory Data Analysis

2.1 Merge Starbucks locations in Denver and neighborhoods geodataframe

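# Object dtype so that neighborhood names (strings) can be written into the column later
gdf['Neighborhoods'] = pd.Series(np.nan, index = gdf.index, dtype = 'object')

# sjoin could not be used here (`rtree` or `pygeos` did not work), so the join is an explicit loop
# For each store point, find the neighborhood polygon that contains it and record its name
name_col = gdf.columns.get_loc('Neighborhoods')
for nk, k in enumerate(gdf.geometry):
    for nj, j in enumerate(neighborhoods.geometry):
        if j.contains(k): # k.intersects(j)
            gdf.iloc[nk, name_col] = neighborhoods['NBHD_NAME'].iloc[nj]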
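
To make the loop above less of a black box, here is a minimal, dependency-free sketch of the point-in-polygon test using the classic ray casting rule. It is not how shapely implements `contains`; the function names and the toy squares are hypothetical and only illustrate the idea of assigning each point to the first polygon that contains it.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count the edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means "inside"
    return inside


def assign_neighborhood(point: Point, polygons: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first polygon containing the point, or None."""
    for name, polygon in polygons.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical toy data: two unit squares side by side (not real Denver geometry).
toy_neighborhoods = {
    "West": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "East": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
toy_stores = [(0.5, 0.5), (1.5, 0.25), (3.0, 3.0)]

assigned = [assign_neighborhood(p, toy_neighborhoods) for p in toy_stores]
print(assigned)  # ['West', 'East', None]
```

The third point falls outside every polygon and comes back as `None`, which is the same situation as a store left without a neighborhood after the loop.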

2.1.1 Checking for missing values


A visual check of the stores left without a neighborhood shows that:

  • 4005 Chambers Road is in Montbello/Denver;
  • 2223 S. Monaco Parkway is in Goldsmith/Denver.

The coordinates in the denver dataframe were checked against Google Maps and are correct, so the mismatch may come from a difference between the georeferencing system of the store coordinates and that of the neighborhoods geodataframe.
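
The code for the missing-value check is not shown in this section. A minimal sketch of how such a check could look with pandas follows; the toy frame, the `Street1` column name and its values are hypothetical stand-ins for `gdf`, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for gdf; addresses and values are illustrative only.
toy = pd.DataFrame(
    {
        "Street1": ["Address A", "Address B", "Address C"],
        "Neighborhoods": ["Highland", np.nan, "Speer"],
    }
)

# Number of stores that were not matched to any neighborhood polygon.
n_missing = int(toy["Neighborhoods"].isna().sum())

# The unmatched rows themselves, so their addresses can be checked by hand.
unmatched = toy.loc[toy["Neighborhoods"].isna(), ["Street1"]]

print(n_missing)
print(unmatched)
```

Listing the unmatched rows with their index labels is what makes a manual correction by label possible afterwards.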

2.2 Data Visualization

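# Manual correction of the two stores checked above
gdf.loc[36,'Neighborhoods'] = 'Montbello' # 4005 Chambers Road
gdf.loc[59,'Neighborhoods'] = 'Goldsmith' # 2223 S. Monaco Parkway

# Drop any store that still has no neighborhood assigned
gdf.dropna(subset=['Neighborhoods'], inplace = True)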

ax = neighborhoods.plot(color='white', edgecolor='black', figsize=(15,8))
gdf.plot(ax=ax, color = 'red');

2.2.1 Visualizing the Starbucks count and the Starbucks density by neighborhood area.

# Use GeoSeries.to_crs() to project the geometries to a planar CRS before using the area method.
# Denver (about 105°W, 39.7°N) lies in UTM zone 13N: EPSG:32613 (WGS 84 / UTM zone 13N)
tost = neighborhoods.geometry.to_crs(epsg=32613)

# AREA (km2)
neighborhoods['AREA'] = tost.area/(10**6)

# Count the Starbucks stores per neighborhood
count_Neigh = gdf.Neighborhoods.value_counts().to_frame()
count_Neigh.columns = ['Count_Starbucks']

# Merge the neighborhood geodataframe with the store counts (neighborhoods without a store get NaN)
neighborhoods2 = neighborhoods.merge(count_Neigh, left_on = 'NBHD_NAME', how = 'left', right_index = True)
# Fill only the count column, so that other columns are left untouched
neighborhoods2['Count_Starbucks'] = neighborhoods2['Count_Starbucks'].fillna(0)
# Stores per km2
neighborhoods2['Starbucks_density'] = neighborhoods2['Count_Starbucks']/neighborhoods2['AREA']
neighborhoods2.sort_values(by = 'NBHD_ID',inplace = True)

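# Plots
# The opening of each plot call is missing in the source; the column names are assumed from the plot titles.
neighborhoods2.plot(column = 'Count_Starbucks',
                    legend = True,
                    cmap = "Greens");
plt.title(label = "Starbucks Count - A visualization of Denver's neighborhoods and the Starbucks store locations.");

neighborhoods2.plot(column = 'Starbucks_density',
                    legend = True,
                    cmap = "Greens");
plt.title(label = "Starbucks density.");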
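
Because the density is stores per km2, it depends on the planar CRS used for the area. As a sanity check on the choice of UTM zone, here is a small sketch of the standard zone rule (6° wide zones starting at 180°W, EPSG 326xx for the northern hemisphere and 327xx for the southern). The helper name is hypothetical, the Denver coordinates are approximate, and the rule ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg_for(longitude: float, latitude: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone containing a lon/lat point."""
    # Zones are 6 degrees wide and numbered 1..60 eastward from 180°W.
    zone = int(math.floor((longitude + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # keep longitude = 180 inside the valid range
    # 326xx codes are northern-hemisphere zones, 327xx southern-hemisphere zones.
    base = 32600 if latitude >= 0 else 32700
    return base + zone


# Approximate coordinates of central Denver (assumed for illustration).
denver_epsg = utm_epsg_for(-104.99, 39.74)
print(denver_epsg)  # 32613
```

For Denver this gives zone 13 in the northern hemisphere, that is EPSG:32613.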

2.2.2 Kernel density estimation of Starbucks stores distribution.

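# 2D kernel density estimate of the store coordinates; kdeplot takes x= and y= keywords and fill= (formerly shade=)
ax = sns.kdeplot(x = gdf['Longitude'], y = gdf['Latitude'], fill = True, cmap = 'Purples', cbar = True)
plt.title(label = "Kernel density estimation of Starbucks stores distribution.")

# Overlay the neighborhood outlines on the same axes
neighborhoods2.plot(edgecolor='black',color = 'None', ax = ax)
# Show the rows for the neighborhoods Highland, Lincoln Park and Speer
neighborhoods[neighborhoods['NBHD_NAME'].isin(['Highland','Lincoln Park','Speer'])]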
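
To show what the kernel density plot estimates, here is a minimal sketch of a 2D Gaussian kernel density estimate in plain Python. It is a simplified illustration and not seaborn's estimator: it uses a single isotropic bandwidth chosen by hand, whereas seaborn selects its bandwidth automatically. The function name, the toy points and the bandwidth are all hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def gaussian_kde_2d(points: Sequence[Tuple[float, float]], bandwidth: float) -> Callable[[float, float], float]:
    """Return a function that estimates the density at (x, y) with an isotropic Gaussian kernel."""
    n = len(points)
    # Normalization of a 2D Gaussian with standard deviation `bandwidth`, averaged over n points.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x: float, y: float) -> float:
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical toy points: a cluster of three near the origin and one isolated point.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
density = gaussian_kde_2d(toy_points, bandwidth=0.5)

# The estimate is higher inside the cluster than at the isolated point.
print(density(0.0, 0.0) > density(2.0, 2.0))  # True
```

Each store contributes a smooth bump centered on its coordinates, and the darker areas of the plot are where many bumps overlap.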