D4GC 2022 - Introduction to Workspace (Solution)

    Which locations have the longest commutes in Vancouver?

    Now that we have the basics under our belt, we will dive in and perform a quick analysis of Vancouver commute data. We will use both 2016 census data and geographical data downloaded from Vancouver's Open Data Portal (license).

    Our goal is to determine which areas of Vancouver have residents who report the longest commutes. We will start by importing some packages and loading the commute data into a pandas DataFrame.

    # Import some useful packages
    import geopandas as gpd
    import pandas as pd
    import plotly.express as px
    
    # Create a pandas DataFrame from the commute data
    van_data = pd.read_csv("vancouver_commutes.csv", index_col="Area")
    
    # Preview the data
    van_data
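
To make the effect of index_col concrete, here is a minimal sketch that reads a tiny in-memory CSV instead of "vancouver_commutes.csv". The area names, the counts, and the "Less than 45 minutes" column are made up for illustration; the real file may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical sample standing in for vancouver_commutes.csv
sample_csv = io.StringIO(
    "Area,Less than 45 minutes,45 to 59 minutes,60 minutes and over\n"
    "Area A,800,120,80\n"
    "Area B,500,300,200\n"
)

# index_col="Area" turns the Area column into the row labels
sample = pd.read_csv(sample_csv, index_col="Area")

print(sample.index.tolist())
print(sample.loc["Area B", "45 to 59 minutes"])
```

With the area names as the index, rows can be looked up by name, and the same names can later be used to line the table up with the boundary data.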

    Currently, the data is in a difficult format to visualize. Let's create a column that represents the percentage of commuters in each region with a commute of 45 minutes or over.

    We will create a new column "Percent" by dividing the sum of the two final columns ("45 to 59 minutes" and "60 minutes and over") by the row-wise .sum() of all columns, then multiplying by 100.

    # Create the new column
    van_data["Percent"] = (van_data["60 minutes and over"] + van_data["45 to 59 minutes"]) / van_data.sum(axis=1) * 100
    
    # Review the data
    van_data
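
The calculation can be checked by hand on a small example. The sketch below uses made-up counts (only the two long-commute column names come from the data above; "Less than 45 minutes" is a hypothetical column). It also computes the row totals from the count columns before adding "Percent", an optional precaution so that re-running the cell does not fold the new column into the total.

```python
import pandas as pd

# Made-up counts; "Less than 45 minutes" is a hypothetical column name
toy = pd.DataFrame(
    {
        "Less than 45 minutes": [800, 500],
        "45 to 59 minutes": [120, 300],
        "60 minutes and over": [80, 200],
    },
    index=pd.Index(["Area A", "Area B"], name="Area"),
)

# Compute the row totals from the count columns only
count_cols = ["Less than 45 minutes", "45 to 59 minutes", "60 minutes and over"]
totals = toy[count_cols].sum(axis=1)

# Share of commuters at 45 minutes or over, as a percentage
long_commutes = toy["45 to 59 minutes"] + toy["60 minutes and over"]
toy["Percent"] = long_commutes / totals * 100

print(toy["Percent"])
```

In this toy data, Area A has 200 of its 1,000 commuters at 45 minutes or over and Area B has 500 of 1,000, so the column should hold 20 and 50.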

    Great! Now we can load the GeoJSON file that we'll use to map our commute data. To do this, we use the geopandas read_file() function, which returns a GeoDataFrame from the "vancouver_areas.geojson" file.

    We will also set the index to the area name using .set_index() so that it lines up with the index of the commute data when we merge the two.

    # Read in the geojson file 
    van_boundaries = gpd.read_file("vancouver_areas.geojson")
    
    # Set the index
    van_boundaries.set_index("name", inplace=True)
    
    # Preview the file
    van_boundaries

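    We will use the .merge() method to combine our commute and location data, matching rows on the index of each table (left_index=True, right_index=True). Calling .merge() on the GeoDataFrame keeps the result a GeoDataFrame.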

    # Merge the data
    van_df = van_boundaries.merge(van_data, 
                                  left_index=True, 
                                  right_index=True)
    
    # Preview the final DataFrame
    van_df
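
An index-based merge keeps only the names that appear in both tables (the default is an inner join), so an area whose name is spelled differently in the two files silently drops out. The sketch below shows one possible way to spot such mismatches. It uses plain pandas DataFrames with made-up area names and placeholder geometry strings in place of van_boundaries and van_data.

```python
import pandas as pd

# Made-up stand-ins for the boundary and commute tables
boundaries = pd.DataFrame(
    {"geometry": ["polygon-1", "polygon-2", "polygon-3"]},
    index=["Area A", "Area B", "Area C"],
)
commutes = pd.DataFrame(
    {"Percent": [20.0, 50.0, 35.0]},
    index=["Area A", "Area B", "Area-C"],  # note the different spelling
)

# The default inner join drops names that do not match on both sides
inner = boundaries.merge(commutes, left_index=True, right_index=True)

# An outer join with indicator=True labels where each row came from
check = boundaries.merge(
    commutes, left_index=True, right_index=True, how="outer", indicator=True
)
unmatched = check[check["_merge"] != "both"]

print(len(inner))
print(sorted(unmatched.index))
```

If the list of unmatched names is empty for the real data, every area has both a boundary and a commute record.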

    Finally, we can visualize the data using a choropleth map in Plotly Express. We will make use of a number of parameters to ensure our plot is set up correctly:

    • data_frame (the first argument): the GeoDataFrame we are plotting.
    • geojson: the geometry we are using to construct the map.
    • locations: the values that match each row to a shape in geojson (here, the index of area names).
    • color_continuous_scale: the color scale of our areas.
    • fitbounds: constrains the plot to the locations.
    • color: the column to color our regions by.
    • template: sets the aesthetics of the plot.
    • title: adds a descriptive title.

    We then use .update_geos() to hide the plot's frame by setting its color to white, and .show() to render our figure!

    # Create the figure
    fig = px.choropleth(van_df, 
                        geojson=van_df.geometry, 
                        locations=van_df.index,
                        color_continuous_scale=["white", "#8F2800"],
                        fitbounds="locations",
                        color="Percent",
                        template="plotly_white",
                        title="<b>The suburbs have the worst commutes</b><br><sup>Percentage of Vancouver population with commute over 45 minutes</sup>"
                       )
    
    # Set the frame color to white
    fig.update_geos(framecolor="white")
    
    # Show the plot
    fig.show()
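
A common cause of a blank choropleth is that the values in locations do not match any feature in geojson. By default, Plotly matches each location against the id of a GeoJSON feature (the featureidkey parameter changes this). The helper below is a hypothetical, Plotly-free sketch of that matching rule on a hand-written GeoJSON dictionary with made-up area names; missing_locations is not part of any library.

```python
def missing_locations(locations, geojson):
    """Return the locations that have no feature with a matching id."""
    feature_ids = {feature.get("id") for feature in geojson["features"]}
    return [loc for loc in locations if loc not in feature_ids]


# Hand-written GeoJSON with two made-up areas
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "Area A",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "id": "Area B",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 0]]],
            },
        },
    ],
}

# "Area C" has no feature, so it would not be drawn on the map
print(missing_locations(["Area A", "Area B", "Area C"], toy_geojson))
```

A check along these lines can be run before plotting to confirm that every row has a shape to color.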