Daniel Wissa/

Competition - City Tree Species


Which tree species should the city plant?

📖 Background

You work for a nonprofit organization advising the planning department on ways to improve the quantity and quality of trees in New York City. The urban design team believes tree size (using trunk diameter as a proxy for size) and health are the most desirable characteristics of city trees.

The city would like to learn more about which tree species are the best choice to plant on the streets of Manhattan.

💾 The data

The team has provided access to the 2015 tree census and geographical information on New York City neighborhoods (trees, neighborhoods):

Tree Census
  • "tree_id" - Unique id of each tree.
  • "tree_dbh" - The diameter of the tree in inches measured at 54 inches above the ground.
  • "curb_loc" - Location of the tree bed in relation to the curb. Either along the curb (OnCurb) or offset from the curb (OffsetFromCurb).
  • "spc_common" - Common name for the species.
  • "status" - Indicates whether the tree is alive or standing dead.
  • "health" - Indication of the tree's health (Good, Fair, and Poor).
  • "root_stone" - Indicates the presence of a root problem caused by paving stones in the tree bed.
  • "root_grate" - Indicates the presence of a root problem caused by metal grates in the tree bed.
  • "root_other" - Indicates the presence of other root problems.
  • "trunk_wire" - Indicates the presence of a trunk problem caused by wires or rope wrapped around the trunk.
  • "trnk_light" - Indicates the presence of a trunk problem caused by lighting installed on the tree.
  • "trnk_other" - Indicates the presence of other trunk problems.
  • "brch_light" - Indicates the presence of a branch problem caused by lights or wires in the branches.
  • "brch_shoe" - Indicates the presence of a branch problem caused by shoes in the branches.
  • "brch_other" - Indicates the presence of other branch problems.
  • "postcode" - Five-digit zip code where the tree is located.
  • "nta" - Neighborhood Tabulation Area (NTA) code from the 2010 US Census for the tree.
  • "nta_name" - Neighborhood name.
  • "latitude" - Latitude of the tree, in decimal degrees.
  • "longitude" - Longitude of the tree, in decimal degrees.
Neighborhoods' geographical information
  • "ntacode" - NTA code (matches Tree Census information).
  • "ntaname" - Neighborhood name (matches Tree Census information).
  • "geometry" - Polygon that defines the neighborhood.

Tree census and neighborhood information from the City of New York NYC Open Data.
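The urban design team's two criteria, size and health, correspond to the `tree_dbh` and `health` columns described above. The sketch below shows one way to summarise them per species. It is not the method used later in this notebook: the sample rows are invented for illustration, and the output column names (`median_diameter`, `share_good`, `n_trees`) are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows using the raw census column names.
sample = pd.DataFrame({
    "spc_common": ["honeylocust", "honeylocust", "ginkgo", "ginkgo", "ginkgo"],
    "tree_dbh": [10, 14, 6, 8, 7],
    "health": ["Good", "Fair", "Good", "Good", "Poor"],
})

summary = (
    sample.assign(is_good=sample["health"].eq("Good"))  # True where health is Good
    .groupby("spc_common")
    .agg(
        median_diameter=("tree_dbh", "median"),  # size proxy
        share_good=("is_good", "mean"),          # fraction of trees rated Good
        n_trees=("is_good", "size"),             # sample size behind each estimate
    )
    .sort_values(["median_diameter", "share_good"], ascending=False)
)

print(summary)
```

Keeping `n_trees` next to the two metrics makes it easier to spot species whose figures rest on only a handful of trees.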

import pandas as pd
import geopandas as gpd
import numpy as np
import folium
import seaborn as sns
import matplotlib.pyplot as plt

try:
    import colorcet as cc
except ModuleNotFoundError:
    !pip install colorcet
    import colorcet as cc
trees = pd.read_csv('data/trees.csv').drop_duplicates().dropna()

trees = trees.rename(columns = {'tree_id':'ID'
                               ,'tree_dbh':'Diameter (in)'
                               ,'curb_loc':'Bed Location'
                               ,'spc_common':'Specie'  # needed: later cells use trees.Specie
                               ,'root_stone':'Root Stones Issue'
                               ,'root_grate':'Root Grate Issue'
                               ,'root_other':'Other Root Issues'
                               ,'trunk_wire':'Trunk Wire Issue'
                               ,'trnk_light':'Trunk Lighting Issue'
                               ,'trnk_other':'Other Trunk Issues'
                               ,'brch_light':'Branch Lighting Issue'
                               ,'brch_shoe':'Branch Shoes Issue'
                               ,'brch_other':'Other Branch Issues'
                               ,'nta':'Neighborhood Code'
                               ,'nta_name':'Neighborhood Name'})

trees.Specie = trees.Specie.str.capitalize()

neighborhoods = gpd.read_file('data/nta.shp')

1. What are the most common tree species in Manhattan?

First, we need to restrict the trees data to Manhattan. Before filtering, we check which neighborhoods the data actually covers.

trees['Neighborhood Code'].unique()

The unique code values show that every tree in the data is already located in Manhattan, so it is safe to proceed with the analysis.
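The same check can be made explicit instead of being read off the printed codes. This is a minimal sketch, assuming that Manhattan NTA codes all start with the prefix `MN`. The `sample` frame is a hypothetical stand-in for the renamed `trees` data.

```python
import pandas as pd

# Hypothetical sample standing in for the renamed `trees` DataFrame.
sample = pd.DataFrame({
    "Neighborhood Code": ["MN12", "MN14", "MN40", "MN12"],
})

# Assumption: every Manhattan NTA code starts with "MN".
is_manhattan = sample["Neighborhood Code"].str.startswith("MN")

all_in_manhattan = bool(is_manhattan.all())
# Codes that fail the check, if any, so they can be inspected.
outside = sample.loc[~is_manhattan, "Neighborhood Code"].unique().tolist()

print(all_in_manhattan, outside)
```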

fig, ax = plt.subplots(figsize = (12,36))
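# Name the counts explicitly so the column name does not depend on the pandas version
df = trees.Specie.value_counts().rename('Tree Count').to_frame()
ax = sns.barplot(data=df, x='Tree Count', y=df.index)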
ax.set_xlabel("Tree Count", fontsize=18, labelpad=16)
ax.set_ylabel("Tree Species", fontsize=18, labelpad=16)
ax.set_title("Tree Species by Count in Manhattan", fontsize=28, pad=24)

for p in ax.patches:
    ax.annotate(int(p.get_width()) , (p.get_width()+88, p.get_y()+0.64), fontsize=10)

The chart shows that the Honeylocust is the most common species in Manhattan, while several other species appear only once in the data, which makes them the rarest street trees in the borough.
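The most common species and the species recorded only once can also be pulled out directly rather than read off the chart. A minimal sketch on an invented sample; the species values here are illustrative only, not census results.

```python
import pandas as pd

# Hypothetical species column standing in for trees.Specie.
species = pd.Series(
    ["Honeylocust", "Honeylocust", "Ginkgo", "Honeylocust", "Ginkgo", "Pitch pine"],
    name="Specie",
)

counts = species.value_counts()

# Species with the highest count.
most_common = counts.idxmax()

# Species that occur exactly once in the sample.
singletons = sorted(counts[counts == 1].index)

print(most_common, singletons)
```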

2. Which are the neighborhoods with the most trees?

Now we need to identify the neighborhoods with the highest and lowest tree counts.
To do so, we draw an interactive map in which each neighborhood is colored according to how many trees are planted there. Redder shades indicate fewer trees, while greener shades indicate more trees.
If you hover over a neighborhood, a tooltip shows the neighborhood name, its tree count, and the most common species there.
Note: The map is not displayed in the DataCamp notebook editor or in the publication, so screenshots are attached below to show what it looks like. You can also duplicate the workspace and open it in the JupyterLab environment to view the live map. Uncomment the last line of code to display the map in your notebook.

# Tree count per neighborhood; naming the Series keeps the column name stable across pandas versions
neighborhood_tree_count = trees['Neighborhood Name'].value_counts().rename('Tree Count').to_frame()
# Manhattan polygons indexed by neighborhood name
neighborhood = neighborhoods[neighborhoods['boroname'] == 'Manhattan']
neighborhood = neighborhood[['geometry','ntaname']].set_index('ntaname').drop_duplicates()
# Most common species (the mode) per neighborhood
most_popular = trees.groupby('Neighborhood Name').Specie.agg(pd.Series.mode).rename('Popular Tree').to_frame()
# Left joins keep neighborhoods without trees: count 0 and species 'None'
neighborhood_trees = neighborhood.merge(neighborhood_tree_count, left_index=True, right_index=True, how='left').fillna(0)
neighborhood_trees = neighborhood_trees.merge(most_popular, left_index=True, right_index=True, how='left').fillna('None')
neighborhood_trees.index.names = ['Neighborhood']

m = neighborhood_trees.explore(column='Tree Count' , tiles="CartoDB positron",cmap='RdYlGn', legend=True)


#display(m)  # Uncomment this line to display the map

Here we can see that the Upper West Side has the most trees, and that the most common species there is the Honeylocust.
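Since the interactive map does not render everywhere, the same reading can be checked with a plain table. The sketch below uses an invented sample with placeholder neighborhood names. The helper `top_species` is hypothetical and joins tied species into one string, because `Series.mode()` returns every tied value.

```python
import pandas as pd

# Hypothetical sample standing in for the renamed `trees` DataFrame.
sample = pd.DataFrame({
    "Neighborhood Name": ["A", "A", "A", "B", "B"],
    "Specie": ["Honeylocust", "Honeylocust", "Ginkgo", "Ginkgo", "Pin oak"],
})

def top_species(s):
    # Series.mode() returns all tied values in sorted order;
    # joining them guarantees a single string per neighborhood.
    return ", ".join(s.mode())

grouped = sample.groupby("Neighborhood Name")
tree_count = grouped.size().rename("Tree Count")
popular = grouped["Specie"].agg(top_species).rename("Popular Tree")

# One row per neighborhood, largest tree count first.
summary = pd.concat([tree_count, popular], axis=1).sort_values(
    "Tree Count", ascending=False
)
top_neighborhood = summary.index[0]

print(summary)
```

On the real data, the first row of `summary` would be the neighborhood with the most trees, alongside its most common species.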
