Data Exploration with Python

    Introduction

    Since 2008, guests and hosts have used Airbnb to expand their travel possibilities and to offer a more unique, personalized way of experiencing the world. This dataset describes the listing activity and metrics in NYC, NY for 2019. The original source for this data is http://insideairbnb.com/. In this article, we will explore this data to gain insights into the listings.

    Let us start by loading up the necessary packages.

    # Read data
    import pandas as pd
    
    # Visualize data
    import matplotlib.pyplot as plt
    import seaborn as sns
    import plotly.express as px
    
    # Configure visualizations
    plt.rcParams['figure.figsize'] = [8, 4]
    sns.set_theme(style='darkgrid')

    Read Data

    We can read the data from the CSV file into a dataframe. This data has already been cleaned, so we don't need to do any pre-processing before analyzing it.

    listings = pd.read_csv('data/AB_NYC_2019.csv')
    listings.head(100)
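
    Even when a dataset is described as clean, a few quick checks can confirm it. The sketch below is illustrative only: `sample` is a small made-up dataframe standing in for `listings`, and the checks chosen (missing values, duplicate rows, non-positive prices) are assumptions about what "clean" should mean here, not steps taken from the original analysis.

```python
import pandas as pd

# Hypothetical stand-in for the listings dataframe; values are made up for illustration.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "neighbourhood": ["Harlem", "Williamsburg", "Harlem", None],
    "price": [120, 95, 0, 210],
})

# Count missing values in each column.
missing_per_column = sample.isna().sum()

# Count rows that are exact duplicates of an earlier row.
duplicate_rows = int(sample.duplicated().sum())

# Count rows with a price of zero or less, which would be suspicious for a listing.
non_positive_prices = int((sample["price"] <= 0).sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print("non-positive prices:", non_positive_prices)
```

    Running the same three checks against `listings` would show whether any pre-processing is needed after all.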

    Explore Data

    There are many different questions we can explore with this data. Let us start by looking at the neighbourhoods with the most listings.

    by_neighbourhood = listings.groupby(['neighbourhood'], as_index=False)['id'].count().rename(columns = {'id': 'nb_bookings'})
    top_10_neighbourhoods = by_neighbourhood.sort_values(by = ['nb_bookings'], ascending=False).head(10)
    top_10_neighbourhoods
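
    Note that the `nb_bookings` column counts listing ids per neighbourhood, so it measures the number of listings rather than bookings. As a hedged sketch, the same ranking can also be obtained with `value_counts`, which sorts in descending order by default. The `sample` dataframe below is made up for illustration and is not part of the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the listings dataframe; values are made up for illustration.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "neighbourhood": [
        "Harlem", "Williamsburg", "Harlem",
        "Chelsea", "Williamsburg", "Williamsburg",
    ],
})

# Same groupby approach as in the article, limited to the top 2 for this tiny sample.
by_group = (
    sample.groupby("neighbourhood", as_index=False)["id"]
    .count()
    .rename(columns={"id": "nb_bookings"})
    .sort_values(by="nb_bookings", ascending=False)
    .head(2)
)

# Shorter alternative: value_counts returns counts sorted from largest to smallest.
by_value_counts = sample["neighbourhood"].value_counts().head(2)

print(by_group)
print(by_value_counts)
```

    The groupby version returns a dataframe with named columns, which is convenient for plotting; `value_counts` returns a Series indexed by neighbourhood.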

    Let us visualize the data using a horizontal bar plot.

    sns.barplot(data=top_10_neighbourhoods, y='neighbourhood', x='nb_bookings').set_title('Top 10 Neighbourhoods');

    We can also visualize the top neighbourhoods as an interactive plot using plotly. Williamsburg has the most listings, followed by Bedford-Stuyvesant and Harlem.

    fig = px.bar(top_10_neighbourhoods, x="nb_bookings", y="neighbourhood")
    fig.show(config={"displayModeBar": False})

    How about the distribution of prices? Rather than looking at individual neighbourhoods, we will focus on neighbourhood groups, restricting the data to listings priced at 500 or less.

    listings_lt_500 = listings[listings.price <= 500]
    fig_price = px.violin(
        listings_lt_500, 
        x="neighbourhood_group", 
        y="price", 
        color='neighbourhood_group'
    )
    fig_price.show(config={"displayModeBar": False})
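
    A violin plot shows the shape of each distribution, but a small numeric summary can make the groups easier to compare. The sketch below is an assumption about a useful follow-up, not part of the original analysis: `sample` is a made-up dataframe standing in for `listings`, and the statistics chosen (count, median, mean) are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the listings dataframe; values are made up for illustration.
sample = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Manhattan", "Brooklyn",
        "Brooklyn", "Queens", "Manhattan",
    ],
    "price": [150, 250, 100, 80, 60, 900],
})

# Apply the same filter as the article: keep listings priced at 500 or less.
sample_le_500 = sample[sample.price <= 500]

# Summarize price per neighbourhood group, ordered by median price (highest first).
price_summary = (
    sample_le_500.groupby("neighbourhood_group")["price"]
    .agg(["count", "median", "mean"])
    .sort_values(by="median", ascending=False)
)

print(price_summary)
```

    The median is less sensitive than the mean to the few expensive listings that remain after filtering, which is why the summary is sorted by it here.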