Intermediate Data Visualization with Seaborn

    # Import the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import datetime
    import os
    
    %matplotlib inline
    
    # Play videos inside the notebook
    from IPython.display import Video, display
    def PlayVideo(file):
        try:
            display(Video(f'{file}', width=500, height=300))
        except Exception:
            # Skip silently if the video cannot be displayed
            pass
    
    # Importing the course datasets
    bike_share = pd.read_csv('datasets/bike_share.csv')
    college_data = pd.read_csv('datasets/college_datav3.csv')
    college_data_partial = pd.read_csv('datasets/college_data_partial.csv')
    daily_show = pd.read_csv('datasets/daily_show_guests_cleaned.csv')
    insurance = pd.read_csv('datasets/insurance_premiums.csv')
    grants = pd.read_csv('datasets/schoolimprovement2010grants.csv', index_col=0)
    rent = pd.read_csv('datasets/rent.csv')

    1. Seaborn Introduction

    Introduction to the Seaborn library and where it fits in the Python visualization landscape.

    Introduction to Seaborn (Video)

    PlayVideo('1.Introduction to Seaborn.mp4')

    Seaborn foundation

    What library provides the foundation for pandas and Seaborn plotting?

    Possible Answers
    1. javascript
    2. matplotlib
    3. vega
    4. ggplot2
    Right Answer
    2. matplotlib

    Matplotlib is the basis for many Python plotting libraries, including pandas plotting and Seaborn. A basic understanding of matplotlib makes Seaborn easier to understand.

    Reading a csv file

    Before you analyze data, you will need to read the data into a pandas DataFrame. In this exercise, you will be looking at data from US School Improvement Grants in 2010. This program gave nearly $4B to schools to help them renovate or improve their programs.

    The first step in most data analyses is to import pandas and Seaborn, then read a data file so you can analyze it further.

    This course introduces a lot of new concepts, so if you ever need a quick refresher, download the Seaborn Cheat Sheet and keep it handy!

    Instructions
    1. Import pandas and seaborn using the standard naming conventions.
    2. The path to the csv file is stored in the grant_file variable.
    3. Use pandas to read the file.
    4. Store the resulting DataFrame in the variable df.
    # import all modules
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    grant_file = 'datasets/schoolimprovement2010grants.csv'
    # Read in the DataFrame
    df = pd.read_csv(grant_file)
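
After reading a file, it usually helps to check what pandas actually loaded. The sketch below does this on a small in-memory CSV so it runs without the course files. The rows are made up, and every column name except Award_Amount is hypothetical.

```python
import io

import pandas as pd

# Made-up rows standing in for a CSV file on disk; School_Name and State
# are hypothetical column names chosen for this illustration
csv_text = """School_Name,State,Award_Amount
Alpha High,CA,500000
Beta Middle,TX,1250000
Gamma Elementary,NY,730000
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

# Quick checks: dimensions, inferred column types, and the first rows
print(sample_df.shape)
print(sample_df.dtypes)
print(sample_df.head())
```

The same three checks can be run on df after reading grant_file to confirm that numeric columns such as Award_Amount were parsed as numbers rather than text.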

    Comparing a histogram and displot

    The pandas library supports simple plotting of data, which is convenient because the data is likely already in a pandas DataFrame.

    Seaborn generally does more statistical analysis on data and can provide more sophisticated insight into the data. In this exercise, we will compare a pandas histogram vs the seaborn displot.

    Instructions
    1. Use pandas' plot.hist() method to plot a histogram of the Award_Amount column.
    2. Use Seaborn's displot() function to plot a distribution plot of the same column.
    # Display pandas histogram
    df['Award_Amount'].plot.hist()
    plt.show()
    
    # Clear out the pandas histogram
    plt.clf()
    # Display a Seaborn displot
    sns.displot(df['Award_Amount'], kind='hist')
    plt.show()
    
    # Clear the displot
    plt.clf()
    Conclusion

    Notice that the pandas and Seaborn plots show the same distribution and look very similar. The main difference is that Seaborn's default styling is more appealing.

    Using the distribution plot (Video)

    PlayVideo('2.Using the distribution plot.mp4')

    Plot a histogram

    The displot() function returns a histogram by default. It can also create a KDE or rug plot, both of which are useful ways to look at the data. Seaborn can also combine these plots so you can perform a more meaningful analysis.

    Instructions
    1. Create a displot for the data.
    2. Explicitly pass in the number 20 for the number of bins in the histogram.
    3. Display the plot using plt.show().