Intermediate Data Visualization with Seaborn
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import os
%matplotlib inline
# play videos inside the notebook
from IPython.display import Video, display
def PlayVideo(file):
    """Embed a local video in the notebook; do nothing if it cannot be displayed."""
    try:
        display(Video(str(file), width=500, height=300))
    except Exception:
        pass
# Importing the course datasets
bike_share = pd.read_csv('datasets/bike_share.csv')
college_data = pd.read_csv('datasets/college_datav3.csv')
college_data_partial = pd.read_csv('datasets/college_data_partial.csv')
daily_show = pd.read_csv('datasets/daily_show_guests_cleaned.csv')
insurance = pd.read_csv('datasets/insurance_premiums.csv')
grants = pd.read_csv('datasets/schoolimprovement2010grants.csv', index_col=0)
rent = pd.read_csv('datasets/rent.csv')
1. Seaborn Introduction
Introduction to the Seaborn library and where it fits in the Python visualization landscape.
Introduction to Seaborn (Video)
PlayVideo('1.Introduction to Seaborn.mp4')
Seaborn foundation
What library provides the foundation for pandas and Seaborn plotting?
Possible Answers
- javascript
- matplotlib
- vega
- ggplot2
Right Answer
- matplotlib
Matplotlib is the basis for many Python plotting libraries. A basic understanding of matplotlib makes Seaborn easier to understand.
Reading a csv file
Before you analyze data, you will need to read the data into a pandas DataFrame. In this exercise, you will be looking at data from US School Improvement Grants in 2010. This program gave nearly $4B to schools to help them renovate or improve their programs.
The first step in most data analyses is to import pandas and seaborn and read a data file in order to analyze it further.
This course introduces a lot of new concepts, so if you ever need a quick refresher, download the Seaborn Cheat Sheet and keep it handy!
Instructions
- Import pandas and seaborn using the standard naming conventions.
- The path to the csv file is stored in the grant_file variable.
- Use pandas to read the file.
- Store the resulting DataFrame in the variable df.
# import all modules
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
grant_file = 'datasets/schoolimprovement2010grants.csv'
# Read in the DataFrame
df = pd.read_csv(grant_file)
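After loading a file it is usually worth a quick look at what pandas parsed. The sketch below does this on a tiny in-memory CSV so that it runs without the course files. The three rows and the School_Name column are hypothetical stand-ins; only the Award_Amount column name comes from the exercises.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the grants file (values invented for illustration)
csv_text = """School_Name,Award_Amount
Example School A,250000
Example School B,480000
Example School C,125000
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

# Inspect the first rows, the inferred column types, and the shape
print(sample_df.head())
print(sample_df.dtypes)
print(sample_df.shape)
```

The same three calls (head(), dtypes, shape) can be applied to df once the real grants file has been read.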
Comparing a histogram and displot
The pandas library supports simple plotting of data, which is convenient because the data is often already in a pandas DataFrame.
Seaborn generally does more statistical analysis on data and can provide more sophisticated insight into the data. In this exercise, we will compare a pandas histogram with the seaborn displot.
Instructions
- Use the pandas plot.hist() function to plot a histogram of the Award_Amount column.
- Use Seaborn's displot() function to plot a distribution plot of the same column.
# Display pandas histogram
df['Award_Amount'].plot.hist()
plt.show()
# Clear out the pandas histogram
plt.clf()
# Display a Seaborn displot
sns.displot(df['Award_Amount'], kind='hist')
plt.show()
# Clear the displot
plt.clf()
Conclusion
Notice how the pandas and Seaborn plots are very similar. Seaborn creates more appealing plots by default.
Using the distribution plot (Video)
PlayVideo('2.Using the distribution plot.mp4')
Plot a histogram
The displot() function returns a histogram by default. displot() can also create a KDE or rug plot, both of which are useful ways to look at the data. Seaborn can also combine these plots so you can perform more meaningful analysis.
Instructions
- Create a displot for the data.
- Explicitly pass in the number 20 for the number of bins in the histogram.
- Display the plot using plt.show().