Aditya Raghunandan Kodavanti

Introduction to Data Visualization with Matplotlib

Run the hidden code cell below to import the data used in this course.

# Importing the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Importing the course datasets 
climate_change = pd.read_csv('datasets/climate_change.csv', parse_dates=["date"], index_col="date")
medals = pd.read_csv('datasets/medals_by_country_2016.csv', index_col=0)
summer_2016 = pd.read_csv('datasets/summer2016.csv')
austin_weather = pd.read_csv("datasets/austin_weather.csv", index_col="DATE")
weather = pd.read_csv("datasets/seattle_weather.csv", index_col="DATE")

# Some pre-processing on the weather datasets, including adding a month column
seattle_weather = weather[weather["STATION"] == "USW00094290"].copy()  # copy, so adding MONTH below does not trigger SettingWithCopyWarning
month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] 
seattle_weather["MONTH"] = month 
austin_weather["MONTH"] = month

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Using austin_weather and seattle_weather, create a Figure with an array of two Axes objects that share a y-axis range (MONTHS in this case). Plot Seattle's and Austin's MLY-TAVG-NORMAL (for average temperature) in the top Axes and plot their MLY-PRCP-NORMAL (for average precipitation) in the bottom axes. The cities should have different colors and the line style should be different between precipitation and temperature. Make sure to label your viz!
  • Using climate_change, create a twin Axes object with the shared x-axis as time. There should be two lines of different colors not sharing a y-axis: co2 and relative_temp. Only include dates from the 2000s and annotate the first date at which co2 exceeded 400.
  • Create a scatter plot from medals comparing the number of Gold medals vs the number of Silver medals with each point labeled with the country name.
  • Explore if the distribution of Age varies in different sports by creating histograms from summer_2016.
  • Try out the different Matplotlib styles available and save your visualizations as a PNG file.
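The last exercise has no code cell in these notes. Below is a minimal sketch of one way to approach it, using made-up numbers rather than the course datasets. The style name "ggplot" and the output file name are illustrative choices, not part of the exercise.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Made-up monthly values, standing in for a column such as MLY-TAVG-NORMAL
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
values = [42.1, 43.4, 46.6, 50.5, 56.0, 61.0]

# plt.style.available lists the style names that ship with Matplotlib
print(plt.style.available)

# Apply a style only inside this block, so other figures keep the default look
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot(months, values, marker="o")
    ax.set_xlabel("Month")
    ax.set_ylabel("Value")
    ax.set_title("Example styled with 'ggplot'")

    # The .png extension selects the PNG format; dpi controls the resolution
    out_path = os.path.join(tempfile.gettempdir(), "styled_example.png")  # hypothetical file name
    fig.savefig(out_path, dpi=150, bbox_inches="tight")

plt.close(fig)
```

Using plt.style.context instead of plt.style.use keeps the style change local, which makes it easier to compare several styles in one notebook.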
#1
# Two stacked Axes; sharey=True gives both panels the same y-axis range
fig, ax = plt.subplots(2, 1, sharey=True)

# Top Axes: average temperature, one color per city, solid lines
ax[0].plot(austin_weather["MONTH"], austin_weather["MLY-TAVG-NORMAL"], color="blue", linestyle="-", label="Austin")
ax[0].plot(seattle_weather["MONTH"], seattle_weather["MLY-TAVG-NORMAL"], color="red", linestyle="-", label="Seattle")
ax[0].set_title("Average Temperature")
ax[0].set_xlabel("Month")
ax[0].set_ylabel("Temperature (F)")
ax[0].legend()

# Bottom Axes: average precipitation, same city colors, dotted lines
ax[1].plot(austin_weather["MONTH"], austin_weather["MLY-PRCP-NORMAL"], color="blue", linestyle=":", label="Austin")
ax[1].plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"], color="red", linestyle=":", label="Seattle")
ax[1].set_title("Average Precipitation")
ax[1].set_xlabel("Month")
ax[1].set_ylabel("Precipitation (in)")
ax[1].legend()

# Figure-level title; plt.xlabel/plt.ylabel/plt.title would overwrite the bottom Axes' labels
fig.suptitle("Weather Data for Austin and Seattle")
fig.tight_layout()

plt.show()
climate_change
#2
# Keep only dates from 2000 onward (reading "the 2000s" as the year 2000 and later)
recent = climate_change.loc["2000":]

# Create one Axes for co2 and a twin Axes that shares its x-axis (time)
fig, ax = plt.subplots()
ax2 = ax.twinx()

# Plot co2 and relative_temp in different colors, each on its own y-axis
ax.plot(recent.index, recent["co2"], color="blue")
ax2.plot(recent.index, recent["relative_temp"], color="red")

# Annotate the first date at which co2 exceeded 400, if there is one
above_400 = recent[recent["co2"] > 400]
if not above_400.empty:
    first_date = above_400.index[0]
    ax.annotate("CO2 exceeded 400 ppm", xy=(first_date, above_400["co2"].iloc[0]),
                xytext=(-120, -40), textcoords="offset points",
                arrowprops={"arrowstyle": "->"})

# Label the axes, matching each y-label to its line color
ax.set_xlabel("Date")
ax.set_ylabel("CO2 (ppm)", color="blue")
ax2.set_ylabel("Relative Temperature (°C)", color="red")

# Show the figure
plt.show()
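The date to annotate can be found with a boolean mask instead of being read off the plot. Here is a small sketch with an invented monthly series. The values are for illustration only and are not taken from climate_change.

```python
import pandas as pd

# Invented monthly readings, for illustration only
dates = pd.date_range("2013-01-01", periods=6, freq="MS")
co2 = pd.Series([398.2, 399.1, 399.8, 400.4, 401.0, 399.9], index=dates)

# Keep the rows above the threshold; the first index label is the first crossing
above = co2[co2 > 400]
first_date = above.index[0] if not above.empty else None

print(first_date)
```

Checking for an empty result first avoids an IndexError when the threshold is never exceeded in the selected date range.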
medals
#3
plt.scatter(medals["Gold"], medals["Silver"], c="blue", label="Country")

# Label each point with the country name (the index of medals)
for country, row in medals.iterrows():
    plt.annotate(country, (row["Gold"], row["Silver"]), xytext=(5, 5), textcoords="offset points", fontsize=14)

# Label the axes
plt.xlabel("Gold Medals")
plt.ylabel("Silver Medals")

# Add a legend
plt.legend()

# Show the figure
plt.show()
summer_2016
#4
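# Create one histogram of the age distribution per sport
# (iterate over unique sports; looping over the column would repeat a plot for every row)
for sport in summer_2016["Sport"].unique():
    ages = summer_2016.loc[summer_2016["Sport"] == sport, "Age"].dropna()
    fig, ax = plt.subplots()
    ax.hist(ages, bins=50)
    ax.set_title(sport)
    ax.set_xlabel("Age")
    ax.set_ylabel("Frequency")
    plt.show()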
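When comparing distributions between groups, giving every histogram the same bin edges keeps the counts comparable. This is a sketch with invented ages for two hypothetical sports, not data from summer_2016.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented ages for two hypothetical groups
ages_a = np.array([16, 17, 18, 19, 19, 20, 21, 22])
ages_b = np.array([24, 26, 27, 29, 31, 33, 35, 40])

# One set of bin edges shared by both groups (5-year bins from 15 to 45)
bins = np.arange(15, 46, 5)

fig, ax = plt.subplots()
# histtype="step" draws outlines only, so overlapping groups stay visible
counts_a, _, _ = ax.hist(ages_a, bins=bins, histtype="step", label="Sport A")
counts_b, _, _ = ax.hist(ages_b, bins=bins, histtype="step", label="Sport B")
ax.set_xlabel("Age")
ax.set_ylabel("Frequency")
ax.legend()
plt.show()
```

With many sports, overlaying every group on one Axes gets crowded, so this approach probably suits a handful of sports chosen for comparison.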