Introduction to Data Visualization with Matplotlib

    Run the hidden code cell below to import the data used in this course.

    # Importing the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Importing the course datasets 
    climate_change = pd.read_csv('datasets/climate_change.csv', parse_dates=["date"], index_col="date")
    medals = pd.read_csv('datasets/medals_by_country_2016.csv', index_col=0)
    summer_2016 = pd.read_csv('datasets/summer2016.csv')
    austin_weather = pd.read_csv("datasets/austin_weather.csv", index_col="DATE")
    weather = pd.read_csv("datasets/seattle_weather.csv", index_col="DATE")
    
    # Some pre-processing on the weather datasets, including adding a month column
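    # .copy() makes seattle_weather an independent DataFrame, so adding a column does not trigger SettingWithCopyWarning
    seattle_weather = weather[weather["STATION"] == "USW00094290"].copy()
    month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    seattle_weather["MONTH"] = month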
    austin_weather["MONTH"] = month
    

    Take Notes

    Add notes about the concepts you've learned and code cells with code you want to keep.

    Definition: matplotlib.pyplot

    Matplotlib is a popular data visualization library for Python that lets you create a wide range of plots, charts, and graphs. With pyplot you can create line plots, histograms, scatter plots, bar plots, and many other types of visualization, and you can add labels, titles, and other annotations to your plots. pyplot works with other libraries such as NumPy, pandas, and SciPy, which makes it an essential tool for data scientists and data analysts.
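
To make the definition concrete, here is a minimal sketch of a labeled line plot built through pyplot. The data is made up for illustration and is not taken from any course dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: twelve made-up monthly values (not from the course datasets)
months = np.arange(1, 13)
avg_temp = np.array([5, 6, 9, 12, 16, 19, 22, 22, 18, 13, 8, 5])

# plt.subplots() returns a Figure and a single Axes to draw on
fig, ax = plt.subplots()
ax.plot(months, avg_temp, color="tab:red", marker="o", label="Average temperature")

# Labels, a title, and a legend are added through the Axes object
ax.set_xlabel("Month")
ax.set_ylabel("Temperature")
ax.set_title("Monthly average temperature (illustrative data)")
ax.legend()
```

In a notebook the figure is displayed when the cell finishes; in a plain script, call plt.show() afterwards.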

    Explore Datasets

    Use the DataFrames imported in the first cell to explore the data and practice your skills!

    • Using austin_weather and seattle_weather, create a Figure with an array of two Axes objects that share an x-axis (MONTH in this case). Plot Seattle's and Austin's MLY-TAVG-NORMAL (average temperature) in the top Axes and their MLY-PRCP-NORMAL (average precipitation) in the bottom Axes. Give each city its own color, and use a different line style for precipitation than for temperature. Make sure to label your visualization!
    • Using climate_change, create a twin Axes object with the shared x-axis as time. There should be two lines of different colors not sharing a y-axis: co2 and relative_temp. Only include dates from the 2000s and annotate the first date at which co2 exceeded 400.
    • Create a scatter plot from medals comparing the number of Gold medals vs the number of Silver medals with each point labeled with the country name.
    • Explore if the distribution of Age varies in different sports by creating histograms from summer_2016.
    • Try out the different Matplotlib styles available and save your visualizations as a PNG file.
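
One possible starting point for the first exercise is sketched below. It assumes the months run along the x-axis, so the two Axes are linked with sharex=True. The two small DataFrames are hypothetical stand-ins with made-up numbers; swap in seattle_weather and austin_weather from the first cell to use the real data.

```python
import pandas as pd
import matplotlib.pyplot as plt

month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical stand-ins for seattle_weather and austin_weather (values are made up)
seattle = pd.DataFrame({
    "MONTH": month,
    "MLY-TAVG-NORMAL": [42, 44, 47, 51, 57, 62, 66, 67, 62, 53, 46, 41],
    "MLY-PRCP-NORMAL": [5.6, 3.5, 3.7, 2.7, 1.9, 1.6, 0.7, 0.9, 1.5, 3.5, 6.6, 5.4],
})
austin = pd.DataFrame({
    "MONTH": month,
    "MLY-TAVG-NORMAL": [50, 54, 61, 68, 76, 82, 85, 86, 80, 71, 60, 52],
    "MLY-PRCP-NORMAL": [2.2, 2.0, 2.8, 2.1, 4.4, 4.3, 1.9, 2.4, 3.0, 3.9, 3.0, 2.4],
})

# Two rows, one column; sharex=True links the month axis of both Axes
fig, ax = plt.subplots(2, 1, sharex=True)

# Top Axes: temperature, solid lines, one color per city
ax[0].plot(seattle["MONTH"], seattle["MLY-TAVG-NORMAL"], color="tab:blue", linestyle="-", label="Seattle")
ax[0].plot(austin["MONTH"], austin["MLY-TAVG-NORMAL"], color="tab:orange", linestyle="-", label="Austin")
ax[0].set_ylabel("Average temperature")
ax[0].legend()

# Bottom Axes: precipitation, dashed lines, same city colors
ax[1].plot(seattle["MONTH"], seattle["MLY-PRCP-NORMAL"], color="tab:blue", linestyle="--", label="Seattle")
ax[1].plot(austin["MONTH"], austin["MLY-PRCP-NORMAL"], color="tab:orange", linestyle="--", label="Austin")
ax[1].set_ylabel("Average precipitation")
ax[1].set_xlabel("Month")
ax[1].legend()
```

Keeping the color tied to the city and the line style tied to the measurement lets a reader compare cities within an Axes and measurements across the two Axes.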
    import matplotlib.pyplot as plt
    import pandas as pd
    from datetime import datetime
    %matplotlib inline

    Explore your data

    summer2016 = pd.read_csv("datasets/summer2016.csv")
    summer2016
    climate_change = pd.read_csv('datasets/climate_change.csv')
    climate_change
    climate_change.dtypes
    # Convert object date to datetime
    climate_change['date'] = pd.to_datetime(climate_change['date'])
    climate_change
    # Create new columns year, month and day
    climate_change['years'] = climate_change['date'].dt.year
    climate_change['months'] = climate_change['date'].dt.month
    climate_change['days'] = climate_change['date'].dt.day
    climate_change
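
Once date is a datetime column, it can drive both filtering and plotting. The sketch below shows one way to approach the twin-Axes exercise: keep rows from 2000 onward (one reading of "the 2000s"), plot co2 and relative_temp on separate y-axes, and annotate the first row where co2 exceeds 400. The DataFrame is a hypothetical stand-in with made-up values; replace it with climate_change to use the real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for climate_change: monthly rows with made-up values
dates = pd.date_range("1995-01-01", "2016-12-01", freq="MS")
climate = pd.DataFrame({
    "date": dates,
    "co2": np.linspace(360, 405, len(dates)),
    "relative_temp": np.linspace(0.2, 0.9, len(dates)),
})

# Keep only rows dated 2000 or later
recent = climate[climate["date"].dt.year >= 2000]

fig, ax = plt.subplots()
ax.plot(recent["date"], recent["co2"], color="tab:blue")
ax.set_xlabel("Time")
ax.set_ylabel("co2", color="tab:blue")

# twinx() creates a second Axes that shares the x-axis but has its own y-axis
ax2 = ax.twinx()
ax2.plot(recent["date"], recent["relative_temp"], color="tab:red")
ax2.set_ylabel("relative_temp", color="tab:red")

# Annotate the first date at which co2 exceeds 400, if there is one
above = recent[recent["co2"] > 400]
if not above.empty:
    first = above.iloc[0]
    ax.annotate(
        "co2 > 400",
        xy=(first["date"], first["co2"]),
        xytext=(pd.Timestamp("2005-01-01"), first["co2"]),
        arrowprops={"arrowstyle": "->", "color": "gray"},
    )
```

Coloring each y-axis label to match its line makes it clear which scale belongs to which series.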

    Adding style

    https://matplotlib.org/gallery/style-sheets/style-sheets-refer

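    # Reset to Matplotlib's default style
    plt.style.use('default')
    # Keep only the rows where relative_temp is above 0.1
    relative_greater_than_one_point = climate_change[climate_change['relative_temp'] > 0.1]
    # Create the Figure and Axes, then plot relative_temp against date
    fig, ax = plt.subplots()
    ax.plot(relative_greater_than_one_point['date'], relative_greater_than_one_point['relative_temp'])
    plt.show()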
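
To try a non-default style and save the result as a PNG, something like the following sketch may help. It uses made-up data, picks the "ggplot" style if it is available, and writes to a temporary directory; the file name styled_plot.png is an arbitrary choice for illustration.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

# Made-up data for illustration
x = np.linspace(0, 10, 100)

# plt.style.available lists the style names that can be passed to plt.style.use
style = "ggplot" if "ggplot" in plt.style.available else "default"

# Hypothetical output location: a file in the system's temporary directory
out_path = os.path.join(tempfile.gettempdir(), "styled_plot.png")

# style.context applies the style only inside the with-block
with plt.style.context(style):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_xlabel("x")
    ax.set_ylabel("sin(x)")
    ax.set_title(f"Style: {style}")
    # savefig infers the PNG format from the file extension
    fig.savefig(out_path, dpi=150)
```

Using plt.style.context instead of plt.style.use keeps the style change local, so later plots in the notebook are unaffected.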