Vaibhav Sachdeva
Introduction to Data Visualization with Matplotlib

    👋 Welcome to your workspace! Here, you can write and run Python code and add text in Markdown. Below, we've imported the datasets from the course Introduction to Data Visualization with Matplotlib as DataFrames as well as the packages used in the course. This is your sandbox environment: analyze the course datasets further, take notes, or experiment with code!

    # Importing course packages; you can add more too!
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Importing course datasets as DataFrames
    climate_change = pd.read_csv('datasets/climate_change.csv', parse_dates=["date"], index_col="date")
    medals = pd.read_csv('datasets/medals_by_country_2016.csv', index_col=0)
    summer_2016 = pd.read_csv('datasets/summer2016.csv')
    austin_weather = pd.read_csv("datasets/austin_weather.csv", index_col="DATE")
    weather = pd.read_csv("datasets/seattle_weather.csv", index_col="DATE")
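    # Some pre-processing on the weather datasets, including adding a month column
    # .copy() makes the filtered rows an independent DataFrame, so adding a column below is safe
    seattle_weather = weather[weather["STATION"] == "USW00094290"].copy()
    month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    # Assigning the list requires each DataFrame to have exactly 12 rows, one per month
    seattle_weather["MONTH"] = month
    austin_weather["MONTH"] = month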
    
    austin_weather.head() # Display the first five rows of this DataFrame

    Don't know where to start?

    Try completing these tasks:

    • Using austin_weather and seattle_weather, create a Figure with an array of two Axes objects that share an x-axis (MONTH in this case). Plot Seattle's and Austin's MLY-TAVG-NORMAL (for average temperature) in the top Axes and plot their MLY-PRCP-NORMAL (for average precipitation) in the bottom Axes. The cities should have different colors and the line style should be different between precipitation and temperature. Make sure to label your viz!
    • Using climate_change, create a twin Axes object with the shared x-axis as time. There should be two lines of different colors not sharing a y-axis: co2 and relative_temp. Only include dates from the 2000s and annotate the first date at which co2 exceeded 400.
    • Create a scatter plot from medals comparing the number of Gold medals vs the number of Silver medals with each point labeled with the country name.
    • Explore if the distribution of Age varies in different sports by creating histograms from summer_2016.
    • Try out the different Matplotlib styles available and save your visualizations as a PNG file.
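As a starting point for the climate_change task, here is a minimal sketch of a twin Axes with an annotation. It does not read the course file: the `demo` DataFrame, its values, and the label text are synthetic placeholders invented for illustration, already restricted to 2000 through 2009, so only the plotting pattern carries over to the real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for climate_change: NOT the course data, just two smooth ramps
dates = pd.date_range("2000-01-01", "2009-12-01", freq="MS")
demo = pd.DataFrame(
    {
        "co2": np.linspace(370.0, 410.0, len(dates)),
        "relative_temp": np.linspace(0.2, 0.8, len(dates)),
    },
    index=dates,
)

fig, ax = plt.subplots()
ax.plot(demo.index, demo["co2"], color="blue")
ax.set_xlabel("Time")
ax.set_ylabel("CO2 (ppm)", color="blue")

# twinx() creates a second Axes that shares the x-axis but has its own y-axis
ax2 = ax.twinx()
ax2.plot(demo.index, demo["relative_temp"], color="red")
ax2.set_ylabel("Relative temperature (Celsius)", color="red")

# First date at which co2 exceeds 400 in the synthetic series
first_date = demo.index[demo["co2"] > 400][0]
ax.annotate(
    ">400 ppm",
    xy=(first_date, demo.loc[first_date, "co2"]),
    xytext=(pd.Timestamp("2002-01-01"), 405),
    arrowprops={"arrowstyle": "->", "color": "gray"},
)
```

With the real DataFrame, the same boolean mask on the co2 column should locate the date to annotate, after slicing the index down to the years of interest.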

    Following a review of basic plotting with Matplotlib, this chapter covers plot customization: overlaying plots, making subplots, controlling axes, adding legends and annotations, and using different plot styles.

    Multiple plots on single axis

    The data set here comes from records of undergraduate degrees awarded to women in a variety of fields from 1970 to 2011. You can compare trends in degrees most easily by viewing two curves on the same set of axes.

    Here, three NumPy arrays have been pre-loaded for you: year (enumerating years from 1970 to 2011 inclusive), physical_sciences (representing the percentage of Physical Sciences degrees awarded to women in each corresponding year), and computer_science (representing the percentage of Computer Science degrees awarded to women in each corresponding year).

    You will issue two plt.plot() commands to draw line plots of different colors on the same set of axes. Here, year is plotted on the x-axis, while physical_sciences and computer_science are each plotted against the shared y-axis.

    # Years 1970 through 2011 inclusive (the stop value 2012 is excluded)
    year = np.arange(1970, 2012)
    physical_sciences = np.array([13.8, 14.9, 14.8, 16.5, 18.2, 19.1, 20. , 21.3, 22.5, 23.7, 24.6,
           25.7, 27.3, 27.6, 28. , 27.5, 28.4, 30.4, 29.7, 31.3, 31.6, 32.6,
           32.6, 33.6, 34.8, 35.9, 37.3, 38.3, 39.7, 40.2, 41. , 42.2, 41.1,
           41.7, 42.1, 41.6, 40.8, 40.7, 40.7, 40.7, 40.2, 40.1])
    computer_science = np.array([13.6, 13.6, 14.9, 16.4, 18.9, 19.8, 23.9, 25.7, 28.1, 30.2, 32.5,
           34.8, 36.3, 37.1, 36.8, 35.7, 34.7, 32.4, 30.8, 29.9, 29.4, 28.7,
           28.2, 28.5, 28.5, 27.5, 27.1, 26.8, 27. , 28.1, 27.7, 27.6, 27. ,
           25.1, 22.2, 20.6, 18.6, 17.6, 17.8, 18.1, 17.6, 18.2])
    # Plot in blue the % of degrees awarded to women in the Physical Sciences
    plt.plot(year, physical_sciences, color='blue')
    
    # Plot in red the % of degrees awarded to women in Computer Science
    plt.plot(year, computer_science, color='red')
    
    # Display the plot
    plt.show()
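When two curves share one set of axes, color alone does not say which series is which. A small sketch of adding a legend and axis labels follows. It uses only the first five values (1970 to 1974) of each array above, stored under `_demo` names chosen here so the full arrays are not overwritten; the label strings and legend position are illustrative choices, not part of the course code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# First five values (1970-1974) copied from the arrays above
year_demo = np.arange(1970, 1975)
physical_sciences_demo = np.array([13.8, 14.9, 14.8, 16.5, 18.2])
computer_science_demo = np.array([13.6, 13.6, 14.9, 16.4, 18.9])

plt.figure()
# The label keyword is what plt.legend() later displays for each line
plt.plot(year_demo, physical_sciences_demo, color="blue", label="Physical Sciences")
plt.plot(year_demo, computer_science_demo, color="red", label="Computer Science")
plt.xlabel("Year")
plt.ylabel("Degrees awarded to women (%)")
plt.legend(loc="upper left")

# Keep a handle on the current Axes for inspection
legend_ax = plt.gca()
```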

    Using axes()

    Rather than overlaying line plots on common axes, you may prefer to plot different line plots on distinct axes. The command plt.axes() is one way to do this (but it requires specifying coordinates relative to the size of the figure).

    Here, you have the same three arrays year, physical_sciences, and computer_science representing percentages of degrees awarded to women over a range of years. You will use plt.axes() to create separate sets of axes in which you will draw each line plot.

    In calling plt.axes([xlo, ylo, width, height]), a set of axes is created and made active with its lower-left corner at coordinates (xlo, ylo) and with the specified width and height. Note that these coordinates can be passed to plt.axes() in the form of a list or a tuple.

    The coordinates and lengths are values between 0 and 1 representing lengths relative to the dimensions of the figure. After issuing a plt.axes() command, plots generated are put in that set of axes.

    # Create plot axes for the first line plot
    plt.axes([0.05, 0.05, 0.425, 0.9])
    
    # Plot in blue the % of degrees awarded to women in the Physical Sciences
    plt.plot(year, physical_sciences, color='blue')
    
    # Create plot axes for the second line plot
    plt.axes([0.525, 0.05, 0.425, 0.9])
    
    # Plot in red the % of degrees awarded to women in Computer Science
    plt.plot(year, computer_science, color='red')
    
    # Display the plot
    plt.show()
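To make the figure-relative coordinates concrete, the sketch below recreates the two rectangles used above and converts the second one to pixels. The figure size of 8 by 4 inches at 100 dpi is an assumption picked to make the arithmetic easy; the names `left`, `right`, and `right_px` are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Assumed figure size: 8 x 4 inches at 100 dpi, i.e. 800 x 400 pixels
fig = plt.figure(figsize=(8, 4), dpi=100)
left = plt.axes([0.05, 0.05, 0.425, 0.9])
right = plt.axes([0.525, 0.05, 0.425, 0.9])

# Axes.get_position() reports the rectangle back in figure fractions
x0, y0, width, height = right.get_position().bounds

# Multiplying by the figure size in pixels gives the rectangle in pixels
fig_w, fig_h = fig.get_size_inches() * fig.dpi
right_px = (x0 * fig_w, y0 * fig_h, width * fig_w, height * fig_h)
```

Under these assumptions the second Axes starts 420 pixels from the left edge and 20 pixels from the bottom, and spans 340 by 360 pixels. The first Axes ends at 0.05 + 0.425 = 0.475, so a gap of 0.05 of the figure width separates the two.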