    Data Manipulation with pandas

    Run the code cell below to import the packages and datasets used in this course.

    # Import the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Import the four datasets
    avocado = pd.read_csv("datasets/avocado.csv")
    homelessness = pd.read_csv("datasets/homelessness.csv")
    temperatures = pd.read_csv("datasets/temperatures.csv")
    walmart = pd.read_csv("datasets/walmart.csv")

    Take Notes

    Add notes about the concepts you've learned and code cells with code you want to keep.

    Add your notes here

    # Add your code snippets here

    Explore Datasets

    Use the DataFrames imported in the first cell to explore the data and practice your skills!

    • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order.
    • What was the total number of organic avocados sold (nb_sold) in 2017, according to the avocado DataFrame?
    • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame, with the bars in descending order. Bonus: create a horizontal bar chart.
    • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to label your plot properly. Bonus: add a legend for the two lines.
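    A minimal sketch of the first two exercises, using small made-up DataFrames so it runs without the course CSV files. It assumes the walmart data has department and weekly_sales columns and the avocado data has type, year, and nb_sold columns; check these names with .columns on the real datasets before reusing the code.

```python
import pandas as pd

# Made-up rows that only mimic the ASSUMED column names of the course's
# walmart and avocado datasets; none of these values are real.
walmart = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 4, 5, 6],
    "weekly_sales": [100.0, 250.0, 300.0, 50.0, 75.0, 500.0, 20.0, 410.0],
})
avocado = pd.DataFrame({
    "type": ["organic", "organic", "conventional", "organic"],
    "year": [2017, 2016, 2017, 2017],
    "nb_sold": [10.0, 20.0, 30.0, 5.0],
})

# Exercise 1: highest weekly sales per department, top five, descending
top_depts = (
    walmart.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(5)
)
print(top_depts)

# Exercise 2: total nb_sold of organic avocados in 2017
organic_2017 = avocado[(avocado["type"] == "organic") & (avocado["year"] == 2017)]
total_organic_2017 = organic_2017["nb_sold"].sum()
print(total_organic_2017)
```

    With the real data, skip the toy DataFrames and run the same expressions on the walmart and avocado objects loaded in the first cell.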

    Inspecting a DataFrame

    # Print the head of the homelessness data
    print(homelessness.head())
    
    # Print information about homelessness
    # (info() prints its summary itself and returns None, so it needs no print())
    homelessness.info()
    
    # Print the shape of homelessness
    print(homelessness.shape)
    
    # Print a description of homelessness
    print(homelessness.describe())
    
    # Print the values of homelessness
    print(homelessness.values)
    
    # Print the column index of homelessness
    print(homelessness.columns)
    
    # Print the row index of homelessness
    print(homelessness.index)
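    To see what each inspection call returns without the course files, the sketch below builds a tiny, invented DataFrame that borrows the homelessness column names (region, state, individuals, family_members). It illustrates that shape is an attribute, that info() prints its report and returns None, and that describe() covers only the numeric columns by default.

```python
import numpy as np
import pandas as pd

# Invented rows that reuse the homelessness column names (values are not real)
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific"],
    "state": ["State A", "State B", "State C"],
    "individuals": [1000.0, 200.0, 300.0],
    "family_members": [400.0, 50.0, 60.0],
})

# shape is an attribute, not a method: (number of rows, number of columns)
print(homelessness.shape)

# info() writes its summary straight to stdout and returns None
info_return = homelessness.info()

# describe() summarizes only the numeric columns by default
summary = homelessness.describe()
print(summary)

# .values returns a NumPy array; .to_numpy() is the method pandas recommends for this
arr = homelessness.to_numpy()
print(isinstance(arr, np.ndarray))

# .columns and .index are both pandas Index objects
print(homelessness.columns)
print(homelessness.index)
```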

    Sorting rows

    # Sort homelessness by individuals
    homelessness_ind = homelessness.sort_values("individuals")
    
    # Print the top few rows
    print(homelessness_ind.head())
    
    # Sort homelessness by descending family members
    homelessness_fam = homelessness.sort_values("family_members", ascending=False)
    
    # Print the top few rows
    print(homelessness_fam.head())
    
    # Sort homelessness by region, then descending family members
    homelessness_reg_fam = homelessness.sort_values(["region", "family_members"], ascending=[True, False])
    
    # Print the top few rows
    print(homelessness_reg_fam.head())
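    The multi-key sort above is easier to verify on a few rows whose order you can predict. This sketch uses invented values under the homelessness column names. It shows that ascending takes one flag per sort key and that sort_values returns a new DataFrame that keeps the original row labels. It also shows DataFrame.nlargest, an alternative the notes do not use, which picks the top rows for a single column.

```python
import pandas as pd

# Invented rows using the homelessness column names
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
    "state": ["A", "B", "C", "D"],
    "family_members": [50, 300, 200, 100],
})

# Region A to Z first, then family_members largest first within each region
by_reg_fam = homelessness.sort_values(
    ["region", "family_members"], ascending=[True, False]
)
print(by_reg_fam)

# The original DataFrame is unchanged; the sorted copy keeps the old row labels
print(list(by_reg_fam.index))  # [1, 3, 2, 0]

# Alternative: the two rows with the most family members
top_two = homelessness.nlargest(2, "family_members")
print(top_two)
```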

    Subsetting columns

    # Select the individuals column
    individuals = homelessness["individuals"]
    
    # Print the head of the result
    print(individuals.head())
    
    # Select the state and family_members columns
    state_fam = homelessness[["state", "family_members"]]
    
    # Print the head of the result
    print(state_fam.head())
    
    # Select only the individuals and state columns, in that order
    ind_state = homelessness[["individuals", "state"]]
    
    # Print the head of the result
    print(ind_state.head())
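    The bracket style decides the return type. One label in single brackets gives a Series. A list of labels (double brackets) gives a DataFrame whose columns follow the list order. Here is a quick check on a two-row toy DataFrame with invented values:

```python
import pandas as pd

# Invented rows using the homelessness column names
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain"],
    "state": ["A", "B"],
    "individuals": [100, 200],
    "family_members": [10, 20],
})

# One label in single brackets -> Series
individuals = homelessness["individuals"]

# A list of labels -> DataFrame, columns in the order listed
ind_state = homelessness[["individuals", "state"]]

# A one-item list still gives a (single-column) DataFrame
ind_df = homelessness[["individuals"]]

print(type(individuals).__name__)  # Series
print(type(ind_state).__name__)    # DataFrame
print(list(ind_state.columns))     # ['individuals', 'state']
print(ind_df.shape)                # (2, 1)
```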
    
    

    Subsetting rows

    # Filter for rows where individuals is greater than 10000
    ind_gt_10k = homelessness[homelessness["individuals"] > 10000]
    
    # See the result
    print(ind_gt_10k)
    
    # Filter for rows where region is Mountain
    mountain_reg = homelessness[homelessness["region"] == "Mountain"]
    
    # See the result
    print(mountain_reg)
    
    # Filter for rows where family_members is less than 1000
    # and region is Pacific
    fam_lt_1k_pac = homelessness[
        (homelessness["family_members"] < 1000) & (homelessness["region"] == "Pacific")
    ]
    
    # See the result
    print(fam_lt_1k_pac)
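    Each condition produces a boolean Series (a mask) with one True or False per row, and indexing with the mask keeps the True rows. The sketch below uses invented data under the homelessness column names. It shows that naming the masks first gives the same result as the inline form, and it explains why the inline form needs its parentheses: & binds more tightly than < and ==. It also shows that Python's and keyword cannot combine masks.

```python
import pandas as pd

# Invented rows using the homelessness column names
homelessness = pd.DataFrame({
    "region": ["Pacific", "Pacific", "Mountain", "Pacific"],
    "state": ["A", "B", "C", "D"],
    "family_members": [500, 1500, 200, 900],
})

# Each comparison returns a boolean Series aligned with the rows
few_families = homelessness["family_members"] < 1000
is_pacific = homelessness["region"] == "Pacific"

# & is element-wise "and" (use | for "or", ~ for "not")
named = homelessness[few_families & is_pacific]

# Inline form: each comparison needs parentheses because & binds
# more tightly than < and == in Python
inline = homelessness[
    (homelessness["family_members"] < 1000) & (homelessness["region"] == "Pacific")
]
print(inline)

# The keyword `and` asks for the truth value of a whole Series,
# which pandas refuses with a ValueError
raised = False
try:
    homelessness[few_families and is_pacific]
except ValueError:
    raised = True
print(raised)
```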
    
    

    Subsetting rows by categorical variables