Data Manipulation with pandas

    👋 Welcome to your new workspace! Here, you can experiment with the data you used in Data Manipulation with pandas and practice your newly learned skills with some challenges.

    We expect most users to need about 30 minutes to complete the content in this workspace. However, you are free to keep experimenting and practicing for as long as you like!

    # Import the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Import the four datasets
    avocado = pd.read_csv("datasets/avocado.csv")
    homelessness = pd.read_csv("datasets/homelessness.csv")
    temperatures = pd.read_csv("datasets/temperatures.csv")
    walmart = pd.read_csv("datasets/walmart.csv")
    
    # Print the first DataFrame
    avocado

    2. Write Code

    After running the cell above, you have created four pandas DataFrames: avocado, homelessness, temperatures, and walmart.
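    Note that `pd.read_csv` leaves a date column as plain text unless it is asked to parse it, which matters for the date-based challenges. This sketch reads a tiny invented CSV twice (the rows are made up; only the `date`, `type`, and `nb_sold` column names mirror the avocado DataFrame) to compare the default with the `parse_dates` parameter.

```python
import io

import pandas as pd

# Invented rows; only the column names mirror the avocado DataFrame
csv_text = """date,type,nb_sold
2017-01-01,organic,100
2018-01-07,conventional,250
"""

raw = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The default read keeps the dates as text; parse_dates converts them to datetimes
print(raw["date"].dtype)
print(parsed["date"].dtype)

# Datetime columns expose components such as the year through the .dt accessor
print(parsed["date"].dt.year.tolist())
```

    On the real data, `avocado.dtypes` is a quick way to check how each column was read.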

    Add code to the code cells below to try one (or more) of the following challenges:

    1. Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order.
    2. What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame?
    3. Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart.
    4. Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to label your plot properly. Bonus: add a legend for the two lines.

    Be sure to check out the Answer Key at the end to see one way to solve each problem. Did you try something similar?

    Reminder: To execute the code you add to a cell, click inside the cell to select it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell.

    # 1. Print the highest weekly sales for each department
    
    # 2. What was the total `nb_sold` of organic avocados in 2017?
    
    # 3. Create a bar plot of the number of homeless people by region
    
    # 4. Create a line plot of temperatures in Toronto and Rome
    

    3. Next Steps

    Feeling confident about your skills? Continue on to Joining Data with pandas! This course will teach you how to combine multiple datasets, an essential skill on the road to becoming a data scientist!

    4. Answer Key

    Below are potential solutions to the challenges shown above. Try them out and see how they compare to how you approached the problem!

    # 1. Print the highest weekly sales for each department
    department_sales = walmart.groupby("department")[["weekly_sales"]].max()
    best_departments = department_sales.sort_values(by="weekly_sales", ascending=False)
    best_departments.head()
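    `Series.nlargest` combines the sort and the `head` step: it returns the n largest values already in descending order. A minimal sketch on invented numbers (the department IDs and sales figures are made up; only the column names follow the walmart DataFrame):

```python
import pandas as pd

# Invented toy data with the same column names as the walmart DataFrame
walmart_toy = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 4, 5, 6],
    "weekly_sales": [100.0, 250.0, 300.0, 50.0, 400.0, 120.0, 90.0, 500.0],
})

# Highest weekly sales per department, then the five largest in descending order
top_five = walmart_toy.groupby("department")["weekly_sales"].max().nlargest(5)
print(top_five)
```

    Selecting `["weekly_sales"]` with single brackets gives a Series, which is what `nlargest(5)` is called on here; for a one-column DataFrame, `DataFrame.nlargest(5, "weekly_sales")` is the equivalent.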
    # 2. What was the total `nb_sold` of organic avocados in 2017?
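    # Parse the date strings so .loc can select a whole year by partial-string indexing
    avocado_by_date = avocado.assign(date=pd.to_datetime(avocado["date"])).set_index("date").sort_index()
    avocado_2017 = avocado_by_date.loc["2017"]
    avocado_organic_2017 = avocado_2017.loc[avocado_2017["type"] == "organic"]
    avocado_organic_2017["nb_sold"].sum()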
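    Year filtering with `.loc` depends on whether the index holds parsed datetimes. On a `DatetimeIndex`, a partial-string slice such as `"2017":"2018"` includes the entire end year, while a single label such as `"2017"` selects just that year. A small sketch on invented rows (only the column names follow the avocado DataFrame):

```python
import pandas as pd

# Invented rows; only the column names follow the avocado DataFrame
toy = pd.DataFrame({
    "date": ["2016-12-25", "2017-01-01", "2017-12-31", "2018-01-07"],
    "type": ["organic", "organic", "conventional", "organic"],
    "nb_sold": [10, 20, 30, 40],
})

# Convert to datetimes, then index and sort by date
by_date = toy.assign(date=pd.to_datetime(toy["date"])).set_index("date").sort_index()

through_2018 = by_date.loc["2017":"2018"]  # keeps 2017 AND 2018 rows
only_2017 = by_date.loc["2017"]            # keeps 2017 rows only

organic_2017_total = only_2017.loc[only_2017["type"] == "organic", "nb_sold"].sum()
print(len(through_2018), len(only_2017), organic_2017_total)
```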
    # 3. Create a bar plot of the number of homeless people by region
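    homelessness_by_region = (
        homelessness.groupby("region")["individuals"].sum().sort_values()
    )
    # barh draws the first row at the bottom, so an ascending sort
    # puts the region with the most homeless individuals at the top
    homelessness_by_region.plot(kind="barh")
    plt.title("Total Number of Homeless People by Region")
    plt.xlabel("Number of homeless individuals")
    plt.ylabel("Region")
    plt.show()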
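    Why an ascending sort produces a descending-looking horizontal chart can be checked by reading the bars back from the plot. A sketch with invented totals (the region names and numbers are placeholders, not values from the homelessness dataset):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented totals; the region names are placeholders, not values from the dataset
totals = pd.Series({"Region A": 300, "Region B": 1200, "Region C": 650})

# Ascending sort + barh: the first (smallest) row is drawn at the bottom
ax = totals.sort_values().plot(kind="barh")

# Read the bars back from the Axes, ordered from bottom to top
bars = sorted(ax.patches, key=lambda bar: bar.get_y())
bar_widths_bottom_to_top = [bar.get_width() for bar in bars]
print(bar_widths_bottom_to_top)

plt.show()
```

    If you prefer `sort_values(ascending=False)`, calling `ax.invert_yaxis()` after plotting also keeps the largest bar on top.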
    # 4. Create a line plot of temperatures in Toronto and Rome
    toronto = temperatures[temperatures.city == "Toronto"]
    rome = temperatures[temperatures.city == "Rome"]
    # label= tags each line so plt.legend() can name it without a separate labels list
    toronto.groupby("date")["avg_temp_c"].mean().plot(kind="line", color="blue", label="Toronto")
    rome.groupby("date")["avg_temp_c"].mean().plot(kind="line", color="red", label="Rome")
    plt.title("Toronto and Rome Average Temperature (°C)")
    plt.xlabel("Date")
    plt.ylabel("Average temperature (°C)")
    plt.legend()
    plt.show()
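    Reshaping to one column per city is another way to get both lines and the legend in a single call: `pivot_table` builds a wide table indexed by date (averaging duplicates, like the `groupby(...).mean()` step), and `DataFrame.plot` draws one labeled line per column. A sketch on invented temperatures (only the `date`, `city`, and `avg_temp_c` column names follow the temperatures DataFrame; the readings are made up):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented temperatures; only the column names follow the temperatures DataFrame
temps = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-01", "2010-02-01"] * 2),
    "city": ["Toronto", "Toronto", "Rome", "Rome"],
    "avg_temp_c": [-5.0, -3.5, 8.0, 9.5],
})

# One column per city, indexed by date (pivot_table averages by default)
wide = temps.pivot_table(index="date", columns="city", values="avg_temp_c")

# Plotting a DataFrame draws one line per column and adds a legend automatically
ax = wide[["Toronto", "Rome"]].plot(kind="line", color=["blue", "red"])
ax.set_title("Toronto and Rome Average Temperature (°C)")
ax.set_xlabel("Date")
ax.set_ylabel("Average temperature (°C)")
plt.show()
```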