Mahmoud Magdy/

Data Manipulation with pandas



Run the hidden code cell below to import the data used in this course.

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
  • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
  • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
  • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.

When using .agg(), first define a function that computes the statistic you need, then pass that function to .agg().

A custom IQR function

def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

Print IQR of the temperature_c column
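The note above leaves out the print step, so here is a minimal sketch of how it could look. The DataFrame name `sales` and its values are assumptions made for illustration; only the `temperature_c` column and the `iqr` function come from the notes.

```python
import pandas as pd

# Hypothetical stand-in for the course data; the values are invented.
sales = pd.DataFrame({"temperature_c": [0.0, 10.0, 20.0, 30.0, 40.0]})

def iqr(column):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return column.quantile(0.75) - column.quantile(0.25)

# Pass the function object itself (no parentheses) to .agg()
temperature_iqr = sales["temperature_c"].agg(iqr)
print(temperature_iqr)
```

Writing `iqr` without parentheses hands the function to pandas, which then calls it on the column.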


To apply more than one function with .agg(), pass them as a list of function names without parentheses: .agg([func1, func2])
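A small sketch of the list form follows. The DataFrame `sales`, the `fuel_price` column, and the helper `value_range` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
sales = pd.DataFrame({
    "temperature_c": [0.0, 10.0, 20.0, 30.0, 40.0],
    "fuel_price": [1.0, 2.0, 3.0, 4.0, 5.0],
})

def iqr(column):
    """Interquartile range of a column."""
    return column.quantile(0.75) - column.quantile(0.25)

def value_range(column):
    """Hypothetical second statistic: max minus min."""
    return column.max() - column.min()

# A list of function objects, none of them called with ()
summary = sales[["temperature_c", "fuel_price"]].agg([iqr, value_range])
print(summary)
```

The result has one row per function and one column per selected column, so each statistic can be looked up with `summary.loc["iqr", "temperature_c"]`.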

Make a list of cities to subset on

cities = ["Moscow", "Saint Petersburg"]

Subset temperatures using square brackets


Subset temperatures_ind using .loc[]

print(temperatures_ind.loc[["Moscow", "Saint Petersburg"]])
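The two subsetting styles can be compared side by side in a rough sketch. The miniature `temperatures` table, its `city` and `avg_temp_c` columns, and the values are assumptions; the notes do not show the real column names.

```python
import pandas as pd

# Hypothetical miniature of the temperatures data.
temperatures = pd.DataFrame({
    "city": ["Moscow", "Saint Petersburg", "Toronto", "Rome"],
    "avg_temp_c": [5.0, 4.5, 7.0, 15.5],
})
cities = ["Moscow", "Saint Petersburg"]

# Square brackets: filter rows with a boolean mask built by .isin()
by_mask = temperatures[temperatures["city"].isin(cities)]

# .loc[]: make city the index first, then select rows by label
temperatures_ind = temperatures.set_index("city")
by_label = temperatures_ind.loc[cities]

print(by_mask)
print(by_label)
```

Both approaches return the same rows; the difference is that `.loc[]` needs the city names in the index, while the mask works on an ordinary column.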

Get the worldwide mean temp by year

mean_temp_by_year = temp_by_country_city_vs_year.mean(axis="index")

Filter for the year that had the highest mean temp


Get the mean temp by city

mean_temp_by_city = temp_by_country_city_vs_year.mean(axis="columns")

Filter for the city that had the lowest mean temp
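The filter steps are missing from the notes. One possible way to write them is to compare each mean against the overall maximum or minimum. The small pivot table below is invented for illustration; only the variable names come from the notes.

```python
import pandas as pd

# Hypothetical pivot: rows are (country, city), columns are years.
temp_by_country_city_vs_year = pd.DataFrame(
    {2012: [10.0, 20.0], 2013: [12.0, 24.0]},
    index=pd.MultiIndex.from_tuples(
        [("Canada", "Toronto"), ("Italy", "Rome")],
        names=["country", "city"],
    ),
)

# axis="index" averages down the rows, giving one mean per year
mean_temp_by_year = temp_by_country_city_vs_year.mean(axis="index")
hottest_year = mean_temp_by_year[mean_temp_by_year == mean_temp_by_year.max()]

# axis="columns" averages across the years, giving one mean per city
mean_temp_by_city = temp_by_country_city_vs_year.mean(axis="columns")
coldest_city = mean_temp_by_city[mean_temp_by_city == mean_temp_by_city.min()]

print(hottest_year)
print(coldest_city)
```

Filtering with a boolean comparison keeps the index label, so the output shows which year or city the extreme value belongs to.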


Two ways to summarize the dataset:
  1. Group and aggregate in one step: .groupby()[].agg([])
  2. Subset the rows into a variable with data[data[""] == ""], then pass subset["column"] to np.mean(), where "column" holds the numbers we want to summarize.
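Both approaches can be sketched on a toy table. The DataFrame `sales` and its `type` and `weekly_sales` columns are assumptions chosen for the example, and the aggregations are passed as the strings "mean" and "max" rather than function objects.

```python
import numpy as np
import pandas as pd

# Hypothetical data; names and values are invented.
sales = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "weekly_sales": [100.0, 300.0, 50.0, 150.0],
})

# Way 1: group, pick the numeric column, aggregate every group at once
by_group = sales.groupby("type")["weekly_sales"].agg(["mean", "max"])

# Way 2: subset one group by hand, then summarize its numeric column
type_a = sales[sales["type"] == "A"]
mean_a = np.mean(type_a["weekly_sales"])

print(by_group)
print(mean_a)
```

The groupby form summarizes all groups in one call, while the manual subset has to be repeated for each group of interest.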
