Mahmoud Magdy/

# Data Manipulation with pandas

## Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.

### Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

`# Add your code snippets here`

### Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

• Print the highest weekly sales for each `department` in the `walmart` DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
• What was the total `nb_sold` of organic avocados in 2017 in the `avocado` DataFrame? If you're stuck, try reviewing this video.
• Create a bar plot of the total number of homeless people by region in the `homelessness` DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
• Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.

When using `.agg()`, first define a function that performs the calculation, then pass that function to `.agg()`.

## A custom IQR function

def iqr(column): return column.quantile(0.75) - column.quantile(0.25)

## Print IQR of the temperature_c column

print(sales["temperature_c"].agg(iqr))

To apply more than one function with `.agg()`, pass the functions as a list, without parentheses after each function name: `.agg([func1, func2])`.
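A minimal sketch of the list form, assuming a small made-up `sales` DataFrame in place of the course data. The `iqr` function is the one from the note above; the string `"median"` is pandas' built-in name for that aggregation.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's `sales` DataFrame
sales = pd.DataFrame({"temperature_c": [10.0, 12.0, 14.0, 16.0, 18.0]})


def iqr(column):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return column.quantile(0.75) - column.quantile(0.25)


# Pass the function object itself (iqr, not iqr()) inside a list;
# built-in aggregations can be given by name as strings.
summary = sales["temperature_c"].agg([iqr, "median"])
print(summary)
```

The result is a Series with one entry per function, labelled by the function's name.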

## Make a list of cities to subset on

cities = ["Moscow", "Saint Petersburg"]

## Subset temperatures using square brackets

print(temperatures[temperatures["city"].isin(cities)])

## Subset temperatures_ind using .loc[]

print(temperatures_ind.loc[["Moscow", "Saint Petersburg"]])
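The two subsetting styles above can be compared side by side. This is a sketch on made-up data: the `avg_temp_c` column and its values are hypothetical, and it assumes `temperatures_ind` is simply `temperatures` with `city` set as the index.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration
temperatures = pd.DataFrame({
    "city": ["Moscow", "Saint Petersburg", "Toronto", "Rome"],
    "avg_temp_c": [4.8, 5.1, 7.9, 15.4],
})

cities = ["Moscow", "Saint Petersburg"]

# Option 1: build a Boolean mask with .isin() and filter with square brackets
by_mask = temperatures[temperatures["city"].isin(cities)]

# Option 2: make "city" the index (assumed here), then select rows by label
temperatures_ind = temperatures.set_index("city")
by_label = temperatures_ind.loc[cities]

print(by_mask)
print(by_label)
```

Both select the same rows. The `.loc[]` version is shorter once the index is set, but `city` then lives in the index rather than in a column.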

## Get the worldwide mean temp by year

mean_temp_by_year = temp_by_country_city_vs_year.mean(axis="index")

## Filter for the year that had the highest mean temp

print(mean_temp_by_year[mean_temp_by_year==mean_temp_by_year.max()])

## Get the mean temp by city

mean_temp_by_city = temp_by_country_city_vs_year.mean(axis="columns")

## Filter for the city that had the lowest mean temp

print(mean_temp_by_city[mean_temp_by_city==mean_temp_by_city.min()])
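To see what the `axis` argument does, here is a small sketch with a made-up table shaped like the one above (one row per city, one column per year). The city names and temperatures are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for temp_by_country_city_vs_year:
# rows are cities, columns are years, values are made-up temperatures
temps = pd.DataFrame(
    {2000: [10.0, 20.0], 2001: [12.0, 24.0]},
    index=["City A", "City B"],
)

# axis="index" averages down the rows: one mean per column (per year)
mean_by_year = temps.mean(axis="index")

# axis="columns" averages across the columns: one mean per row (per city)
mean_by_city = temps.mean(axis="columns")

# Boolean filtering keeps only the entries equal to the extreme value
hottest_year = mean_by_year[mean_by_year == mean_by_year.max()]
coldest_city = mean_by_city[mean_by_city == mean_by_city.min()]

print(hottest_year)
print(coldest_city)
```

If only the label is needed, `mean_by_year.idxmax()` and `mean_by_city.idxmin()` return it directly.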

Two ways to summarize a dataset:

1. Group and aggregate: `.groupby()[].agg([])`.
2. Store a subset in a variable with `data[data[""] == ""]`, then pass the column to summarize to `np.mean()`: `np.mean(subset["column"])`, where `"column"` contains the numbers we want to summarize.
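Both approaches can be tried on a small example. This is a sketch under stated assumptions: the `data` DataFrame and its `type` and `weekly_sales` columns are hypothetical, chosen only to make the two calls concrete.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only
data = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "weekly_sales": [100.0, 200.0, 10.0, 30.0],
})

# Way 1: group, select the numeric column, aggregate with a list of functions
grouped = data.groupby("type")["weekly_sales"].agg(["mean", "sum"])
print(grouped)

# Way 2: subset the rows first, then pass the numeric column to np.mean
subset = data[data["type"] == "A"]
mean_a = np.mean(subset["weekly_sales"])
print(mean_a)
```

The grouped version summarizes every group in one call, while the subset version handles one group at a time and needs a separate subset for each.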
