What info is in the file? pandas quick reference:
- df.shape = how big the data is, as (rows, columns); it is an attribute, so no parentheses
- df.info() = which columns and dtypes are present, plus non-null counts
- df.tail() = see the last rows of the data
- df.head() = see the columns and the first rows
- df.describe() = summary stats on numerical columns
- df[categorical].describe() = summary stats on categorical columns, where categorical = df.dtypes[df.dtypes == 'object'].index
- list(df.columns) = list the column names
- df.dtypes = data type of each column (int64, float64, object, etc.)
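To try the calls above without the Excel file, here is a minimal sketch on a tiny made-up DataFrame. The column names and values are invented for illustration; `select_dtypes(exclude="number")` is used as one way to pick the non-numeric columns.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
demo = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Genre": ["Sports", "Racing", "Sports", "Puzzle"],
    "Year": [2006, 2008, 2009, 2010],
    "EU_Sales": [29.0, 12.9, 11.0, 3.6],
})

print(demo.shape)            # attribute, so no parentheses
info_result = demo.info()    # prints the summary itself and returns None
print(demo.head(2))          # first two rows
print(demo.tail(2))          # last two rows
numeric_summary = demo.describe()   # numeric columns only by default
print(numeric_summary)
print(list(demo.columns))
print(demo.dtypes)

# Pick the non-numeric columns, then describe them (count, unique, top, freq)
categorical = demo.select_dtypes(exclude="number").columns
categorical_summary = demo[categorical].describe()
print(categorical_summary)
```

For text columns, describe() reports the most frequent value as "top" instead of a mean or quartiles.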
import pandas as pd
df= pd.read_excel("videogamesales.xlsx")
print(df.columns)
df
# df.info() prints its summary directly and returns None, so there is nothing to assign or print
df.info()
print(df.describe()['Rank'])
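# group by one column (Publisher) and look at another (Genre);
# max() on a text column returns the alphabetically last value in each group
grouped_df = df.groupby('Publisher')['Genre'].max()
print(grouped_df)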
# keep only the rows where Year is after 2008
print(df[df['Year'] > 2008])
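A small sketch of the same grouping and filtering on made-up rows, so the results can be checked by eye. The publishers, genres and years are invented; the last step shows one way to combine two conditions.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only
games = pd.DataFrame({
    "Publisher": ["Nintendo", "Nintendo", "Sega", "Sega"],
    "Genre": ["Sports", "Racing", "Action", "Puzzle"],
    "Year": [2006, 2009, 2008, 2010],
})

# max() on a text column compares alphabetically within each group
last_genre = games.groupby("Publisher")["Genre"].max()
print(last_genre)

# A boolean mask keeps only the rows where the condition is True
recent = games[games["Year"] > 2008]
print(recent)

# Combining conditions: wrap each one in parentheses and join with & (and) or | (or)
recent_sega = games[(games["Year"] > 2008) & (games["Publisher"] == "Sega")]
print(recent_sega)
```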
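# calculating median, mean, mode
import pandas as pd
import numpy as np
# read the file once and reuse the same DataFrame for all three statistics
df2 = pd.read_excel('videogamesales.xlsx')
# the nan-aware NumPy functions skip missing values
df2_median = np.nanmedian(df2['EU_Sales'])
print(df2_median)
df2_mean = np.nanmean(df2['EU_Sales'])
print(df2_mean)
# mode() returns a Series because several values can tie for most frequent
df2_mode = df2['EU_Sales'].mode()
print(df2_mode)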
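A minimal sketch of median, mean and mode on a short made-up Series with one missing value, to show how NaN is handled. The numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures with one missing value
sales = pd.Series([0.5, 1.0, 1.0, np.nan, 2.5])

# NumPy's nan-aware functions ignore the missing value
median_np = np.nanmedian(sales)
mean_np = np.nanmean(sales)

# The pandas methods skip NaN by default (skipna=True)
median_pd = sales.median()
mean_pd = sales.mean()

# mode() returns a Series, since more than one value can tie
modes = sales.mode()

print(median_np, mean_np)
print(median_pd, mean_pd)
print(modes.tolist())
```

Plain np.median and np.mean return nan when the data contains a missing value, which is why the nan-prefixed versions are used here.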
Reshape pandas DataFrame using pivot and melt
import pandas as pd
df= pd.read_excel("pivot_data_1.xlsx")
print(df)
# pivot: one row per date, one column per name, cells filled with sales
df1 = df.pivot(index="date", columns="name", values="sales")
print(df1)
import pandas as pd
df2=pd.read_excel("melt_data_1.xlsx")
print(df2)
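# melt: keep 'name' as the identifier and stack every other column into 'variable'/'value' rows
df3 = df2.melt(id_vars='name')
print(df3)
# drop the 'value' column, leaving only 'name' and 'variable'
df4 = df3.drop('value', axis=1)
print(df4)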
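Since the Excel files are not included here, this is a sketch of pivot and melt as a round trip on made-up data. The dates, names and sales values are invented; only the column names date, name and sales come from the pivot call above.

```python
import pandas as pd

# Hypothetical long-format data, invented for illustration only
long_df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "name": ["Ann", "Bob", "Ann", "Bob"],
    "sales": [10, 20, 30, 40],
})

# Long -> wide: one row per date, one column per name
wide_df = long_df.pivot(index="date", columns="name", values="sales")
print(wide_df)

# Wide -> long: bring date back as a column, then stack the name columns
back_df = wide_df.reset_index().melt(
    id_vars="date", var_name="name", value_name="sales"
)
print(back_df)
```

pivot raises an error if a (date, name) pair appears more than once; pivot_table with an aggregation function is the usual alternative in that case.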
SQL Tutorial - JOINS
DataFrame available as variable df5
SELECT * FROM all_weeks_countries;
DataFrame available as variable df
SELECT * FROM all_weeks_global
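The two queries above only read each table in full. Below is a sketch of a LIMIT preview and an INNER JOIN, run against an in-memory SQLite database from Python. The table names match the queries above, but the columns (week, country, show_title, hours_viewed) and all rows are assumptions made up for illustration, not the real schema.

```python
import sqlite3

# In-memory database with hypothetical columns and rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_weeks_countries (week TEXT, country TEXT, show_title TEXT);
CREATE TABLE all_weeks_global (week TEXT, show_title TEXT, hours_viewed INTEGER);
INSERT INTO all_weeks_countries VALUES
  ('2022-01-02', 'Canada', 'Show A'),
  ('2022-01-02', 'Brazil', 'Show B'),
  ('2022-01-09', 'Canada', 'Show C');
INSERT INTO all_weeks_global VALUES
  ('2022-01-02', 'Show A', 100),
  ('2022-01-02', 'Show B', 80);
""")

# LIMIT returns a small preview instead of the whole table
preview = conn.execute("SELECT * FROM all_weeks_countries LIMIT 2").fetchall()
print(preview)

# INNER JOIN keeps only rows that have a match in both tables
joined = conn.execute("""
SELECT c.week, c.country, c.show_title, g.hours_viewed
FROM all_weeks_countries AS c
INNER JOIN all_weeks_global AS g
  ON c.week = g.week AND c.show_title = g.show_title
ORDER BY c.country
""").fetchall()
print(joined)

conn.close()
```

'Show C' has no matching row in all_weeks_global, so the INNER JOIN drops it; a LEFT JOIN would keep it with a NULL hours_viewed.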
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.