Structuring Python files

    What info is in the file? pandas dictionary:

    • .shape = how big the data is, as (rows, columns); it is an attribute, so no parentheses
    • .info() = which variables (columns) and types are present
    • .tail() = see the last rows of the data
    • .head() = see the columns and the first rows
    • .describe() = summary stats on numerical values
    • df[categorical].describe() = summary of the categorical columns, where categorical = df.dtypes[df.dtypes == 'object'].index
    • list(df.columns) = list the column names
    • df.dtypes = data type of each column (integer, float, object, etc.)
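
The calls listed above can be tried on a small frame before loading a real file. This is a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not taken from videogamesales.xlsx, and `select_dtypes(exclude="number")` is used here as one possible way to pick out the text columns.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Genre": ["Sports", "Racing", "Sports", "Puzzle"],
    "Year": [2006, 2008, 2009, 2010],
    "EU_Sales": [29.0, 12.9, 11.0, 2.3],
})

print(toy.shape)        # attribute, no parentheses: (rows, columns)
toy.info()              # prints its summary itself and returns None
print(toy.head(2))      # first two rows
print(toy.tail(2))      # last two rows
print(toy.describe())   # numeric columns only by default
print(list(toy.columns))
print(toy.dtypes)

# Non-numeric (text) columns, then their count/unique/top/freq summary
categorical = toy.select_dtypes(exclude="number").columns
cat_summary = toy[categorical].describe()
print(cat_summary)
```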
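    import pandas as pd

    df = pd.read_excel("videogamesales.xlsx")

    print(df.columns)

    # df.info() prints its summary itself and returns None,
    # so it is called directly instead of being wrapped in print()
    df.info()

    # summary stats for one column of interest
    print(df.describe()['Rank'])

    # two columns of interest: Publisher and Genre
    # max() takes the largest value of every column per publisher;
    # for text columns such as Genre that is the alphabetically last value
    grouped_df = df.groupby(['Publisher']).max()
    print(grouped_df['Genre'])

    # interested in specific values: rows with Year after 2008
    print(df[df['Year'] > 2008])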
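
A few variations on the column selection, grouping, and filtering steps above, as a sketch on made-up data. The `games` frame is hypothetical; only the column names (Publisher, Genre, Year, EU_Sales) follow the ones used in this notebook.

```python
import pandas as pd

# Hypothetical rows, only for illustration
games = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Publisher": ["P1", "P1", "P2", "P2"],
    "Genre": ["Sports", "Racing", "Sports", "Puzzle"],
    "Year": [2006, 2009, 2009, 2010],
    "EU_Sales": [29.0, 12.9, 11.0, 2.3],
})

# Select two columns of interest with a list of names
two_cols = games[["Name", "EU_Sales"]]

# Aggregate a single column per group instead of every column
eu_by_publisher = games.groupby("Publisher")["EU_Sales"].sum()

# Whole row of the best-selling game for each publisher
top_rows = games.loc[games.groupby("Publisher")["EU_Sales"].idxmax()]

# Combine conditions with & and parentheses around each one
recent_sports = games[(games["Year"] > 2008) & (games["Genre"] == "Sports")]

print(two_cols)
print(eu_by_publisher)
print(top_rows)
print(recent_sports)
```

Selecting the column before aggregating (`groupby(...)["EU_Sales"]`) avoids computing a maximum or sum over text columns where it has little meaning.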
    
    
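    # calculating median, mean, mode
    import pandas as pd
    import numpy as np

    # read the file once and reuse the same DataFrame
    df2 = pd.read_excel('videogamesales.xlsx')

    # nanmedian and nanmean ignore missing (NaN) values
    df2_median = np.nanmedian(df2['EU_Sales'])
    print(df2_median)

    df2_mean = np.nanmean(df2['EU_Sales'])
    print(df2_mean)

    # mode() returns a Series because several values can tie
    df2_mode = df2['EU_Sales'].mode()
    print(df2_mode)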
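
To see why the NaN-aware functions matter, here is a small sketch with one missing value. The `sales` numbers are invented for illustration; the pandas methods shown skip NaN by default, while plain `np.median` does not.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry
sales = pd.Series([0.5, 0.1, np.nan, 0.1, 2.0], name="EU_Sales")

plain_median = np.median(sales.to_numpy())  # NaN propagates into the result
nan_median = np.nanmedian(sales)            # ignores the NaN
nan_mean = np.nanmean(sales)                # ignores the NaN

# pandas equivalents: skipna=True is the default
pd_median = sales.median()
pd_mean = sales.mean()
pd_mode = sales.mode()                      # Series of the most frequent value(s)

print(plain_median, nan_median, nan_mean)
print(pd_median, pd_mean)
print(pd_mode)
```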

    Reshape pandas DataFrame using pivot and melt

    https://www.youtube.com/watch?v=uCx0soWPj9E

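    import pandas as pd

    # pivot: long to wide, one row per date and one column per name, cells hold sales
    df = pd.read_excel("pivot_data_1.xlsx")
    print(df)

    df1 = df.pivot(index="date", columns="name", values="sales")
    print(df1)

    # melt: wide to long, every column except 'name' becomes a (variable, value) pair
    df2 = pd.read_excel("melt_data_1.xlsx")
    print(df2)

    df3 = df2.melt(id_vars='name')
    print(df3)

    # drop the 'value' column, keeping 'name' and 'variable'
    df4 = df3.drop('value', axis=1)
    print(df4)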
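
Since the Excel files are not included here, the following sketch shows the same pivot and melt steps on a made-up frame and checks that melt undoes pivot. The `long_df` data is hypothetical; the column names date, name, and sales follow the pivot call above.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, name) pair
long_df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "name": ["Ann", "Bob", "Ann", "Bob"],
    "sales": [10, 20, 30, 40],
})

# Long to wide: dates become the index, names become columns
wide = long_df.pivot(index="date", columns="name", values="sales")
print(wide)

# Wide to long again: bring 'date' back as a column, then melt
back = wide.reset_index().melt(id_vars="date", var_name="name", value_name="sales")

# Same row order as the original for comparison
back_sorted = back.sort_values(["date", "name"]).reset_index(drop=True)
print(back_sorted)
```

Note that `pivot` raises a ValueError if the same (index, columns) pair appears more than once; `pivot_table` with an aggregation function is the usual alternative in that case.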
    -- SQL cell; the result is available as the DataFrame variable df5
    -- (slow to finish; consider adding a LIMIT clause to preview the result)
    SELECT * FROM all_weeks_countries;

    -- SQL cell; the result is available as the DataFrame variable df
    -- (slow to finish; consider adding a LIMIT clause to preview the result)
    SELECT * FROM all_weeks_global;
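
When a `SELECT *` over a large table is slow, a LIMIT clause returns only the first few rows for a preview. The sketch below assumes an in-memory SQLite database as a stand-in, because the actual integration behind these SQL cells is unknown; the columns `week`, `show_title`, and `hours` are hypothetical, and only the table name `all_weeks_global` comes from the query above.

```python
import sqlite3
import pandas as pd

# Stand-in database: in-memory SQLite (an assumption, not the real data source)
conn = sqlite3.connect(":memory:")

# Hypothetical rows and column names, only for illustration
pd.DataFrame({
    "week": ["2024-01-07", "2024-01-07", "2024-01-14", "2024-01-14", "2024-01-21"],
    "show_title": ["S1", "S2", "S1", "S3", "S2"],
    "hours": [100, 80, 90, 70, 60],
}).to_sql("all_weeks_global", conn, index=False)

# LIMIT keeps the preview small instead of fetching every row
preview = pd.read_sql_query("SELECT * FROM all_weeks_global LIMIT 3;", conn)
print(preview)

conn.close()
```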