Course notes: Exploratory Data Analysis in Python

    Course Notes

    • In this course we analyze our data step by step:
    1. Read your data from a CSV file
    2. Summarize the number of missing values and the statistics of the numeric data
    3. Use a histogram to look at the distribution of numeric data
    4. How to select numeric or categorical data from a DataFrame and how to create new columns
    5. Use Seaborn plots to compare summaries: a boxplot shows the median, a barplot shows the mean
    6. To examine the relationship between two values, use a scatterplot
    7. Strategies for addressing missing data
    8. Imputing summary statistics
    9. Converting and analyzing categorical data
    10. How to clean outliers
    11. Date-time data, correlations, relative class frequency (crosstab), hypotheses
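The last item of the outline mentions relative class frequency with a crosstab. A minimal sketch of that idea, assuming a small made-up frame (the `toy` data below is illustrative and not taken from `clean_books.csv`): `pd.crosstab` counts how often each pair of categories occurs, and `normalize="index"` turns each row into proportions.

```python
import pandas as pd

# Hypothetical toy data standing in for clean_books
toy = pd.DataFrame({
    "genre": ["Fiction", "Fiction", "Non Fiction",
              "Non Fiction", "Non Fiction", "Childrens"],
    "year": [2019, 2020, 2019, 2019, 2020, 2020],
})

# Absolute frequency of each genre/year combination
counts = pd.crosstab(toy["genre"], toy["year"])
print(counts)

# Relative frequency within each genre (every row sums to 1)
relative = pd.crosstab(toy["genre"], toy["year"], normalize="index")
print(relative)
```

Normalizing by `"index"` answers "within each genre, what share of books falls in each year"; `normalize="columns"` would answer the reverse question.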
    # Import any packages you want to use here
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    # Load and display the dataset
    clean_books = pd.read_csv('datasets/clean_books.csv', encoding='utf-8')
    clean_books
    # Summarize the non-null count and data type of each column, plus memory usage
    clean_books.info()
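`info()` reports non-null counts, so the number of missing values has to be read off by subtraction. A minimal sketch of counting them directly with `isna().sum()`, assuming a small made-up frame (the `toy` data is illustrative, not the contents of `clean_books.csv`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for clean_books
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "rating": [4.5, np.nan, 4.1, np.nan],
    "genre": ["Fiction", "Non Fiction", None, "Fiction"],
})

# Number of missing values in each column
missing_counts = toy.isna().sum()
print(missing_counts)

# Fraction of rows that are missing in each column
missing_share = toy.isna().mean()
print(missing_share)
```

The same two calls should work unchanged on `clean_books`.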
    # Summary statistics for the numeric columns
    clean_books.describe()
    # Plot a histogram to look at the distribution of a numeric column
    sns.histplot(x="rating", data=clean_books)
    plt.show()
    # Look at the data type of each column
    clean_books.dtypes
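Once the dtypes are known, the numeric and non-numeric columns can be picked out by type, and new columns can be derived from existing ones. A minimal sketch, assuming made-up data; the `toy` frame and the `is_fiction` column are hypothetical names used only for illustration:

```python
import pandas as pd

# Hypothetical toy data standing in for clean_books
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "rating": [4.5, 3.9, 4.1],
    "year": [2010, 2015, 2020],
    "genre": ["Fiction", "Non Fiction", "Childrens"],
})

# Names of the columns with a numeric dtype
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Names of the remaining (non-numeric) columns
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

# Create a new boolean column from an existing one
toy["is_fiction"] = toy["genre"] == "Fiction"

print(numeric_cols, other_cols)
print(toy)
```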
    # Check membership with the isin() method: True where the genre is one of the listed values
    clean_books['genre'].isin(['Fiction', 'Non Fiction'])
    # Count how many rows fall in each genre
    clean_books.value_counts('genre')
    # Use the tilde operator (~) to negate the mask: True where the genre is NOT 'Fiction'
    ~clean_books['genre'].isin(['Fiction'])
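    # Check whether the year 2020 appears anywhere in the data
    # (no tilde here, and 2020 is passed as a number to match the numeric year column)
    clean_books['year'].isin([2020]).any()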
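A minimal sketch of `isin()` with and without `~`, on made-up data (the `toy` frame is hypothetical). It also illustrates that the values passed to `isin()` should match the column's dtype: assuming `year` is stored as integers, the string `'2020'` matches nothing.

```python
import pandas as pd

# Hypothetical toy data standing in for clean_books
toy = pd.DataFrame({
    "genre": ["Fiction", "Non Fiction", "Childrens", "Fiction"],
    "year": [2010, 2015, 2020, 2019],
})

# Boolean mask: True where the genre is "Fiction"
is_fiction = toy["genre"].isin(["Fiction"])

# Tilde negates the mask: True where the genre is anything else
not_fiction = ~is_fiction

# Use the negated mask to keep only the non-"Fiction" rows
non_fiction_rows = toy[not_fiction]

# Does 2020 occur at all? Compare an integer column against an integer
has_2020_as_int = toy["year"].isin([2020]).any()

# The same check with a string does not match the integer values
has_2020_as_str = toy["year"].isin(["2020"]).any()

print(non_fiction_rows)
print(has_2020_as_int, has_2020_as_str)
```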
    # Boxplot of ratings for each publication year
    sns.boxplot(x=clean_books["year"].astype(int), y=clean_books['rating'])
    plt.xticks(rotation=45)
    plt.show()
    # What is the median year? Visualize the distribution of years
    sns.set()
    sns.boxenplot(x=clean_books["year"].astype(int))
    plt.show()
    # Confirm the median, maximum, and minimum year numerically
    import numpy as np
    print(np.median(clean_books['year']), 'Median')
    print(np.max(clean_books['year']), 'Max')
    print(np.min(clean_books['year']), 'Min')
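The three NumPy calls above can also be collected in a single pandas call. A minimal sketch using `Series.agg`, assuming made-up years (the `toy` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for clean_books
toy = pd.DataFrame({"year": [2009, 2012, 2015, 2018, 2019]})

# Median, maximum, and minimum in one call; the result is a Series
# indexed by the statistic names
year_summary = toy["year"].agg(["median", "max", "min"])
print(year_summary)

# The pandas median agrees with NumPy's
same_median = year_summary["median"] == np.median(toy["year"])
print(same_median)
```

Applied to `clean_books["year"]`, this should report the same three numbers as the prints above.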