Introduction to Statistics in Python

    Run the hidden code cell below to import the data used in this course.

    # Importing numpy, pandas, and the plotting libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Importing the course datasets
    deals = pd.read_csv("datasets/amir_deals.csv")
    happiness = pd.read_csv("datasets/world_happiness.csv")
    food = pd.read_csv("datasets/food_consumption.csv")

    Take Notes

    Definition of statistics

    Types of statistics

    Continuous variables

    Discrete variables

    Types of data

    Measures of center, including mean, median, and mode

    Measures of spread

    Finding outliers

    Distributions

    What does statistics mean?

    • A statistic is any quantity computed from values in a sample.
    • Statistics is the practice and study of collecting and analyzing data.
    • In other words, statistics is about how to use mathematics to analyze data.

    Types of statistics

    There are two types of statistics:
    1. Descriptive statistics
    2. Inferential statistics

    What is descriptive statistics?

    • In simple terms, descriptive statistics describe the data using summary statistics such as the mean or median, or using charts and graphs.
    • Example: answering the question "What is the average grade point for students in the first semester?"
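
    As a minimal sketch of the grade point question above, the snippet below summarizes a small list of grade points with the mean and median. The values are invented for illustration and do not come from the course datasets.

```python
import statistics

# Hypothetical first-semester grade points (made-up values for illustration)
grade_points = [3.2, 3.8, 2.9, 3.5, 3.8, 2.4, 3.1]

mean_gpa = statistics.mean(grade_points)      # arithmetic average
median_gpa = statistics.median(grade_points)  # middle value after sorting

print("Mean grade point:", round(mean_gpa, 2))
print("Median grade point:", median_gpa)
```

    Both numbers describe the data that was actually collected; neither makes a claim about students outside this list.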

    What is inferential statistics?

    • Uses sample data to make inferences about a target population.
    • This type of statistics relies on probability distributions such as the binomial and normal distributions.
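
    A rough sketch of the idea, assuming a simulated population: a random sample is drawn and its mean is used as an estimate of the population mean. The population, its parameters, and the sample size are all hypothetical choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated values from a normal distribution
population = rng.normal(loc=50, scale=10, size=100_000)

# Draw a random sample of 500 values without replacement
sample = rng.choice(population, size=500, replace=False)

# The sample mean serves as an estimate of the (usually unknown) population mean
print("Population mean:", population.mean())
print("Sample mean:    ", sample.mean())
```

    In practice the population mean is not available, which is why the estimate from the sample matters; here both are printed only to show that they land close together.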

    What are the types of data?

    • Here, "types of data" means data types in statistics, not data types in general.
    • There are two types, numerical and categorical data:
    • NUMERICAL data (quantitative). Example: salary
    • CATEGORICAL data (qualitative). Example: product name
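
    One way to see the split in pandas is sketched below. The table and its column names are hypothetical; `select_dtypes` separates columns by how they are stored, which is only a first approximation of their statistical type.

```python
import pandas as pd

# Hypothetical table (values invented for illustration)
staff = pd.DataFrame({
    "salary": [4200.0, 3100.5, 5800.0],               # numerical (quantitative)
    "product": ["Product A", "Product B", "Product A"],  # categorical (qualitative)
})

# Split the columns by storage type
numeric_cols = staff.select_dtypes(include="number").columns.tolist()
categorical_cols = staff.select_dtypes(exclude="number").columns.tolist()

print("Numerical columns:  ", numeric_cols)
print("Categorical columns:", categorical_cols)

# Arithmetic summaries suit numerical data; counts suit categorical data
print(staff["salary"].mean())
print(staff["product"].value_counts())
```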

    Categorical data:

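    • Can be represented as numbers (for example, coded as 1, 2, 3), but the numbers act only as labels.
    • Categorical data can be subdivided into:
    1. Nominal data: sometimes called labeled data; the categories have no order.
    • Examples: names, sex, eye colours
    2. Ordinal data: the categories have a natural order.
    • Examples: survey answers such as Strongly disagree, Neither agree nor disagree, and so on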
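
    The nominal and ordinal cases can be sketched with `pd.Categorical`. The survey table, its column names, and the five answer levels are assumptions made for this example.

```python
import pandas as pd

# Hypothetical survey data (values invented for illustration)
survey = pd.DataFrame({
    "eye_colour": ["brown", "blue", "green", "brown"],
    "answer": ["Agree", "Strongly disagree", "Neither agree nor disagree", "Agree"],
})

# Nominal: categories with no inherent order
survey["eye_colour"] = pd.Categorical(survey["eye_colour"])

# Ordinal: categories with an explicit order, lowest to highest
levels = ["Strongly disagree", "Disagree", "Neither agree nor disagree",
          "Agree", "Strongly agree"]
survey["answer"] = pd.Categorical(survey["answer"], categories=levels, ordered=True)

# Counting is meaningful for both kinds of categorical data
print(survey["eye_colour"].value_counts())

# Ordering comparisons only make sense for ordinal data
lowest_answer = survey["answer"].min()
at_least_agree = survey["answer"] >= "Agree"
print(lowest_answer)
print(at_least_agree.tolist())
```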

    What are continuous variables or continuous data?

    Continuous data is data that is measured and can take any value within an interval, such as time, temperature, or sales per year.

    What are discrete values or discrete data?

    Discrete data is counted and takes a limited set of values, such as the number of students in a class or the number of cars in a garage.
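
    A small sketch of the difference, using made-up temperature readings and class sizes: between any two continuous values another value is possible, while counts stay on whole numbers.

```python
import numpy as np

# Hypothetical measurements (values invented for illustration)
temperatures = np.array([21.4, 22.05, 19.87, 23.1])  # continuous: any value in an interval
students_per_class = np.array([28, 31, 25, 30])      # discrete: whole-number counts

# Between any two continuous values another value is possible
midpoint = (temperatures[0] + temperatures[1]) / 2
print("A temperature between the first two readings:", midpoint)

# Counts stay on whole numbers; a class cannot have 28.5 students
all_whole = bool(np.all(students_per_class % 1 == 0))
print("All class sizes are whole numbers:", all_whole)
print("Distinct class sizes:", np.unique(students_per_class))
```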

    # Display the data
    deals
    # Summary statistics for all columns (numeric and non-numeric)
    deals.describe(include='all')
    # Look at the distribution of deal amounts using a histogram
    sns.histplot(data=deals.dropna(), x="amount")
    plt.show()

    Measures of center

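    # Calculate the mean and median of amount, grouped by product
    # (string names avoid the deprecation warning newer pandas raises for np.mean / np.median)
    deals_mean_median = deals.groupby("product")["amount"].agg(["mean", "median"])
    deals_mean_median
    # Use a pandas bar plot to compare the mean and median for each product
    deals_mean_median.plot(kind="bar")
    plt.show()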
    # load dataset from csv file
    food = pd.read_csv("datasets/food_consumption.csv")
    food
    # Filter Argentina
    arg_consumption = food[food["country"]=="Argentina"]
    # Filter Albania
    alb_consumption = food[food["country"]=="Albania"]
    # Calculate mean and median for Argentina
    print('Mean consumption in Argentina:', np.mean(arg_consumption["consumption"]))
    print('Median consumption in Argentina:', np.median(arg_consumption["consumption"]))

    # Calculate mean and median for Albania
    print('Mean consumption in Albania:', np.mean(alb_consumption["consumption"]))
    print('Median consumption in Albania:', np.median(alb_consumption["consumption"]))
    # Subset for Argentina and Albania
    arg_and_alb = food[food["country"].isin(["Argentina", "Albania"])]
    # Calculate mean and median consumption, grouped by country
    print(arg_and_alb.groupby("country")["consumption"].agg(["mean", "median"]))
    # Filter for the wheat food category
    wheat_consumption = food[food["food_category"]=="wheat"]
    # Histogram of CO2 emissions from wheat
    wheat_consumption["co2_emission"].hist()
    plt.title('Distribution of CO2 emissions from wheat', color='r')
    plt.show()
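
    The notes list the mode alongside the mean and median, while the cells above compute only the mean and median. The sketch below shows all three on a small made-up series with one unusually large value, so the effect of an extreme value on the mean is visible. The numbers are hypothetical and are not taken from the course datasets.

```python
import pandas as pd

# Hypothetical deal amounts with one unusually large value
amounts = pd.Series([100, 120, 120, 130, 150, 2000])

mean_amount = amounts.mean()      # pulled upward by the 2000 deal
median_amount = amounts.median()  # middle of the sorted values
mode_amount = amounts.mode()[0]   # most frequent value

print("Mean:  ", mean_amount)
print("Median:", median_amount)
print("Mode:  ", mode_amount)
```

    When a histogram is strongly skewed, as it can be for amounts or emissions, the median is often the more representative measure of center.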