Introduction to Statistics in Python
Run the hidden code cell below to import the data used in this course.
# Importing numpy and pandas
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")
Take Notes
Definition of statistics
Types of statistics
Continuous variables
Discrete variables
Types of data
Measures of center, including mean, median, and mode
Measures of spread
Finding outliers
Distributions
What does statistics mean?
- A statistic is any quantity computed from data values.
- Statistics is the practice and study of collecting and analyzing data.
- In other words, statistics is how we use mathematics to analyze data.
Types of statistics
There are two types of statistics:
- Descriptive statistics
- Inferential statistics
What is descriptive statistics?
- In simple terms, descriptive statistics describes the data using summary statistics such as the mean or median, or using charts and graphs.
- Example: answering the question "What is the average grade point of students in the first semester?"
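The grade-point question above can be sketched in a few lines. The grade values here are hypothetical and only illustrate how summary statistics describe a dataset.

```python
import pandas as pd

# Hypothetical first-semester grade points for five students (illustrative values only)
grades = pd.Series([3.2, 3.8, 2.9, 3.5, 3.6])

# Descriptive statistics summarize the data we actually have
mean_gpa = grades.mean()
median_gpa = grades.median()

print(f"Mean grade point: {mean_gpa:.2f}")
print(f"Median grade point: {median_gpa:.2f}")
```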
What is inferential statistics?
- It uses sample data to make inferences about a target population.
- This type of statistics relies on probability distributions such as the binomial and normal distributions.
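As a rough sketch of the idea, the snippet below draws a small sample from a simulated population and uses the sample mean as an estimate of the population mean. The population, its parameters, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

# Seeded generator so the sketch is reproducible
rng = np.random.default_rng(42)

# Hypothetical population: 10,000 simulated values from a normal distribution
population = rng.normal(loc=50, scale=10, size=10_000)

# In practice we rarely see the whole population, so we take a sample
sample = rng.choice(population, size=100, replace=False)

# The sample mean is our estimate of the (usually unknown) population mean
sample_mean = sample.mean()
population_mean = population.mean()

print(f"Sample mean: {sample_mean:.2f}")
print(f"Population mean: {population_mean:.2f}")
```

The two values should be close but not identical; that gap is the sampling error that inferential methods try to quantify.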
What are the types of data?
- "Types of data" here means data types in the statistical sense, not data types in general.
- There are two types, numerical and categorical:
- Numerical (quantitative) data. Example: salary.
- Categorical (qualitative) data. Example: product type.
Categorical data:
- Can be represented by numbers (for example, coded as 1 and 2), but those numbers are labels, not quantities.
- Categorical data can be divided into:
- Nominal data: sometimes called labeled data; the categories are unordered. Examples: names, sex, eye colours.
- Ordinal data: the categories have a natural order. Examples: Strongly disagree, Neither agree nor disagree, and so on.
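One way to see the difference between nominal and ordinal data is with `pd.Categorical`. This is a minimal sketch: the eye colours and the five-level agreement scale are assumed example values, not taken from the course datasets.

```python
import pandas as pd

# Nominal: categories have no order
eye_colour = pd.Categorical(["brown", "blue", "green", "brown"])

# Ordinal: categories have a meaningful order (assumed five-level agreement scale)
levels = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]
answers = pd.Categorical(
    ["Agree", "Strongly disagree", "Neither agree nor disagree"],
    categories=levels,
    ordered=True,
)

print(eye_colour.ordered)            # unordered, so min/max are not defined
print(answers.min(), answers.max())  # ordering makes min/max meaningful
print(answers.codes)                 # integer codes are labels, not quantities
```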
What are continuous variables (continuous data)?
Continuous data is measured and can take any value within an interval, such as time, temperature, or sales per year.
What are discrete values (discrete data)?
Discrete data is counted and takes a limited set of separate values, such as the number of students in a class or the number of cars in a garage.
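In pandas, this distinction often (though not always) shows up in the column dtypes: measured values are typically stored as floats and counts as integers. The small table below uses made-up values purely for illustration.

```python
import pandas as pd

# Hypothetical measurements: temperature is continuous, class size is discrete
df = pd.DataFrame(
    {
        "temperature_c": [21.4, 19.8, 22.15],  # any value in an interval
        "students_in_class": [28, 31, 25],     # whole-number counts
    }
)

# Floats usually signal continuous data, integers usually signal counts
print(df.dtypes)
```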
# Display the data
deals
# Summary statistics for all columns (numeric and non-numeric)
deals.describe(include='all')
# Look at the distribution of deal amounts with a histogram
sns.histplot(data=deals.dropna(), x="amount")
plt.show()
Measures of center
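# Calculate mean and median of amount, grouped by product
# (string names avoid the FutureWarning that recent pandas raises for np.mean/np.median in agg)
deals_mean_median = deals.groupby("product")["amount"].agg(["mean", "median"])
deals_mean_median
# Plot histograms of the per-product mean and median with pandas
deals_mean_median.hist()
plt.show()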
# load dataset from csv file
food = pd.read_csv("datasets/food_consumption.csv")
food
# Filter Argentina
arg_consumption = food[food["country"]=="Argentina"]
# Filter Albania
alb_consumption = food[food["country"]=="Albania"]
# Calculate mean and median for Argentina
print(np.mean(arg_consumption["consumption"]), 'Average consumption of Argentina')
print(np.median(arg_consumption["consumption"]), 'Median consumption of Argentina')
# Calculate mean and median for Albania
print(np.mean(alb_consumption["consumption"]), 'Average consumption of Albania')
print(np.median(alb_consumption["consumption"]), 'Median consumption of Albania')
# Subset for Argentina and Albania
arg_and_alb = food[(food["country"]=="Argentina")|(food["country"]=="Albania")]
# Calculate mean and median of consumption, grouped by country
print(arg_and_alb.groupby("country")["consumption"].agg(["mean", "median"]))
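The mean and median computed above can differ noticeably when the data contains extreme values. The sketch below uses two small made-up series (not the course data) to show how a single outlier pulls the mean while leaving the median and mode unchanged.

```python
import pandas as pd

# Hypothetical values, identical except for the last observation
values = pd.Series([1, 2, 2, 3, 4])
with_outlier = pd.Series([1, 2, 2, 3, 40])

# Collect the three measures of center for each series
summary = pd.DataFrame(
    {
        "mean": [values.mean(), with_outlier.mean()],
        "median": [values.median(), with_outlier.median()],
        "mode": [values.mode()[0], with_outlier.mode()[0]],
    },
    index=["no outlier", "with outlier"],
)

print(summary)
```

This is one reason the median is often preferred as a measure of center for skewed data.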
# Filter for the wheat food category
wheat_consumption = food[food["food_category"]=="wheat"]
# Histogram of CO2 emissions from wheat
wheat_consumption["co2_emission"].hist()
plt.title('Distribution of CO2 emissions from wheat', color='r')
plt.show()