## Introduction to the Tidyverse

Run the hidden code cell below to import the data used in this course.

```
# Load the Tidyverse
library(tidyverse)
# Read in the gapminder file
gapminder <- read.csv("datasets/gapminder.csv", sep = "\t")
```

### Adding a percentage column

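`n()` gives the current group size. After `count()`, the group sizes are stored in a column named `n`, so a percentage column can be added with `mutate(percent = n * 100 / sum(n))`.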

```
# Count rows per Education level, most frequent first
education_counts_strat <- attrition_strat %>%
  count(Education, sort = TRUE) %>%
  # Convert each count to a percentage of all rows
  mutate(percent = n * 100 / sum(n))
```
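A minimal, self-contained sketch of the same pattern. The data frame `toy_attrition` and its values are hypothetical and only stand in for `attrition_strat`, whose contents are not shown here:

```r
library(dplyr)

# Hypothetical stand-in for attrition_strat: one row per person
toy_attrition <- tibble(
  Education = c("Bachelor", "Bachelor", "Master", "College", "Bachelor",
                "Master", "College", "Bachelor", "Doctor", "Master")
)

# count() tallies rows per group and stores the tally in a column named n
education_counts <- toy_attrition %>%
  count(Education, sort = TRUE) %>%
  mutate(percent = n * 100 / sum(n))

# n() returns the same group sizes when used inside summarize()
education_sizes <- toy_attrition %>%
  group_by(Education) %>%
  summarize(group_size = n())

print(education_counts)
```

Because `percent` divides each count by `sum(n)`, the column always adds up to 100.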

In a normal distribution, about 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three. Many hypothesis tests assume that the data follow an approximately normal distribution, e.g. when comparing the mean of a sample to the mean of the population it represents.
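The 68%, 95% and 99.7% figures can be checked with base R's `pnorm()`. This is a small sketch; the helper name `within_sd` is made up for illustration:

```r
# Probability that a standard normal value lies within k standard deviations
# of the mean (within_sd is a hypothetical helper name)
within_sd <- function(k) pnorm(k) - pnorm(-k)

coverage <- sapply(1:3, within_sd)
print(round(coverage * 100, 1))
# prints: [1] 68.3 95.4 99.7
```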

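As a rule of thumb, the Central Limit Theorem is applied when the sample size is at least 30: the distribution of sample means is then approximately normal, even when the underlying data are not.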
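A rough simulation can illustrate the idea. The sketch below assumes a skewed exponential population with mean 1; the seed, the number of repetitions and the choice of population are arbitrary:

```r
set.seed(42)  # arbitrary seed, for reproducibility only

sample_size <- 30
n_samples <- 10000

# Draw many samples from a skewed (exponential) population with mean 1
# and record the mean of each sample
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# The sample means should centre on the population mean (1), with a
# spread close to the theoretical standard error 1 / sqrt(sample_size)
mean(sample_means)
sd(sample_means)
1 / sqrt(sample_size)
```

Calling `hist(sample_means)` afterwards should show a roughly bell-shaped histogram, even though the individual exponential draws are strongly skewed.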

Lambda represents the average number of events per time interval (used with the Poisson distribution).
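As an illustration, base R's `dpois()`, `ppois()` and `rpois()` all take `lambda` as their rate parameter. The value `lambda <- 3` below is a made-up example, not a figure from the course data:

```r
lambda <- 3  # hypothetical average of 3 events per interval

# Probability of observing exactly 0, 1, ..., 5 events in one interval
event_probs <- dpois(0:5, lambda = lambda)

# Probability of observing at most 5 events in one interval
prob_at_most_5 <- ppois(5, lambda = lambda)

# Simulated event counts: their mean should be close to lambda
set.seed(1)  # arbitrary seed, for reproducibility only
simulated_counts <- rpois(10000, lambda = lambda)
mean(simulated_counts)
```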

Hypothesis testing is used to compare populations. The null hypothesis assumes that no difference exists between the populations. On a plot, the x-axis shows the independent variable (unaffected by other data) and the y-axis shows the dependent variable (affected by other data).
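One common way to compare two populations is a two-sample t-test, sketched below with base R's `t.test()`. The two groups are simulated, hypothetical data, and the t-test is only one possible choice of test:

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical samples drawn from two populations with different true means
group_a <- rnorm(40, mean = 50, sd = 10)
group_b <- rnorm(40, mean = 56, sd = 10)

# Null hypothesis: the two population means are equal
# (t.test() runs Welch's two-sample t-test by default)
test_result <- t.test(group_a, group_b)

# A small p-value is evidence against the null hypothesis
test_result$p.value
```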