Introduction to the Tidyverse Interactive Notes
Review and practice the concepts and skills you learned in DataCamp's Introduction to the Tidyverse course! This is an interactive notebook powered by DataCamp Workspace.
Note: Some later examples depend on code in earlier examples. To ensure variables and imports are available to you, click "Run All" at the top of this workspace.
Chapter 1: Data wrangling
1.1 Loading libraries and data
Most data analyses in R use data frames. Data frames hold rectangular data in rows and columns, similar to a spreadsheet.
The code below imports dplyr and then reads in the gapminder data as a data frame.
# Load the dplyr package
library(dplyr)
# Read in the data as a data frame
gapminder <- read.csv("gapminder.csv")
# Look at the gapminder dataset
gapminder
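Printing a whole data frame can be unwieldy, so it often helps to inspect its rectangular structure directly. The sketch below uses only base R and assumes a tiny hand-made data frame called toy (hypothetical values, not the real gapminder data) so that it runs without gapminder.csv.

```r
# A tiny hand-made data frame (hypothetical values, for illustration only)
toy <- data.frame(
  country = c("A", "B", "C"),
  year    = c(1952, 1952, 2007),
  lifeExp = c(40.5, 68.2, 75.1)
)

# Number of rows (observations) and columns (variables)
dim(toy)

# Column names
names(toy)

# First two rows only
head(toy, 2)

# Compact summary of each column's type and first values
str(toy)
```

The same calls should work on gapminder once it has been read in.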
1.2 Filtering and arranging data
You can use the filter() verb to retrieve a subset of your observations. Inside filter(), pass the condition that you want to filter for.
Note that before a verb like filter(), you need the pipe operator %>% to feed the data into that step.
# Filter the gapminder data frame for rows where the country is Canada
gapminder %>%
  filter(country == "Canada")
# Filter the gapminder data frame for rows where life expectancy is under 50
gapminder %>%
  filter(lifeExp < 50)
You can also pass multiple conditions to filter() for a narrower subset.
# Filter the gapminder data frame for the country Canada and years greater than 2000
gapminder %>%
  filter(country == "Canada", year > 2000)
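Conditions separated by commas in filter() must all be true for a row to be kept. As a minimal sketch, assuming a small hypothetical data frame toy (made-up values, used so the example runs without gapminder.csv), the comma form can be compared with the explicit & and | operators:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  country = c("Canada", "Canada", "Chile", "Chile"),
  year    = c(1997, 2007, 1997, 2007),
  lifeExp = c(78.6, 80.7, 75.8, 78.6)
)

# Comma-separated conditions must all hold (logical AND)
both_comma <- toy %>%
  filter(country == "Canada", year > 2000)

# The same filter written with an explicit &
both_and <- toy %>%
  filter(country == "Canada" & year > 2000)

# Use | when either condition is enough (logical OR)
either <- toy %>%
  filter(country == "Canada" | year > 2000)

both_comma
either
```

With this toy data, the comma and & versions keep the same single row, while the | version keeps three rows.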
You can sort your data by a column in ascending order using arrange().
# Sort the gapminder data frame by the year column in ascending order
gapminder %>%
  arrange(year)
To sort in descending order, use desc() inside of arrange().
# Sort the gapminder data frame by the year column in descending order
gapminder %>%
  arrange(desc(year))
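Because each verb takes a data frame and returns one, filter() and arrange() can be chained with %>%. The sketch below assumes a small hypothetical data frame toy (made-up values, not the real gapminder data) and keeps one year before sorting by life expectancy from highest to lowest:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  year    = c(2007, 1952, 2007, 2007),
  lifeExp = c(70.1, 45.3, 81.2, 55.9)
)

# Keep only the 2007 rows, then sort them by lifeExp in descending order
sorted_2007 <- toy %>%
  filter(year == 2007) %>%
  arrange(desc(lifeExp))

sorted_2007
```

With this toy data, the result lists countries C, A, and D, in that order.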
1.3 Creating and adapting new columns
Use mutate() to change an existing variable, or create a new one entirely. For example, you may want to change the scale of a variable, or multiply two columns together.
# Create a new column lifeExpMonths based on the lifeExp column
gapminder %>%
  mutate(lifeExpMonths = lifeExp * 12)
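mutate() overwrites a column when the name on the left of = already exists, and adds a column when it does not. A minimal sketch of both cases, assuming a small hypothetical data frame toy (made-up values) with a pop column like the one in gapminder:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  country = c("A", "B"),
  pop     = c(2500000, 40000000),
  lifeExp = c(50, 70)
)

changed <- toy %>%
  mutate(
    pop = pop / 1000000,          # existing column: rescaled to millions
    lifeExpMonths = lifeExp * 12  # new column: life expectancy in months
  )

changed
```

Note that mutate() returns a new data frame; toy itself is unchanged unless the result is assigned back to it.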
Chapter 2: Data visualization
2.1 Scatter plots
There are three basic steps to creating a visualization with ggplot2:
- First, the data that you want to plot.
- Next is the plot's aesthetics, which you specify with aes(). Here, we pass in the values on the x and y axes.
- Finally, you add a layer to specify the type of plot. geom_point() creates a scatter plot.
# Load the ggplot2 library
library(ggplot2)
# Filter for 1952 and save the result as gapminder_1952
gapminder_1952 <- gapminder %>%
  filter(year == 1952)
# Create a scatter plot with pop on the x-axis and lifeExp on the y-axis
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point()
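To make the three steps easier to see, the plot can be built in two stages. This is a sketch under assumptions: toy is a small hypothetical data frame (made-up values), and the names base and scatter are chosen here for illustration.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  pop     = c(1e6, 5e6, 2e7),
  lifeExp = c(45, 60, 72)
)

# Steps 1 and 2: the data plus the aesthetic mappings (no points drawn yet)
base <- ggplot(toy, aes(x = pop, y = lifeExp))

# Step 3: add a layer that sets the plot type
scatter <- base + geom_point()

# Printing the object draws the scatter plot
scatter
```

Printing base on its own draws only empty axes, which shows that the geom layer is what actually puts points on the plot.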