Introduction to the Tidyverse Interactive Notes

    Review and practice the concepts and skills you learned in DataCamp's Introduction to the Tidyverse course! This is an interactive notebook powered by DataCamp Workspace.

    Note: Some later examples depend on code in earlier examples. To ensure variables and imports are available to you, click "Run All" at the top of this workspace.

    Chapter 1: Data wrangling

    1.1 Loading libraries and data

    Most data analyses in R use data frames. Data frames hold rectangular data in rows and columns, similar to a spreadsheet.

    The code below loads the dplyr package and then reads in the gapminder data as a data frame.

    # Load the dplyr package
    library(dplyr)
    
    # Read in the data as a data frame
    gapminder <- read.csv("gapminder.csv")
    
    # Look at the gapminder dataset
    gapminder
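
    Printing a whole data frame can be overwhelming, so it often helps to inspect its shape and column types first. The sketch below uses a small hypothetical data frame named mini, with made-up values, as a stand-in for gapminder so that it runs without the CSV file. It relies only on base R functions.

```r
# Hypothetical miniature data frame (made-up values, not real gapminder data)
mini <- data.frame(
  country = c("A", "A", "B"),
  year = c(1952, 2007, 1952),
  lifeExp = c(60, 75, 45),
  stringsAsFactors = FALSE
)

# Number of rows, then number of columns
dim(mini)

# Show only the first two rows
head(mini, 2)

# List each column with its type and first few values
str(mini)
```

    The same three calls should work on the full gapminder data frame once it has been read in.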

    1.2 Filtering and arranging data

    You can use the filter() verb to retrieve a subset of your observations. Inside filter(), pass the condition that you want to filter for.

    Note that before a verb like filter(), you need the pipe operator %>% to pass the data frame into the verb as its first argument.

    # Filter the gapminder data frame for rows where the country is Canada
    gapminder %>% 
    	filter(country == "Canada")
    
    # Filter the gapminder data frame for rows where life expectancy is under 50
    gapminder %>% 
    	filter(lifeExp < 50)
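
    To make the role of %>% concrete, here is a minimal sketch comparing a piped call with the equivalent direct call. The data frame toy and its values are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical toy data (made-up values)
toy <- data.frame(
  country = c("A", "B", "A"),
  lifeExp = c(45, 60, 72),
  stringsAsFactors = FALSE
)

# Piped form: toy becomes the first argument of filter()
piped <- toy %>%
  filter(lifeExp < 50)

# Direct form: the same call written without the pipe
direct <- filter(toy, lifeExp < 50)

# Both forms should give the same result
identical(piped, direct)
```

    The piped form reads left to right, which becomes more useful once several verbs are chained together.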

    You can also pass multiple conditions to filter() for a narrower subset.

    # Filter the gapminder data frame for the country Canada and years greater than 2000
    gapminder %>% 
    	filter(country == "Canada", year > 2000)
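
    Conditions separated by commas in filter() are combined with a logical AND, so a row must satisfy all of them to be kept. The sketch below checks this against the explicit & operator on a hypothetical toy data frame with made-up values.

```r
library(dplyr)

# Hypothetical toy data (made-up values)
toy <- data.frame(
  country = c("Canada", "Canada", "Rwanda"),
  year = c(1997, 2007, 2007),
  stringsAsFactors = FALSE
)

# Two conditions separated by a comma
with_comma <- toy %>%
  filter(country == "Canada", year > 2000)

# The same two conditions joined explicitly with &
with_and <- toy %>%
  filter(country == "Canada" & year > 2000)

# Both versions should keep the same single row
identical(with_comma, with_and)
```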

    You can sort your data by a column in ascending order using arrange().

    # Sort the gapminder data frame by the year column in ascending order
    gapminder %>%
    	arrange(year)

    To sort in descending order, wrap the column in desc() inside arrange().

    # Sort the gapminder data frame by the year column in descending order
    gapminder %>%
    	arrange(desc(year))
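
    arrange() also accepts more than one column, in which case later columns break ties in earlier ones. The following sketch sorts a hypothetical toy data frame (made-up values) by year ascending and then by life expectancy descending within each year.

```r
library(dplyr)

# Hypothetical toy data (made-up values)
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  year = c(2007, 1952, 2007, 1952),
  lifeExp = c(70, 40, 80, 60),
  stringsAsFactors = FALSE
)

# Sort by year (ascending), then by lifeExp (descending) within each year
sorted <- toy %>%
  arrange(year, desc(lifeExp))

sorted
```

    In the result, the 1952 rows come first with the higher life expectancy on top, followed by the 2007 rows in the same pattern.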

    1.3 Creating and adapting new columns

    Use mutate() to change an existing variable, or create a new one entirely. For example, you may want to change the scale of a variable, or multiply two columns together.

    # Create a new column lifeExpMonths based on the lifeExp column
    gapminder %>%
    	mutate(lifeExpMonths = lifeExp * 12)
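
    The example above creates a new column. To change an existing one instead, reuse its name on the left-hand side of the assignment inside mutate(). The sketch below does both in one call on a hypothetical toy data frame with made-up values: it rescales pop to millions and adds lifeExpMonths.

```r
library(dplyr)

# Hypothetical toy data (made-up values)
toy <- data.frame(
  country = c("A", "B"),
  pop = c(2500000, 10000000),
  lifeExp = c(50, 75),
  stringsAsFactors = FALSE
)

result <- toy %>%
  mutate(
    pop = pop / 1000000,          # overwrite pop: now measured in millions
    lifeExpMonths = lifeExp * 12  # add a new column next to the existing ones
  )

result
```

    Note that mutate() returns a modified copy. The original toy data frame is unchanged unless the result is assigned back to it.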

    Chapter 2: Data visualization

    2.1 Scatter plots

    There are three basic steps to creating a visualization with ggplot2:

    • First, pass ggplot() the data that you want to plot.
    • Next, specify the plot's aesthetics with aes(), which maps columns to visual properties. Here, we map columns to the x and y axes.
    • Finally, add a layer that specifies the type of plot. geom_point() creates a scatter plot.

    # Load the ggplot2 library
    library(ggplot2)
    
    # Filter for 1952 and save the result as gapminder_1952
    gapminder_1952 <- gapminder %>%
    	filter(year == 1952)
    
    # Create a scatter plot with pop on the x-axis and lifeExp on the y-axis
    ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
    	geom_point()
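
    The three steps can also be seen one at a time by building the plot on a small data set and storing it in a variable before drawing it. This is a minimal sketch: the data frame toy and its values are hypothetical, and the variable name p is chosen only for illustration.

```r
library(ggplot2)

# Step 1: the data (hypothetical toy values, not real gapminder data)
toy <- data.frame(
  pop = c(1000000, 5000000, 20000000),
  lifeExp = c(45, 60, 72)
)

# Steps 2 and 3: map columns to the axes with aes(), then add a point layer
p <- ggplot(toy, aes(x = pop, y = lifeExp)) +
  geom_point()

# Drawing happens when the plot object is printed
p
```

    Storing the plot in a variable makes it possible to add further layers later with +, without repeating the data and aesthetics.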