Course Notes: Introduction to Importing Data in R

    Course Notes

    # Import any packages you want to use here
    library(readr)
    library(data.table)
    

    Take Notes

    # Import swimming_pools.csv: pools
    pools <- read.csv("swimming_pools.csv")
    
    # Print the structure of pools
    str(pools)
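
    # Import swimming_pools.csv with strings kept as characters, not factors: pools
    # (stringsAsFactors = FALSE is already the default from R 4.0.0 onward)
    pools <- read.csv("swimming_pools.csv", stringsAsFactors = FALSE)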
    
    # Check the structure of pools
    str(pools)

    read.delim

    Besides .csv files, you will often come across tab-delimited .txt files. You can import these files with read.delim(). By default, it sets the sep argument to "\t" (fields in a record are delimited by tabs) and the header argument to TRUE (the first row contains the field names).

    # Import hotdogs.txt (the file has no header row, so header = FALSE): hotdogs
    hotdogs <- read.delim("hotdogs.txt", header = FALSE)
    
    # Summarize hotdogs
    summary(hotdogs)
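
    The effect of the read.delim() defaults can be checked on a throwaway file. The following is a minimal sketch that assumes nothing beyond base R; the temporary file and its three made-up records are illustrative and are not the course's hotdogs.txt.

```r
# Write a small tab-delimited file to a temporary location (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("type\tcalories\tsodium",
             "Beef\t186\t495",
             "Poultry\t129\t430"),
           tmp)

# Defaults: sep = "\t" and header = TRUE, so the first line becomes column names
with_header <- read.delim(tmp)

# header = FALSE treats the first line as data; columns are named V1, V2, V3
no_header <- read.delim(tmp, header = FALSE)

str(with_header)
str(no_header)
```

    Comparing the two structures shows why header = FALSE only makes sense when the file really lacks a header row: otherwise the field names end up as the first observation.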

    read.table

    If you're dealing with more exotic flat file formats, you'll want to use read.table(). It is the most general of the base R import functions, and it accepts a large number of arguments; read.csv() and read.delim() are wrappers around it with different defaults. In read.table() itself, the header argument defaults to FALSE and the sep argument defaults to "", which means fields are split on any run of whitespace.

    # Path to the hotdogs.txt file: path
    path <- file.path("data", "hotdogs.txt")
    
    # Import the hotdogs.txt file: hotdogs
    hotdogs <- read.table(path, 
                          sep = "\t", 
                          col.names = c("type", "calories", "sodium"))
    
    # Call head() on hotdogs
    head(hotdogs)

    # Import hotdogs.txt with read.delim(), supplying the column names: hotdogs
    hotdogs <- read.delim("hotdogs.txt", header = FALSE, col.names = c("type", "calories", "sodium"))
    
    # Select the hot dog with the least calories: lily
    lily <- hotdogs[which.min(hotdogs$calories), ]
    
    # Select the observation with the most sodium: tom
    tom <- hotdogs[which.max(hotdogs$sodium), ]
    
    # Print lily and tom
    print(lily)
    print(tom)
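
    To illustrate the read.table() defaults described above, here is a small sketch using base R only. The "/" delimiter, the temporary files and the records are made up for illustration; they are assumptions, not part of the course data.

```r
# A file with an unusual delimiter and no header row (made-up data)
slash_file <- tempfile(fileext = ".txt")
writeLines(c("Beef/186/495",
             "Meat/173/458",
             "Poultry/129/430"),
           slash_file)

# header defaults to FALSE, so all three lines are read as data
snacks <- read.table(slash_file,
                     sep = "/",
                     col.names = c("type", "calories", "sodium"),
                     stringsAsFactors = FALSE)

# Select the row with the fewest calories
leanest <- snacks[which.min(snacks$calories), ]
print(leanest)

# With the default sep = "", any mix of spaces and tabs separates fields
ws_file <- tempfile(fileext = ".txt")
writeLines(c("Beef 186   495",
             "Poultry\t129 430"),
           ws_file)
ws_data <- read.table(ws_file)
str(ws_data)
```

    Setting sep explicitly is only needed when the delimiter is something other than whitespace, as with the "/" file here.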

    Column classes

    In addition to column names, you can also specify the column types or column classes of the resulting data frame. You can do this by setting the colClasses argument to a vector of strings representing classes:

    read.delim("my_file.txt", colClasses = c("character", "numeric", "logical"))

    This approach can be useful if you have some columns that should be factors and others that should be characters. You don't have to bother with stringsAsFactors anymore; just state for each column what the class should be.

    If a column is set to "NULL" in the colClasses vector, this column will be skipped and will not be loaded into the data frame.

    # Previous call to import hotdogs.txt
    hotdogs <- read.delim("hotdogs.txt", header = FALSE, col.names = c("type", "calories", "sodium"))
    
    # Display structure of hotdogs
    str(hotdogs)
    
    # Edit the colClasses argument to import the data correctly: hotdogs2
    hotdogs2 <- read.delim("hotdogs.txt", header = FALSE, 
                           col.names = c("type", "calories", "sodium"),
                           colClasses = c("factor", "NULL", "numeric"))
    
    
    # Display structure of hotdogs2
    str(hotdogs2)
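
    The "NULL" behaviour of colClasses can be verified without the course file. This is a minimal sketch in base R; the temporary file, the data frame name dogs and the two records are made up for illustration.

```r
# Two tab-delimited records without a header row (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Beef\t186\t495",
             "Poultry\t129\t430"),
           tmp)

# "factor" and "numeric" set the class; "NULL" drops the second column
dogs <- read.delim(tmp, header = FALSE,
                   col.names = c("type", "calories", "sodium"),
                   colClasses = c("factor", "NULL", "numeric"))

# Only the columns that were not skipped remain
names(dogs)
sapply(dogs, class)
```

    Note that col.names still needs one name per column in the file, including the column that is skipped.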
    
    # Import potatoes.csv with read_csv(): potatoes
    potatoes <- read_csv("potatoes.csv")
    
    # Column names
    properties <- c("area", "temp", "size", "storage", "method",
                    "texture", "flavor", "moistness")
    
    # Import potatoes.txt: potatoes
    potatoes <- read_tsv("potatoes.txt", col_names = properties)
    
    # Call head() on potatoes
    head(potatoes)