Table of Contents
The outline of your notebook will show up here. You can include headings in any text cell by starting a line with #
, ##
, ###
, etc., depending on the desired title hierarchy.
Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, which can be anywhere between a few nights to many months. In this notebook, we will take a closer look at the New York Airbnb market by combining data from multiple file types like .csv
, .tsv
, and .xlsx
(Excel files).
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
datasets/airbnb_price.csv
listing_id
: unique identifier of listingprice
: nightly listing price in USDnbhood_full
: name of borough and neighborhood where listing is located
datasets/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.
listing_id
: unique identifier of listingdescription
: listing descriptionroom_type
: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
datasets/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.
listing_id
: unique identifier of listinghost_name
: name of listing hostlast_review
: date when the listing was last reviewed
# Load the necessary packages
suppressMessages(library(dplyr)) # Do not change this line, as it is required to check your answer correctly
options(readr.show_types = FALSE) # Do not change this line, as it is required to check your answer correctly
library(readr)
library(readxl)
library(stringr)
# Import CSV for prices
airbnb_price <- read_csv('data/airbnb_price.csv', show_col_types=FALSE)
# Import TSV for room types
airbnb_room_type <- read_excel('data/airbnb_room_type.xlsx')
# Import Excel file for review dates
airbnb_last_review <- read_tsv('data/airbnb_last_review.tsv', show_col_types=FALSE)
# Join the three data frames together into one
listings <- airbnb_price %>%
inner_join(airbnb_room_type, by = "listing_id") %>%
inner_join(airbnb_last_review, by = "listing_id")
# Question 1: What is the average listing price?
# To convert price to numeric, remove "dollars" from each value
avg_price <- listings %>%
mutate(price_clean = str_remove(price, " dollars") %>%
as.numeric()) %>%
# Take the mean of price_clean
summarize(avg_price = mean(price_clean)) %>%
# Convert from a tibble to a single number
as.numeric()
# Question 2: How many of the listings are private rooms?
# Since there are differences in capitalization, make capitalization consistent
private_room_count <- listings %>%
mutate(room_type = str_to_lower(room_type)) %>%
# Then count the number of each room_type
count(room_type) %>%
# Get row containing count for private rooms only
filter(room_type == "private room")
# # Extract number of rooms
nb_private_rooms <- private_room_count$n
# # Question 3: Which listing was most recently reviewed?
# In order to use a function like max()/min() on the last_review column, it needs to be converted to Date
review_dates <- listings %>%
# Convert to date using the format 'Month DD YYYY'
mutate(last_review_date = as.Date(last_review, format = "%B %d %Y")) %>%
# Use max() and min() to take the latest and earliest dates
summarize(first_reviewed = min(last_review_date),
last_reviewed = max(last_review_date))
review_dates$nb_private_rooms = nb_private_rooms
review_dates$avg_price = avg_price
review_dates