Kurtis Pykes/

Pandas Read CSV Tutorial: Importing Data

Airbnb Listings

This dataset consists of six files with Airbnb rental listings of six cities: Austin, Bangkok, Buenos Aires, Cape Town, Istanbul, and Melbourne. Each row represents a listing with details such as coordinates, neighborhood, host id, price per night, number of reviews, and so on.

Not sure where to begin? Scroll to the bottom to find challenges!

Other cities

The file names for the other cities are listings_austin.csv, listings_bangkok.csv, listings_buenoes_aires.csv, listings_cape_town.csv, and listings_istanbul.csv. If you want data on other locations, visit the source of the dataset, InsideAirbnb, and upload it to your workspace.
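Because every city follows the same listings_<city>.csv naming pattern, the files can be read in a loop and stacked into one DataFrame. The sketch below is one possible approach, not part of the original workspace: the load_cities helper and the added city column are hypothetical, and the data directory is assumed from the paths used in the read_csv() examples.

```python
from pathlib import Path

import pandas as pd


def load_cities(data_dir, cities):
    """Read listings_<city>.csv for each city and stack them into one DataFrame."""
    frames = []
    for city in cities:
        path = Path(data_dir) / f"listings_{city}.csv"
        frame = pd.read_csv(path)
        frame["city"] = city  # hypothetical helper column, not in the source files
        frames.append(frame)
    # ignore_index=True renumbers the rows 0..n-1 across the combined frame
    return pd.concat(frames, ignore_index=True)
```

A call such as load_cities("data", ["austin", "bangkok"]) would then return the listings of both cities, with the city column recording which file each row came from.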

Data Dictionary

Column | Explanation
id | Airbnb's unique identifier for the listing
name |
host_id |
host_name |
neighbourhood_group | The neighbourhood group as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.
neighbourhood | The neighbourhood as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.
latitude | Uses the World Geodetic System (WGS84) projection for latitude and longitude.
longitude | Uses the World Geodetic System (WGS84) projection for latitude and longitude.
room_type |
price | Daily price in local currency. Note that the $ sign may be used regardless of locale.
minimum_nights | Minimum number of nights' stay for the listing (calendar rules may be different).
number_of_reviews | The number of reviews the listing has.
last_review | The date of the last/newest review.
calculated_host_listings_count | The number of listings the host has in the current scrape, in the city/region geography.
availability_365 | availability_x. The availability of the listing x days in the future as determined by the calendar. Note that a listing may be unavailable because it has been booked by a guest or blocked by the host.
number_of_reviews_ltm | The number of reviews the listing has (in the last 12 months).
license |
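The data dictionary notes that price may carry a $ sign and that last_review is a date, so both columns may need attention at import time. The following is a minimal sketch on invented sample rows; the exact formatting in the real files is an assumption and may differ.

```python
import io

import pandas as pd

# Invented sample rows; the real files may format prices differently
sample = io.StringIO(
    "id,price,last_review\n"
    '1,"$1,250.00",2021-10-03\n'
    "2,85,2021-09-17\n"
    "3,$40,\n"
)

# Read price as text so symbols survive, and parse last_review as a date
listings = pd.read_csv(sample, dtype={"price": str}, parse_dates=["last_review"])

# Strip currency symbols and thousands separators, then convert to numbers
listings["price"] = pd.to_numeric(
    listings["price"].str.replace(r"[$,]", "", regex=True)
)

print(listings.dtypes)
```

If the price column in a given file is already numeric, the cleaning step is unnecessary and read_csv() can be called without the dtype argument.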

The data for each city was compiled by InsideAirbnb between October and November 2021.

Source and license of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: What is the distribution of prices across a city's neighborhoods? How does it change when you segment it further by room_type?
  • 📊 Visualize: Create a map with a dot for each listing in a city, and color the dots by price.
  • 🔎 Analyze: How do listings that require a minimum stay of a week or longer differ from those that don't?
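As a possible starting point for the Explore and Analyze challenges, the sketch below groups a small invented DataFrame by neighbourhood, room_type, and a long-stay flag. The rows are made up for illustration; only the column names come from the data dictionary, and median is just one of several reasonable summary statistics.

```python
import pandas as pd

# Invented rows for illustration; column names follow the data dictionary
listings = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Private room", "Entire home/apt"],
    "price": [120, 60, 180, 40, 50, 200],
    "minimum_nights": [1, 2, 7, 30, 1, 3],
})

# Price distribution per neighbourhood (count, mean, quartiles, ...)
by_hood = listings.groupby("neighbourhood")["price"].describe()

# Median price segmented further by room type (one column per room type)
by_hood_and_room = listings.pivot_table(
    index="neighbourhood", columns="room_type", values="price", aggfunc="median"
)

# Flag listings that require a stay of a week or longer, then compare groups
listings["long_stay"] = listings["minimum_nights"] >= 7
long_stay_summary = listings.groupby("long_stay")["price"].agg(["count", "median"])
```

The same calls should work on a frame loaded with read_csv(), provided price has been converted to a numeric type first.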

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

An international real estate firm has hired you to research professional hosting on Airbnb. These are hosts that have multiple listings, make considerable income from their listings, and often manage teams to operate their listings. Examples include property managers and hospitality business owners.

Using the data from all six cities, you'll have to infer which listings belong to professional hosts based on the distribution of calculated_host_listings_count. The lead consultant is interested in whether you can identify trends across listings operated by inferred professional hosts, as well as an estimate of the percentage of listings on Airbnb operated by professional hosts.
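One way to approach the inference step is to inspect the distribution of calculated_host_listings_count and then apply a cutoff. The sketch below uses invented rows, and the threshold of 3 is purely a placeholder assumption: the scenario does not define what counts as "professional", so the value should be chosen after looking at the real distribution.

```python
import pandas as pd

# Invented rows for illustration; column names follow the data dictionary
listings = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "host_id": [10, 10, 10, 20, 30, 30],
    "calculated_host_listings_count": [3, 3, 3, 1, 2, 2],
})

# Hypothetical cutoff: the source does not define "professional", so this
# threshold is an assumption to revisit after inspecting the distribution
PRO_THRESHOLD = 3

# Inspect how listings are spread over host portfolio sizes
print(listings["calculated_host_listings_count"].value_counts().sort_index())

listings["is_professional"] = (
    listings["calculated_host_listings_count"] >= PRO_THRESHOLD
)

# Share of listings operated by inferred professional hosts
pro_share = listings["is_professional"].mean()
print(f"{pro_share:.0%} of listings")
```

Taking the mean of a boolean column gives the fraction of True values, which is why it serves as the share of listings here.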

You will need to prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.

Importing data with read_csv()

import pandas as pd

# Read the CSV file
airbnb_data = pd.read_csv("data/listings_austin.csv")

# View the first 5 rows
airbnb_data.head()

Selecting a column as index

# Setting the id column as the index
airbnb_data = pd.read_csv("data/listings_austin.csv", index_col="id")
# Alternative by position, assuming id is the first column in the file:
# airbnb_data = pd.read_csv("data/listings_austin.csv", index_col=0)

# Preview first 5 rows
airbnb_data.head()
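To see what index_col changes without needing the Austin file, the sketch below reads a tiny invented CSV from memory. It compares selecting the index by name and by position, then looks up a row by its identifier; the sample rows are made up.

```python
import io

import pandas as pd

# Invented rows standing in for a listings file
csv_text = "id,name,price\n101,Loft,120\n102,Studio,80\n"

by_name = pd.read_csv(io.StringIO(csv_text), index_col="id")
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Both calls produce the same frame because id is the first column
print(by_name.equals(by_position))

# With id as the index, a listing can be looked up by its identifier
print(by_name.loc[102, "name"])
```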

Selecting specific columns to read into memory

# Defining the columns to read 
usecols = ["id", "name", "host_id", "neighbourhood", "room_type", "price", "minimum_nights"]

# Read data with subset of columns
airbnb_data = pd.read_csv("data/listings_austin.csv", index_col="id", usecols=usecols)

# Preview first 5 rows
airbnb_data.head()
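Two details of usecols are easy to miss: the order of the list does not affect the column order of the result, and a callable can be passed instead of a list. The sketch below illustrates both on an invented in-memory CSV. Note that any column named in index_col must also be included in usecols.

```python
import io

import pandas as pd

# Invented rows for illustration
csv_text = (
    "id,name,host_id,price,license\n"
    "101,Loft,10,120,\n"
    "102,Studio,20,80,ABC\n"
)

# The list order is ignored: columns come back in file order
subset = pd.read_csv(
    io.StringIO(csv_text), usecols=["price", "id", "name"], index_col="id"
)
print(list(subset.columns))

# A callable receives each column name and keeps those for which it returns True
no_license = pd.read_csv(
    io.StringIO(csv_text), usecols=lambda col: col != "license"
)
print(list(no_license.columns))
```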

Reading data from a URL

# Webpage URL 
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Define the column names
col_names = ["sepal_length_in_cm",
             "sepal_width_in_cm", 
             "petal_length_in_cm", 
             "petal_width_in_cm", 
             "class"]

# Read data from URL
iris_data = pd.read_csv(url, names=col_names)

iris_data.head() 
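The names argument matters here because the file is assumed to have no header row, as the call above suggests. The sketch below reproduces the situation offline with two invented rows in the same layout, showing what happens with and without names.

```python
import io

import pandas as pd

# Two invented rows in the same header-less layout as iris.data
raw = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
col_names = ["sepal_length_in_cm", "sepal_width_in_cm",
             "petal_length_in_cm", "petal_width_in_cm", "class"]

# Without names, the first data row is consumed as the header
wrong = pd.read_csv(io.StringIO(raw))

# With names (and no explicit header), every row is kept as data
right = pd.read_csv(io.StringIO(raw), names=col_names)

print(len(wrong), len(right))
```

Passing header=None without names would also keep every row, but the columns would then be labeled with integers instead of descriptive names.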

Methods of the dataframe structure

.head() and .tail()

# See the first rows (5 by default; pass n to change, e.g. .head(10))
iris_data.head()
# See the last rows (5 by default)
iris_data.tail()
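Both methods accept an optional row count n. The short sketch below uses a throwaway DataFrame (not the iris data) to show positive and negative values of n.

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(10)})

first_three = numbers.head(3)   # rows 0, 1, 2
last_two = numbers.tail(2)      # rows 8, 9

# A negative n means "all rows except": head(-2) drops the last two rows
all_but_last_two = numbers.head(-2)
```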