SQL for Absolute Beginners
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    Live Code-Along: SQL for Absolute Beginners

    Key session takeaways

    • Learn the basics of SQL & why it's foundational for data science work.
    • Filter, group by, and analyse datasets using SQL.
    • Answer key business questions with SQL

    The Dataset

    The dataset to be used in this training is a CSV file named airbnb_data.csv, which contains data on airbnb listings in the state of New York. It contains the following columns:

    • listing_id: The unique identifier for a listing
    • description: The description used on the listing
    • host_id: Unique identifier for a host
    • neighbourhood_full: Name of boroughs and neighbourhoods
    • coordinates: Coordinates of listing (latitude, longitude)
    • listing_added: Date of added listing
    • room_type: Type of room
    • rating: Rating from 0 to 5.
    • price: Price per night for listing
    • number_of_reviews: Amount of reviews received
    • reviews_per_month: Number of reviews per month
    • availability_365: Number of days available per year
    • number_of_stays: Total number of stays thus far

    Questions to answer

    • Question 1: List the top 10 most reviewed private rooms
    • Question 2: What are the cheapest 10 private rooms in New York?
    • Question 3: What is the average availability of a private room in New York?

    Q&A

    • Question 4: Which listings have an availability of fewer than 30 days a year but have fewer than 10 reviews?
    • Question 5: What is the average number of reviews per room type, ordered by the average in descending order?
    • Question 6: What is the number and average price of listings by room type where such listings are available for more than 250 days a year?

    Some SQL Basics

    SQL Commands

    Before we dive into answering our questions, let's get familiar with SQL syntax. Today, we'll be covering the most useful SQL commands — let's start with them one by one. This includes the following:

    • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
    • FROM : specifies the table that the data should be returned from.
    • LIMIT: limits the number of rows returned.
    • AS: lets you set an alias for a particular column
    • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
    • WHERE: lets you filter on a specific column or value
    • GROUP BY: lets you aggregate on one or more columns
    • MAX(): returns the maximum amount of a particular column
    • AVG(): returns the total average of a particular column

    Code Commenting

    There are two types of code commenting in Postgres

    -- Inline commenting : Used for quick, short notes

    "/" Multi line commenting "/" (use without quotation marks): Used for longer comments, such as metadata, or code headers including the author, date, purpose, etc.

    Unknown integration
    DataFrameavailable as
    df
    variable
    -- SELECT the first 10 rows of all the columns from the airbnb dataset
    SELECT * 
    FROM 'airbnb_data.csv'
    LIMIT 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df1
    variable
    -- SELECT the first 10 rows of the listing_id, description, and neighbourhood_full columns
    SELECT listing_id, description, neighbourhood_full
    FROM 'airbnb_data.csv'
    LIMIT 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df2
    variable
    -- Set an alias for listing_id, description and neighbourhood_full
    SELECT listing_id AS 'Listing ID', 
    	   description AS 'Description', 
    	   neighbourhood_full AS 'Location'
    FROM 'airbnb_data.csv'
    LIMIT 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df3
    variable
    -- Sort the output by number of stays
    SELECT * 
    FROM 'airbnb_data.csv'
    ORDER BY number_of_stays 
    LIMIT 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df4
    variable
    -- Filter room type by private room
    SELECT * 
    FROM 'airbnb_data.csv'
    WHERE room_type == 'Private Room'
    -- WHERE price >= 50
    LIMIT 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df5
    variable
    -- Get the average price for all rooms
    SELECT AVG(price) AS 'Average Price'
    FROM 'airbnb_data.csv'
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Get the average price per room type
    SELECT room_type AS 'Room Type', 
           AVG(price) AS 'Average Price'
    FROM 'airbnb_data.csv'
    GROUP BY room_type
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Q&A

    Let's answer some questions!

    Question 1: List the top 10 most reviewed private rooms

    In order to answer this question, we need to filter on room_type and order by reviews_per_month, and limit to the top 10 results. Here are the commands we need to use:

    • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
    • FROM : specifies the table that the data should be returned from.
    • WHERE: specifies a condition on a column
    • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
    • LIMIT: returns the specified number of rows
    Unknown integration
    DataFrameavailable as
    df10
    variable
    -- List the top 10 most reviewed private rooms
    SELECT * 
    FROM 'airbnb_data.csv'
    WHERE room_type == 'Private Room'
    ORDER BY number_of_reviews DESC
    LIMIT 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Hidden code

    Question 2: What are the cheapest 10 private rooms in New York?

    In order to answer this question, we need to filter on room_type and order by price, and limit to the top 10 results. Here are the commands we need to use:

    • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
    • FROM : specifies the table that the data should be returned from.
    • WHERE: specifies a condition on a column
    • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
    • LIMIT: returns the specified number of rows
    Unknown integration
    DataFrameavailable as
    df11
    variable
    -- List the top 10 cheapest private rooms in New York
    SELECT *
    FROM 'airbnb_data.csv'
    WHERE room_type == 'Private Room'
    ORDER BY price ASC
    LIMIT 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.