Course notes: Intermediate postgreSQL
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    In the SQL cell below:

    • Set the source to 'Course Databases' to query databases used in SQL courses.
    • Set the source to 'DataFrames and CSVs' to query this course's CSV files (if any) with SQL
    • Write SQL! Note that you cannot run queries that modify the database.

    The content

    • Explore your data use SELECT.
    • Summary statistic.
    • Filter summery data with WHERE
    • HAVING with GROIP BY.
    • Finding and resolving missing data..
    • Subseting missing values use COLEDSCE.
    • CASE WHEN statment.
    • Math with dates.
    • Rounding and Truncating numbers.
    • Common Table Expression.
    • Windows functions

    Here we will content with this much and we will not explain the statistical functions such as STDEV() AND Mode maybe next.

    Here we use films table from database cinema

    Explor your data use SELECT column from specify table

    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Select all fields from table
    SELECT *
    FROM cinema.films;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df1
    variable
    -- Specify some column from table
    SELECT id, country, duration ,language 
    FROM cinema.films
    -- detriment numbers of rows lilit 5
    LIMIT 5;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Summery statistics include MAX(), MIN(), SUM(), AVG() and COUNT() functions

    Why did we use summery statistics in SQL

    Summary statistics are used in SQL Server to provide a statistical summary of the entire table or view. They can be used to get a quick glance at the shape and distribution of the data before taking a deep dive³. The postgreSQL query optimizer uses distribution statistics when creating a query plan. The statistics provide information about the distribution of column values across participating rows, helping the optimizer better estimate the number of rows, or cardinality, of the query results⁴.

    If you don't use groupby with aggregate function you must take an error

    Unknown integration
    DataFrameavailable as
    df2
    variable
    -- count the total numbers of gross by country and duration 
    SELECT country, duration,
    SUM(gross) AS total_gross
    FROM cinema.films
    -- Grouped by country and duration
    GROUP BY country, duration
    -- order by country
    ORDER BY country;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df3
    variable
    -- Selrct min, max and avg for gross
    SELECT
    MIN(gross) AS mini_gross,
    MAX(gross) AS maxi_gross,
    AVG(gross) AS main_gross
    FROM cinema.films;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df4
    variable
    -- Selrct min, max and avg for gross
    SELECT
    MIN(gross) AS mini_gross,
    MAX(gross) AS maxi_gross,
    AVG(gross) AS main_gross
    FROM cinema.films
    -- Filter by country 'Argentina'
    WHERE country = 'Argentina';
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Cant did use WHERE with GROUP BY it's give you an error also you can use HAVING with GROUP BY

    The reason why we need to use GROUP BY with HAVING but not with WHERE is because the WHERE statement is evaluated before any aggregations take place.

    The alternate HAVING is placed after the GROUP BY and allows you to filter the returned data by an aggregated column. Using HAVING, you can return the aggregate filtered results!

    Here we use payment table from dvdrentals database

    Unknown integration
    DataFrameavailable as
    df6
    variable
    -- Explore the table
    SELECT * FROM dvdrentals.payment;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df5
    variable
    -- Select payment
    SELECT payment_id, 
    -- Calculate sum alias by total amount
    SUM(amount) AS total_amount,
    -- calculate average of amount
    AVG(amount) AS average_amount
    FROM dvdrentals.payment
    -- Grouped by payment id
    GROUP BY payment_id
    -- Filter by aggregate column
    HAVING SUM(amount) < 8.99;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.