Beta
In the SQL cell below:
- Set the source to 'Course Databases' to query databases used in SQL courses.
- Set the source to 'DataFrames and CSVs' to query this course's CSV files (if any) with SQL
- Write SQL! Note that you cannot run queries that modify the database.
The content
- Explore your data use SELECT.
- Summary statistic.
- Filter summery data with WHERE
- HAVING with GROIP BY.
- Finding and resolving missing data..
- Subseting missing values use COLEDSCE.
- CASE WHEN statment.
- Math with dates.
- Rounding and Truncating numbers.
- Common Table Expression.
- Windows functions
Here we will content with this much and we will not explain the statistical functions such as STDEV() AND Mode maybe next.
Here we use films table from database cinema
Explor your data use SELECT column from specify table
Unknown integration
DataFrameavailable as
df
variable
-- Select all fields from table
SELECT *
FROM cinema.films;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df1
variable
-- Specify some column from table
SELECT id, country, duration ,language
FROM cinema.films
-- detriment numbers of rows lilit 5
LIMIT 5;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Summery statistics include MAX(), MIN(), SUM(), AVG() and COUNT() functions
Why did we use summery statistics in SQL
Summary statistics are used in SQL Server to provide a statistical summary of the entire table or view. They can be used to get a quick glance at the shape and distribution of the data before taking a deep dive³. The postgreSQL query optimizer uses distribution statistics when creating a query plan. The statistics provide information about the distribution of column values across participating rows, helping the optimizer better estimate the number of rows, or cardinality, of the query results⁴.
If you don't use groupby with aggregate function you must take an error
Unknown integration
DataFrameavailable as
df2
variable
-- count the total numbers of gross by country and duration
SELECT country, duration,
SUM(gross) AS total_gross
FROM cinema.films
-- Grouped by country and duration
GROUP BY country, duration
-- order by country
ORDER BY country;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df3
variable
-- Selrct min, max and avg for gross
SELECT
MIN(gross) AS mini_gross,
MAX(gross) AS maxi_gross,
AVG(gross) AS main_gross
FROM cinema.films;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df4
variable
-- Selrct min, max and avg for gross
SELECT
MIN(gross) AS mini_gross,
MAX(gross) AS maxi_gross,
AVG(gross) AS main_gross
FROM cinema.films
-- Filter by country 'Argentina'
WHERE country = 'Argentina';
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Cant did use WHERE with GROUP BY it's give you an error also you can use HAVING with GROUP BY
The reason why we need to use GROUP BY with HAVING but not with WHERE is because the WHERE statement is evaluated before any aggregations take place.
The alternate HAVING is placed after the GROUP BY and allows you to filter the returned data by an aggregated column. Using HAVING, you can return the aggregate filtered results!
Here we use payment table from dvdrentals database
Unknown integration
DataFrameavailable as
df6
variable
-- Explore the table
SELECT * FROM dvdrentals.payment;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df5
variable
-- Select payment
SELECT payment_id,
-- Calculate sum alias by total amount
SUM(amount) AS total_amount,
-- calculate average of amount
AVG(amount) AS average_amount
FROM dvdrentals.payment
-- Grouped by payment id
GROUP BY payment_id
-- Filter by aggregate column
HAVING SUM(amount) < 8.99;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.