Workspace
Laura-Valentina Costache/

Analyzing Streaming Service Content in SQL - Codealong

0
Beta
Spinner

Analyzing Streaming Service Content in SQL

Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.

To set up your integration, create a PostgreSQL integration with the following credentials:

  • Integration Name: Streaming Codealong
  • Hostname: workspacedemodb.datacamp.com
  • Database: streaming
  • Username: streaming_codealong
  • Password: streaming_codealong

Exploring our data

Let's start by checking out the data we will be working with. We can start with the amazon, hulu, netflix, and disney tables.

Unknown integration
DataFrameavailable as
df
variable
SELECT *
FROM amazon
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Current Type: Bar
Current X-axis: None
Current Y-axis: None
Current Color: None
Unknown integration
DataFrameavailable as
df
variable
SELECT *
FROM hulu
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
SELECT *
FROM netflix
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
SELECT *
FROM disney
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

We can also inspect the genres table, which is different from the other tables.

Unknown integration
DataFrameavailable as
df
variable
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Preparing our data

Joining the different tables

Our data appears to mostly have the same column names. So we can join the data with a series of UNIONs, which will append each table to the previous one.

We use UNION ALL to preserve any possible duplicate rows, as we will want to count entries if they appear in multiple services.

Unknown integration
DataFrameavailable as
df
variable
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

One problem with the above approach is that we lose out on the streaming service information. So let's repeat our query, but add in the required info!

Unknown integration
DataFrameavailable as
df
variable
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Great! But we have one more table that might prove useful. Let's add in the genre information with a join.

To do this, we will need to use a Common Table Expression, or CTE.

Unknown integration
DataFrameavailable as
df
variable
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.



  • AI Chat
  • Code