Netflix Top 10

Pete Sime

Netflix Top 10: Analyzing Weekly Chart-Toppers

This dataset comprises Netflix's weekly top 10 lists for the most-watched TV shows and films worldwide. The data spans from June 28, 2021, to August 27, 2023.

This workspace is pre-loaded with two CSV files.

netflix_top10.csv contains columns such as show_title, category, weekly_rank, and several view metrics.
netflix_top10_country.csv has information about a show or film's performance by country, contained in the columns cumulative_weeks_in_top_10 and weekly_rank.

We've added some guiding questions for analyzing this exciting dataset! Feel free to make this workspace yours by adding and removing cells, or editing any of the existing cells.

Source: Netflix

import pandas as pd

# Read the netflix_top10.csv file
df = pd.read_csv('netflix_top10.csv')

# Group by rank position: for each weekly_rank, collect the titles from every category and week
weekly_top10 = df.groupby('weekly_rank')['show_title'].apply(list).reset_index(name='combined_top10')

# Display the combined list
weekly_top10
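The grouping above keys on weekly_rank, so it yields one list per rank position rather than one list per week. If a list per week is what you want, a minimal sketch could look like the following. It assumes the file has a column identifying the week, called `week` here (that name is an assumption, so check `df.columns` first), and it uses a few made-up rows instead of the real file so that it runs on its own.

```python
import pandas as pd

# Made-up rows standing in for netflix_top10.csv.
# The `week` column name and every value here are assumptions for illustration.
sample = pd.DataFrame({
    "week": ["2021-07-04", "2021-07-04", "2021-07-04", "2021-07-11"],
    "category": ["Films (English)", "TV (English)", "Films (English)", "TV (English)"],
    "weekly_rank": [2, 1, 1, 1],
    "show_title": ["Film B", "Show A", "Film A", "Show A"],
})

# Sort so each week's list is ordered by rank (ties broken by category),
# then collect the titles of all categories into one list per week.
titles_by_week = (
    sample.sort_values(["week", "weekly_rank", "category"])
    .groupby("week")["show_title"]
    .apply(list)
    .reset_index(name="combined_top10")
)

print(titles_by_week)
```

To run this on the real data, replace `sample` with `df` once you have confirmed the column names.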

🔍 Scenario: Understanding the Impact of Content Duration on Netflix's Top 10 Lists

This scenario helps you develop an end-to-end project for your portfolio.

Background: As a data scientist at Netflix, you're tasked with exploring the dataset of weekly top 10 lists for the most-watched TV shows and films. One question to investigate is how content duration relates to ranking over time. The answer can help content creators and strategists optimize their offerings for the platform.

Objective: Determine if there's a correlation between content duration and its likelihood of making it to the top 10 lists.
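One way to approach this objective is a rank correlation between a duration measure and weekly rank. The sketch below is only an illustration: the dataset description above does not name a duration column, so `runtime_minutes` is a hypothetical column that you would need to derive or join from another source, and all values are made up. It computes Spearman's correlation as the Pearson correlation of the ranked values, which needs only pandas.

```python
import pandas as pd

# Made-up values for illustration only.
# `runtime_minutes` is a hypothetical column, not one listed for the CSV files.
sample = pd.DataFrame({
    "show_title": ["Title A", "Title B", "Title C", "Title D", "Title E"],
    "runtime_minutes": [95, 120, 88, 140, 105],
    "weekly_rank": [3, 2, 5, 1, 4],
})

# Spearman's rho equals the Pearson correlation of the rank-transformed values.
spearman = sample["runtime_minutes"].rank().corr(sample["weekly_rank"].rank())

print(round(spearman, 2))
```

Because rank 1 is the best position, a negative coefficient would mean that longer content tends to rank closer to the top. A correlation on its own does not show that duration causes the ranking.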

You can query the pre-loaded CSV files using SQL directly. Here’s a sample query:


SELECT show_title, MAX(cumulative_weeks_in_top_10) as max_cumulative_weeks_in_top_10
FROM 'netflix_top10_country.csv'
WHERE country_name = 'Argentina'
GROUP BY show_title
ORDER BY max_cumulative_weeks_in_top_10 DESC
LIMIT 3;
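The same result can be produced in pandas. The sketch below mirrors the query step by step (filter, group, take the maximum, sort, limit). It uses a few made-up rows in place of netflix_top10_country.csv so that it runs on its own; the column names are the ones used in the query.

```python
import pandas as pd

# Made-up rows standing in for netflix_top10_country.csv (values are illustrative).
sample = pd.DataFrame({
    "country_name": ["Argentina", "Argentina", "Argentina", "Argentina", "Brazil", "Argentina"],
    "show_title": ["Show A", "Show A", "Show B", "Show C", "Show A", "Show D"],
    "cumulative_weeks_in_top_10": [1, 2, 5, 3, 9, 1],
})

top3 = (
    sample[sample["country_name"] == "Argentina"]            # WHERE
    .groupby("show_title", as_index=False)["cumulative_weeks_in_top_10"]
    .max()                                                   # GROUP BY ... MAX(...)
    .rename(columns={"cumulative_weeks_in_top_10": "max_cumulative_weeks_in_top_10"})
    .sort_values("max_cumulative_weeks_in_top_10", ascending=False)  # ORDER BY ... DESC
    .head(3)                                                 # LIMIT 3
)

print(top3)
```

To use the real data, replace `sample` with a DataFrame read from netflix_top10_country.csv, keeping country_name as a regular column rather than the index.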


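import pandas as pd

# Load the global weekly top 10 lists, using the first column as the index
global_top_10 = pd.read_csv("netflix_top10.csv", index_col=0)
global_top_10.head()

# Load the per-country top 10 lists, using the first column as the index
countries_top_10 = pd.read_csv("netflix_top10_country.csv", index_col=0)
countries_top_10.head()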
