
Netflix Top 10 Charts (An Independent Review)


📖 Background

The Netflix Top 10 charts represent the most popular movies and TV series, with millions of viewers around the globe. Understanding what makes the biggest hits is crucial to making more hits.

💪 Challenge

Explore the dataset to understand the most common attributes of popular Netflix content. Your published notebook should contain a short report on the popular content, including summary statistics, visualizations, statistical models, and text describing any insights you found.

💾 The data

There are three datasets taken from Netflix Top 10.

Each dataset is stored as a table in a PostgreSQL database.

  • all_weeks_global: This contains the weekly top 10 list for movies (films) and TV series at a global level.
  • all_weeks_countries: This contains the weekly top 10 list for movies (films) and TV series by country.
  • most_popular: All-time most popular content by number of hours viewed in the first 28 days from launch.

The data source page describes the methodology for data collection in detail. In particular:

  • Content is categorized as Film (English), TV (English), Film (Non-English), and TV (Non-English).
  • Each season of a TV series is considered separately.
  • Popularity is measured as the total number of hours that Netflix members around the world watched each title from Monday to Sunday of the previous week.
  • Weekly reporting is rounded to the nearest 10 000 viewers.
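
As a small illustration of what rounding to the nearest 10,000 means for the reported figures, the sketch below applies Python's built-in `round` with a negative digit count to a made-up value. The input number is hypothetical and not taken from the dataset.

```python
# Hypothetical raw weekly figure; not taken from the dataset.
raw_value = 123_456_789

# round(..., -4) rounds an integer to the nearest 10,000.
reported_value = round(raw_value, -4)

print(reported_value)  # 123460000
```

Two titles whose true figures differ by less than 10,000 can therefore appear tied in the published data.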

Database integration

To access the data, use the sample integration named "Competition Netflix Top 10".

Top Weekly Global Movies on Netflix

SELECT *
	FROM all_weeks_global
   
    
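# Assumes `world` holds the result of the all_weeks_global query above
# and that `week` is stored as a date, so the subtraction yields a time span
min_week = world['week'].min()
max_week = world['week'].max()

min_week, max_week, max_week - min_week

# Number of distinct weekly charts
world['week'].nunique()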

The data covers 75 weekly charts (a span of 518 days), from 2021-07-04 to 2022-12-04.
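
The span and week count quoted above can be checked with plain date arithmetic. The sketch below hard-codes the two boundary dates from the text and assumes one chart per week with no gaps, which is why the number of weekly charts is one more than the number of whole weeks between the first and last date.

```python
from datetime import date

first_week = date(2021, 7, 4)   # earliest week in the data, per the text above
last_week = date(2022, 12, 4)   # latest week in the data, per the text above

span_days = (last_week - first_week).days

# Assumes one chart per week with no gaps: both endpoints count as a chart.
weekly_charts = span_days // 7 + 1

print(span_days, weekly_charts)  # 518 75
```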

Most watched category worldwide in descending order

# Mean weekly hours viewed per category, highest first
world.groupby('category')['weekly_hours_viewed'].mean().round().sort_values(ascending=False).reset_index()

import matplotlib.pyplot as plt
import seaborn as sns

# Weekly hours viewed over time, one line per category
plt.figure(figsize=(10, 8))
sns.lineplot(x='week', y='weekly_hours_viewed', hue='category', data=world)
plt.ylabel('Weekly hours viewed (per 100 million)')
plt.xlabel('Week')
plt.xticks(rotation=60)
sns.despine()

Based on the table and line plot above, the categories ranked by mean weekly hours viewed, in descending order, are:

  1. TV (English)
  2. TV (Non-English)
  3. Film (English)
  4. Film (Non-English)

Top 10 TV shows with the most weekly hours viewed worldwide

# Mean weekly hours viewed per season title, keeping the 10 highest
top_TV_shows = world.groupby(['season_title', 'category'])['weekly_hours_viewed'].mean().round().sort_values(ascending=False).head(10)
top_TV_shows = top_TV_shows.reset_index()
top_TV_shows

# I have decided to have fun and go with Netflix's primary colors
sns.barplot(y='season_title', x='weekly_hours_viewed', color='#E50914', edgecolor='black', data=top_TV_shows)
plt.title('Top 10 TV Shows Worldwide (2021-07-04 to 2022-12-04)')
plt.xlabel('Mean weekly hours viewed\n(per 100 million)', labelpad=10)
plt.ylabel('')
sns.despine()

So far the findings are striking: nine of the top 10 shows are English-language TV shows, and only one (Squid Game) is non-English.

This suggests that, globally, Netflix viewers favor English-language TV shows.

These findings also appear to be at odds with the spike in the line plot between 2021-09-26 and 2021-10-17, which was produced by a non-English TV show, Squid Game: Season 1. At face value, the spike suggests that Squid Game should be the most watched show, yet it does not top the ranking. This is puzzling at first. One possible explanation is that the ranking uses the mean of weekly hours over every week a title spent in the Top 10, so a long run of weaker later weeks can pull a title's average below that of shows with shorter, consistently strong runs.
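
One way to see how a large spike can coexist with a lower rank is to compare the mean, peak, and total of two made-up titles. The title names and numbers below are hypothetical, not taken from the dataset; they only illustrate that averaging over every charted week can favor a short, strong run over a huge peak followed by a long tail.

```python
from statistics import mean

# Hypothetical weekly hours viewed (in millions); NOT real Netflix figures.
weekly_hours = {
    # Big peak followed by a long tail of weaker weeks
    "Spike Show": [60, 450, 570, 410, 200, 90, 50, 30, 20, 20],
    # Only a few weeks on the chart, all of them strong
    "Short Run Show": [290, 330, 180],
}

for title, hours in weekly_hours.items():
    print(f"{title}: mean={mean(hours):.0f}, peak={max(hours)}, total={sum(hours)}")

# Rank the same titles two ways: by mean weekly hours and by total hours.
ranked_by_mean = sorted(weekly_hours, key=lambda t: mean(weekly_hours[t]), reverse=True)
ranked_by_total = sorted(weekly_hours, key=lambda t: sum(weekly_hours[t]), reverse=True)

print(ranked_by_mean)
print(ranked_by_total)
```

In this toy case the title with the larger peak and total ranks second by mean. Whether the same effect explains the real ranking would need to be checked against `world`, for example by swapping `mean()` for `sum()` or `max()` in the group-by.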

Top 10 films watched globally