Competition - Christmas movie grossings

Beta

Predicting Christmas Movie Grossings

📖 Background

Imagine harnessing the power of data science to unveil the hidden potential of movies before they even hit the silver screen! As a data scientist at a forward-thinking cinema, you're at the forefront of an exhilarating challenge: crafting a cutting-edge system that doesn't just predict movie revenues, but reshapes the entire landscape of cinema profitability. This isn't just about numbers; it's about blending art with analytics to revolutionize how movies are marketed, chosen, and celebrated.

Your mission? To architect a predictive model that dives deep into the essence of a movie - from its title and running time to its genre, captivating description, and star-studded cast. And what better way to sprinkle some festive magic on this project than by focusing on a dataset brimming with Christmas movies? A highly-anticipated Christmas movie is due to launch soon, but the cinema has some doubts. It wants you to predict its success, so it can decide whether to go ahead with the screening or not. It's a unique opportunity to blend the cheer of the holiday season with the rigor of data science, creating insights that could guide the success of tomorrow's blockbusters. Ready to embark on this cinematic adventure?

💾 The data

We're providing you with a dataset of 788 Christmas movies, with the following columns:

christmas_movies.csv

Variable	Description
`title`	the title of the movie
`release_year`	year the movie was released
`description`	short description of the movie
`type`	the type of production e.g. Movie, TV Episode
`rating`	the rating/certificate e.g. PG
`runtime`	the movie runtime in minutes
`imdb_rating`	the IMDB rating
`genre`	list of genres e.g. Comedy, Drama etc.
`director`	the director of the movie
`stars`	list of actors in the movie
`gross`	the domestic gross of the movie in US dollars (what we want to predict)

You may also use an additional dataset of 1000 high-rated movies, with the following columns:

imdb_top1k.csv

Variable	Description
`title`	the title of the movie
`release_year`	year the movie was released
`description`	short description of the movie
`type`	the type of production e.g. Movie, TV Episode
`rating`	the ratig/certificate e.g. PG
`runtime`	the movie runtime in minutes
`imdb_rating`	the IMDB rating
`genre`	list of genres e.g. Comedy, Drama etc.
`director`	the director of the movie
`stars`	list of actors in the movie
`gross`	the domestic gross of the movie in US dollars (what we want to predict)

Finally you have access to a dataset of movie production budgets for over 6,000 movies, with the following columns:

movie_budgets.csv

Variable	Meaning
`year`	year the movie was released
`date`	date the movie was released
`title`	title of the movie
`production budget`	production budget in US dollars

Note: while you may augment the Christmas movies with the general movie data, the model should be developed to predict ratings of Christmas movies only.

import pandas as pd
xmas_movies = pd.read_csv('data/christmas_movies.csv')
xmas_movies

top1k_movies = pd.read_csv('data/imdb_top1k.csv')
top1k_movies

movie_budgets = pd.read_csv('data/movie_budgets.csv')
movie_budgets

💪 Competition challenge

Create a report that covers the following:

Exploratory data analysis of the dataset with informative plots. It's up to you what to include here! Some ideas could include:
- Analysis of the genres
- Descriptive statistics and histograms of the grossings
- Word clouds
Develop a model to predict the movie's domestic gross based on the available features.
- Remember to preprocess and clean the data first.
- Think about what features you could define (feature engineering), e.g.:
  - number of times a director appeared in the top 1000 movies list,
  - highest grossing for lead actor(s),
  - decade released
Evaluate your model using appropriate metrics.
Explain some of the limitations of the models you have developed. What other data might help improve the model?
Use your model to predict the grossing of the following fictitious Christmas movie:

Title: The Magic of Bellmonte Lane

Description: "The Magic of Bellmonte Lane" is a heartwarming tale set in the charming town of Bellmonte, where Christmas isn't just a holiday, but a season of magic. The story follows Emily, who inherits her grandmother's mystical bookshop. There, she discovers an enchanted book that grants Christmas wishes. As Emily helps the townspeople, she fights to save the shop from a corporate developer, rediscovering the true spirit of Christmas along the way. This family-friendly film blends romance, fantasy, and holiday cheer in a story about community, hope, and magic.

Director: Greta Gerwig

Cast:

Emma Thompson as Emily, a kind-hearted and curious woman
Ian McKellen as Mr. Grayson, the stern corporate developer
Tom Hanks as George, the wise and elderly owner of the local cafe
Zoe Saldana as Sarah, Emily's supportive best friend
Jacob Tremblay as Timmy, a young boy with a special Christmas wish

Runtime: 105 minutes

Genres: Family, Fantasy, Romance, Holiday

Production budget: $25M

🧑‍⚖️ Judging criteria

CATEGORY	WEIGHTING	DETAILS
Recommendations	35%	Clarity of recommendations - how clear and well presented the recommendation is. Quality of recommendations - are appropriate analytical techniques used & are the conclusions valid? Number of relevant insights found for the target audience.
Storytelling	35%	How well the data and insights are connected to the recommendation. How the narrative and whole report connects together. Balancing making the report in-depth enough but also concise.
Visualizations	20%	Appropriateness of visualization used. Clarity of insight from visualization.
Votes	10%	Up voting - most upvoted entries get the most points.

✅ Checklist before publishing into the competition

Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
Remove redundant cells like the judging criteria, so the workbook is focused on your story.
Make sure the workbook reads well and explains how you found your insights.
Try to include an executive summary of your recommendations at the beginning.
Check that all the cells run without error