Which countries produce the best cacao beans for chocolate bars?
BACKGROUND
You work at a specialty foods import company that wants to expand into gourmet chocolate bars. Your boss needs your team to research this market to inform your initial approach to potential suppliers. After finding valuable chocolate bar ratings online, you need to explore whether the chocolate bars with the highest ratings share any characteristics that could help you narrow your search for suppliers (e.g., cacao percentage or bean country of origin).
THE DATA
There is only one dataset, called chocolate_bars, which contains information about chocolate bars. Its data dictionary is as follows:
- "id" - id number of the review
- "manufacturer" - Name of the bar manufacturer
- "company_location" - Location of the manufacturer
- "year_reviewed" - Year of the review (2006 to 2021)
- "bean_origin" - Country of origin of the cacao beans
- "bar_name" - Name of the chocolate bar
- "cocoa_percent" - Cocoa content of the bar (%)
- "num_ingredients" - Number of ingredients
- "ingredients" - B (Beans), S (Sugar), S* (Sweetener other than sugar or beet sugar), C (Cocoa Butter), V (Vanilla), L (Lecithin), Sa (Salt)
- "review" - Summary of most memorable characteristics of the chocolate bar
- "rating" - 1.0-1.9 Unpleasant, 2.0-2.9 Disappointing, 3.0-3.49 Recommended, 3.5-3.9 Highly Recommended, 4.0-5.0 Outstanding
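The rating bands in the data dictionary leave small gaps (e.g., between 3.49 and 3.5), so it helps to fix an explicit rule before labelling bars. The sketch below assumes each band's lower bound is inclusive and assigns any value in a gap to the lower band; the names RATING_BANDS and rating_label are hypothetical helpers, not part of the dataset.

```python
import pandas as pd

# Lower bounds (inclusive) from the data dictionary, checked from highest to lowest.
# Assumption: values in the gaps between published ranges fall into the lower band.
RATING_BANDS = [
    (4.0, "Outstanding"),
    (3.5, "Highly Recommended"),
    (3.0, "Recommended"),
    (2.0, "Disappointing"),
    (1.0, "Unpleasant"),
]


def rating_label(rating: float) -> str:
    """Map a numeric rating to its data-dictionary label."""
    if rating > 5.0:
        raise ValueError(f"rating {rating} is above the documented 1.0-5.0 scale")
    for lower_bound, label in RATING_BANDS:
        if rating >= lower_bound:
            return label
    raise ValueError(f"rating {rating} is below the documented 1.0-5.0 scale")


# Illustrative values only, not taken from the dataset.
labels = pd.Series([2.75, 3.25, 3.5, 4.0]).map(rating_label)
print(labels.tolist())  # ['Disappointing', 'Recommended', 'Highly Recommended', 'Outstanding']
```

Once the data is loaded, the same function could be applied column-wise, e.g. chocolate["rating"].map(rating_label).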
PROBLEM STATEMENT
Our goal for this project is to create a report that answers the following questions:
- What is the average rating by country of origin?
- How many bars were reviewed for each of those countries?
- Create plots to visualize the findings for questions 1 and 2.
- Is the cacao bean's origin an indicator of quality?
- How does cocoa content relate to rating? What is the average cocoa content for bars with higher ratings (above 3.5)?
- Your research indicates that some consumers want to avoid bars with lecithin. Compare the average rating of bars with and without lecithin (L in the ingredients).
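The questions above map onto a few pandas operations. The following is a minimal sketch, not the report's final analysis, assuming the column names from the data dictionary, that "ingredients" is a comma-separated code string such as "B,S,C,L", and that "cocoa_percent" may be either numeric or a string with a trailing "%". All function names are hypothetical, and Pearson correlation is only one possible way to summarize the cocoa/rating relationship.

```python
import pandas as pd


def _cocoa_numeric(df: pd.DataFrame) -> pd.Series:
    """Return cocoa_percent as floats, stripping a trailing '%' if present."""
    return pd.to_numeric(df["cocoa_percent"].astype(str).str.rstrip("%"), errors="coerce")


def summarize_by_origin(df: pd.DataFrame) -> pd.DataFrame:
    """Questions 1 and 2: mean rating and number of rated bars per bean origin."""
    return (
        df.groupby("bean_origin")["rating"]
        .agg(avg_rating="mean", num_bars="count")  # count = non-missing ratings
        .sort_values("avg_rating", ascending=False)
    )


def cocoa_rating_correlation(df: pd.DataFrame) -> float:
    """Question 5 (first part): Pearson correlation between cocoa content and rating."""
    return df["rating"].corr(_cocoa_numeric(df))


def mean_cocoa_above(df: pd.DataFrame, threshold: float = 3.5) -> float:
    """Question 5 (second part): average cocoa content of bars rated strictly above threshold."""
    cocoa = _cocoa_numeric(df)
    return cocoa[df["rating"] > threshold].mean()


def rating_by_lecithin(df: pd.DataFrame) -> pd.Series:
    """Question 6: mean rating of bars with and without lecithin (code L).

    Rows with missing ingredients are excluded, since their lecithin status is unknown.
    """
    known = df.dropna(subset=["ingredients"])
    has_lecithin = known["ingredients"].str.split(",").apply(
        lambda codes: "L" in [code.strip() for code in codes]
    )
    return known.groupby(has_lecithin.rename("has_lecithin"))["rating"].mean()
```

Splitting the ingredient string into codes, rather than searching for the letter "L" anywhere, keeps the check tied to the exact code list in the data dictionary.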
LOAD PACKAGES AND DATAFRAME
Let's start by loading all the necessary Python packages.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Using pandas, we load the CSV file into a DataFrame.
chocolate = pd.read_csv('data/chocolate_bars.csv')
We can check that the DataFrame was loaded correctly by inspecting the first 5 rows.
chocolate.head()
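Inspecting head() confirms that the file parsed, but a programmatic check against the data dictionary catches renamed or missing columns more reliably. This is a sketch under the assumption that the CSV header uses exactly the column names listed in the data dictionary; EXPECTED_COLUMNS and check_columns are hypothetical names.

```python
import pandas as pd

# Column names as listed in the data dictionary.
EXPECTED_COLUMNS = [
    "id", "manufacturer", "company_location", "year_reviewed", "bean_origin",
    "bar_name", "cocoa_percent", "num_ingredients", "ingredients", "review", "rating",
]


def check_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> dict:
    """Report expected columns absent from df and columns not in the data dictionary."""
    actual = set(df.columns)
    return {
        "missing": sorted(set(expected) - actual),
        "unexpected": sorted(actual - set(expected)),
    }
```

If the loaded file matches the dictionary, check_columns(chocolate) should return two empty lists.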
EXPLORATORY DATA ANALYSIS
Before answering the project's questions, we need to understand the structure of the data, starting with the number of observations and variables. The dataset has 2530 observations (rows) and 11 variables (columns).
chocolate.shape
Next, we check for missing values. The output of info() shows that the num_ingredients and ingredients columns contain missing values.
chocolate.info()
There are 87 missing values in each of these two columns.
chocolate.isnull().sum()
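Equal per-column counts do not by themselves prove that the same bars lack both fields. The sketch below, assuming only the two column names from the data dictionary, counts rows missing each column, rows missing all of them, and fully complete rows; missing_overlap is a hypothetical helper.

```python
import pandas as pd


def missing_overlap(df: pd.DataFrame, cols=("num_ingredients", "ingredients")) -> dict:
    """Count missing values per column, rows missing all cols, and rows missing none."""
    is_missing = df[list(cols)].isna()
    return {
        "per_column": {col: int(n) for col, n in is_missing.sum().items()},
        "missing_all": int(is_missing.all(axis=1).sum()),
        "complete_rows": int((~is_missing.any(axis=1)).sum()),
    }
```

If missing_overlap(chocolate)["missing_all"] equals the per-column count of 87, the gaps fall on the same bars, and dropping those rows (as the lecithin comparison would need to) removes 87 bars rather than up to 174.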