Competition: Finding the best cacao beans to make chocolate bars

    From which countries are the best cacao beans to make chocolate bars?

    BACKGROUND

    You work at a specialty foods import company that wants to expand into gourmet chocolate bars. Your boss needs your team to research this market to inform your initial approach to potential suppliers. After finding valuable chocolate bar ratings online, you need to explore whether the chocolate bars with the highest ratings share any characteristics that could help you narrow your search for suppliers (e.g., cacao percentage, bean country of origin, etc.).

    THE DATA

    There is only one dataset, called chocolate_bars. This file contains information about chocolate bars. The data dictionary is as follows:

    • "id" - id number of the review
    • "manufacturer" - Name of the bar manufacturer
    • "company_location" - Location of the manufacturer
    • "year_reviewed" - Year of the review (from 2006 to 2021)
    • "bean_origin" - Country of origin of the cacao beans
    • "bar_name" - Name of the chocolate bar
    • "cocoa_percent" - Cocoa content of the bar (%)
    • "num_ingredients" - Number of ingredients
    • "ingredients" - B (Beans), S (Sugar), S* (Sweetener other than sugar or beet sugar), C (Cocoa Butter), V (Vanilla), L (Lecithin), Sa (Salt)
    • "review" - Summary of most memorable characteristics of the chocolate bar
    • "rating" - 1.0-1.9 Unpleasant, 2.0-2.9 Disappointing, 3.0-3.49 Recommended, 3.5-3.9 Highly Recommended, 4.0-5.0 Outstanding

    PROBLEM STATEMENT

    Our goal for this project is to create a report that answers the following questions:

    • What is the average rating by country of origin?
    • How many bars were reviewed for each of those countries?
    • Create plots to visualize findings for questions 1 and 2.
    • Is the cacao bean's origin an indicator of quality?
    • How does cocoa content relate to rating? What is the average cocoa content for bars with higher ratings (above 3.5)?
    • Your research indicates that some consumers want to avoid bars with lecithin. Compare the average rating of bars with and without lecithin (L in the ingredients).
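
    As a rough sketch of how the questions above might be approached with pandas, the example below groups a small toy table by bean_origin and by the presence of lecithin. The rows, the comma-separated ingredients format, and the names toy, by_origin, and lecithin_avg are assumptions made for illustration; they are not taken from chocolate_bars.

```python
import pandas as pd

# Toy rows for illustration only; the values are invented, not from chocolate_bars.
toy = pd.DataFrame({
    "bean_origin": ["Peru", "Peru", "Ghana", "Ghana", "Ghana"],
    "rating": [3.5, 3.0, 2.5, 3.0, 3.5],
    "ingredients": ["B,S,C", "B,S,C,L", "B,S", "B,S,C,L", "B,S,C"],  # format assumed
})

# Average rating and number of reviewed bars per country of origin.
by_origin = (
    toy.groupby("bean_origin")["rating"]
       .agg(avg_rating="mean", n_bars="count")
       .sort_values("avg_rating", ascending=False)
)

# Flag bars whose ingredient string contains "L" (lecithin).
# na=False treats a missing ingredient list as "no lecithin", which is an assumption.
has_lecithin = toy["ingredients"].str.contains("L", na=False)
labels = has_lecithin.map({True: "with L", False: "without L"})
lecithin_avg = toy.assign(lecithin=labels).groupby("lecithin")["rating"].mean()

print(by_origin)
print(lecithin_avg)
```

    Reporting the count next to the mean matters here, because an average based on only a handful of bars is a weak signal of quality.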

    LOAD PACKAGES AND DATAFRAMES

    Let's start by loading all the necessary Python packages.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    Using pandas, we load the CSV file into a DataFrame.

    chocolate = pd.read_csv('data/chocolate_bars.csv')

    We can check that the DataFrame was loaded correctly by inspecting the first 5 rows.

    chocolate.head()

    EXPLORATORY DATA ANALYSIS

    Before answering the project questions, we need to understand the structure of the data, starting with the number of observations and variables. The dataset has 2530 observations and 11 variables.

    chocolate.shape
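
    Since shape returns a (rows, columns) tuple, it can be unpacked directly into named values. A minimal sketch on a toy DataFrame (the table and the names n_rows and n_cols are hypothetical):

```python
import pandas as pd

# Toy table for illustration only; not the real chocolate_bars data.
toy = pd.DataFrame({"id": [1, 2, 3], "rating": [3.5, 3.0, 2.75]})

# shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = toy.shape
print(f"{n_rows} observations, {n_cols} variables")
```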

    Next, we check whether there are missing values. The output shows missing values in the num_ingredients and ingredients columns.

    chocolate.info()

    Each of these two columns has 87 missing values.

    chocolate.isnull().sum()
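
    When two columns show the same number of missing values, it can be worth checking whether the gaps fall on the same rows. The sketch below shows one way to do that on a toy table; the data and the names cols_with_missing, both_missing, and same_rows are hypothetical, and the result says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Toy table for illustration only; the values are invented.
toy = pd.DataFrame({
    "num_ingredients": [3.0, np.nan, 2.0, np.nan],
    "ingredients": ["B,S,C", None, "B,S", None],
    "rating": [3.5, 3.0, 2.75, 3.25],
})

# Keep only the columns that actually contain missing values.
missing_counts = toy.isnull().sum()
cols_with_missing = missing_counts[missing_counts > 0]

# Rows where both columns are missing at the same time.
both_missing = toy["num_ingredients"].isnull() & toy["ingredients"].isnull()

# True when every missing value in these columns sits in a shared row.
same_rows = both_missing.sum() == cols_with_missing.max()

print(cols_with_missing)
print("Missing in the same rows:", same_rows)
```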