Competition - exam scores

    ℹ️ Introduction to data science notebooks

    You can skip this section if you are already familiar with data science notebooks.

    Data science notebooks

    A data science notebook is a document that contains text cells (what you're reading right now) and code cells. What makes a notebook unique is that it is interactive: you can change or add code cells, then run a cell by selecting it and clicking the Run button above (or Run All), or by pressing Shift + Enter.

    The result will be displayed directly in the notebook.

    Try running the cell below:

    # Run this cell to see the result
    101 * 1.75 * 16

    Modify any of the numbers and rerun the cell.

    Data science notebooks & data analysis

    Notebooks are great for interactive data analysis. Let's create a pandas DataFrame using the read_csv() function.

    We will load the dataset "exams.csv" containing year-end exam grades for a thousand students.

    Using the head() method, we display the first five rows of data:

    # Importing the pandas module
    import pandas as pd
    
    # Reading in the data
    df = pd.read_csv('data/exams.csv')
    
    # Take a look at the first datapoints
    df.head()

    Data analysis example:

    Find the average reading score for each race/ethnicity group.

    We can use groupby to group the information by the column "race/ethnicity". Then we select the column "reading" and use .mean() to get the average grade for each group:

    df.groupby('race/ethnicity')[['reading']].mean()
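
    The same groupby pattern extends to several score columns at once. The sketch below uses a tiny made-up DataFrame (the values are illustrative and not taken from exams.csv) so that it runs without the data file; the column names follow the ones used above.

```python
import pandas as pd

# Tiny made-up sample; values are illustrative only, not from exams.csv
sample = pd.DataFrame({
    "race/ethnicity": ["group A", "group A", "group B", "group B"],
    "reading": [70, 80, 60, 90],
    "writing": [65, 75, 70, 80],
})

# Average of each selected score column within each group
group_means = sample.groupby("race/ethnicity")[["reading", "writing"]].mean()
print(group_means)
```

    Passing a list of column names keeps the result as a DataFrame with one column per score, which is convenient for later plotting.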

    Data science notebooks & visualizations

    Visualizations are very helpful to summarize data and gain insights. A well-crafted chart often conveys information much better than a table.

    It is very straightforward to include plots in a data science notebook. For example, let's look at the average writing score by lunch group and gender.

    We are using the seaborn library for this example. We will use the catplot() function on the data we want to display.

    import seaborn as sns
    
    sns.catplot(x='lunch', y='writing', col='gender', data=df, kind='bar');

    Analyzing exam scores

    Now let's move on to the competition and challenge.

    📖 Background

    Your best friend is an administrator at a large school. The school makes every student take year-end math, reading, and writing exams.

    Since you have recently learned data manipulation and visualization, you suggest helping your friend analyze the score results. The school's principal wants to know if test preparation courses are helpful. She also wants to explore the effect of parental education level on test scores.

    💾 The data

    The file has the following fields (source):
    • "gender" - male / female
    • "race/ethnicity" - one of 5 combinations of race/ethnicity
    • "parent_education_level" - highest education level of either parent
    • "lunch" - whether the student receives free/reduced or standard lunch
    • "test_prep_course" - whether the student took the test preparation course
    • "math" - exam score in math
    • "reading" - exam score in reading
    • "writing" - exam score in writing
    df.head()
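
    Before analyzing, it can help to check which labels each categorical field contains and whether any values are missing. The following is a minimal sketch on a made-up sample: the column names come from the field list above, but the labels ("none", "completed") and all scores are assumptions for illustration and may differ from the real file.

```python
import pandas as pd

# Made-up sample; labels and scores are assumptions, not values from exams.csv
sample = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "test_prep_course": ["none", "completed", "none", "none"],
    "math": [72, 65, 88, 54],
})

# How many students fall under each label of a categorical field
prep_counts = sample["test_prep_course"].value_counts()
print(prep_counts)

# Number of missing values in each column
missing = sample.isna().sum()
print(missing)
```

    On the real data, the same two calls could be run on df for each categorical column listed above.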

    💪 Challenge

    Create a report to answer the principal's questions. Include:

    1. What are the average reading scores for students with/without the test preparation course?
    2. What are the average scores for the different parental education levels?
    3. Create plots to visualize findings for questions 1 and 2.
    4. [Optional] Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels (e.g., faceted plots).
    5. [Optional 2] The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
    6. Summarize your findings.
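
    One possible way to approach the tabular parts of questions 1, 2, 4, and 5 is sketched below. It is not the expected solution, only an outline under stated assumptions: the DataFrame is made up so the code runs on its own, the column names follow the field list above, and the labels and scores are illustrative.

```python
import pandas as pd

score_cols = ["math", "reading", "writing"]

# Made-up sample; labels and scores are illustrative, not from exams.csv
sample = pd.DataFrame({
    "test_prep_course": ["none", "completed", "none", "completed"],
    "parent_education_level": ["high school", "high school",
                               "bachelor's degree", "bachelor's degree"],
    "math": [60, 70, 80, 90],
    "reading": [62, 74, 78, 92],
    "writing": [58, 72, 80, 94],
})

# Question 1: average reading score with/without the preparation course
reading_by_prep = sample.groupby("test_prep_course")["reading"].mean()

# Question 2: average scores per parental education level
scores_by_education = sample.groupby("parent_education_level")[score_cols].mean()

# Question 4: reading averages by education level, split by course status
prep_within_education = sample.pivot_table(
    index="parent_education_level",
    columns="test_prep_course",
    values="reading",
    aggfunc="mean",
)

# Question 5: pairwise Pearson correlations between the three scores
score_correlations = sample[score_cols].corr()

print(reading_by_prep)
print(scores_by_education)
print(prep_within_education)
print(score_correlations)
```

    With the real data, replacing sample with df should give the corresponding tables, which can then be visualized, for example with the catplot() call shown earlier.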

    💡 Learn more

    The following DataCamp courses can help review the skills needed for this challenge:

    ✅ Checklist before publishing

    • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
    • Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
    • Check that all the cells run without error.