Bashir Abdulraheem

Associate Data Analyst Case Study Project - With Python


DataCamp Associate Data Analyst Case Study Project - Food Claims Process


Table of Contents

  • Introduction
  • Importing required files and libraries
  • Data Inspection
  • Data Cleaning
  • Sanity Check After Data Cleaning
  • Data Exploration
  • Further Data Exploration
  • Conclusions

This case study is about Vivendo, a fast food chain in Brazil with over 200 outlets. As with many fast food establishments, customers make claims against the company; for example, they blame Vivendo for suspected food poisoning.

The legal team, which processes these claims, is currently split across four locations. The new head of the legal department wants to see if there are differences in the time it takes to close claims across the locations.

Customer Question: The legal team would like you to answer the following questions:

  • How does the number of claims differ across locations?
  • What is the distribution of time to close claims?
  • How does the average time to close claims differ by location?

Dataset: The dataset contains one row for each claim. The dataset can be downloaded from here.

The following are the dataset descriptions:

  • Claim ID: Character, the unique identifier of the claim.
  • Time to Close: Numeric, number of days it took for the claim to be closed.
  • Claim Amount: Numeric, initial claim value in the currency of Brazil.
  • Amount Paid: Numeric, total amount paid after the claim closed in the currency of Brazil.
  • Location: Character, location of the claim, one of “RECIFE”, “SAO LUIS”, “FORTALEZA”, or “NATAL”.
  • Individuals on Claim: Numeric, number of individuals on this claim.
  • Linked Cases: Binary, whether this claim is believed to be linked with other cases, either TRUE or FALSE.
  • Cause: Character, the cause of the food poisoning injuries, one of “vegetable”, “meat”, or “unknown”.
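
The documented categories above can be checked programmatically. The sketch below uses made-up rows that mimic the schema (all values are invented for illustration) and verifies that the categorical columns contain only the documented values:

```python
import pandas as pd

# Hypothetical rows mimicking the documented schema (values invented for illustration).
sample = pd.DataFrame({
    "Claim ID": ["CL0001", "CL0002"],
    "Time to Close": [180, 95],
    "Claim Amount": [25000.0, 14000.0],
    "Amount Paid": [20000.0, 12000.0],
    "Location": ["RECIFE", "NATAL"],
    "Individuals on Claim": [3, 1],
    "Linked Cases": [True, False],
    "Cause": ["meat", "unknown"],
})

EXPECTED_LOCATIONS = {"RECIFE", "SAO LUIS", "FORTALEZA", "NATAL"}
EXPECTED_CAUSES = {"vegetable", "meat", "unknown"}

# Any values outside the documented categories end up in these sets.
bad_locations = set(sample["Location"]) - EXPECTED_LOCATIONS
bad_causes = set(sample["Cause"].dropna()) - EXPECTED_CAUSES
print(bad_locations, bad_causes)
```

Running the same check against the real `claims` dataframe would surface any undocumented location or cause values before analysis begins.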

Importing required files and libraries

# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
# Reading the csv file
# saving it as a dataframe with the name claims

claims = pd.read_csv('claims.csv')

Data Inspection

In this section, the data will be checked for quality and tidiness issues.

# displaying the first few rows of the dataframe

claims.head()


The preceding output shows that:

  • The Claim ID column contains unwanted leading zeros. It also combines two pieces of information in a single column: the Claim ID and the Year of Claim.
  • Some unwanted characters appear before the amount in the Claim Amount column.
  • The Cause column has some empty values.
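
A cleaning sketch for the first two issues might look like the following. The raw strings here are made up to illustrate the idea (the actual separators and currency prefix in claims.csv may differ):

```python
import pandas as pd

# Made-up examples of the raw values described above (actual formats may differ).
raw = pd.DataFrame({
    "Claim ID": ["000123-2020", "000456-2021"],
    "Claim Amount": ["R$ 25000.50", "R$ 14000.00"],
})

# Split Claim ID into the ID proper and a separate Year of Claim column,
# dropping the unwanted leading zeros from the ID part.
parts = raw["Claim ID"].str.split("-", expand=True)
raw["Claim ID"] = parts[0].str.lstrip("0")
raw["Year of Claim"] = parts[1].astype(int)

# Strip everything except digits and the decimal point, then convert to float.
raw["Claim Amount"] = (
    raw["Claim Amount"].str.replace(r"[^\d.]", "", regex=True).astype(float)
)
print(raw)
```

The empty values in the Cause column would be handled separately in the Data Cleaning section, for example by replacing them with the documented “unknown” category.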
# displaying some information about the dataframe

claims.info()

The preceding output shows that:

  • The datatype for the Claim Amount column is not accurate.
  • Approximately 80% of the Cause column entries are null.
# checking the count of duplicate rows in the dataframe

claims.duplicated().sum()

The output above shows that there are no duplicate rows in the dataframe.

# displaying some descriptive statistics about the data

claims.describe()


According to the above output, the minimum Time to Close is -57 days, which is impossible: a claim cannot take a negative number of days to close.
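Rows with impossible durations like this can be isolated for follow-up. A minimal sketch on invented rows (one of which carries the negative value flagged above):

```python
import pandas as pd

# Illustrative rows; one has the kind of negative Time to Close flagged above.
claims_demo = pd.DataFrame({
    "Claim ID": ["1", "2", "3"],
    "Time to Close": [180, -57, 95],
})

# Isolate rows with impossible (negative) durations for follow-up
# before deciding whether to drop or correct them.
negative = claims_demo[claims_demo["Time to Close"] < 0]
print(len(negative))  # 1
```

Applied to the real `claims` dataframe, the same filter shows how many claims are affected and whether the issue is isolated or widespread.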
