Certification - Bckp


Data Analyst Professional Practical Exam Submission


You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

📝 Task List

Your written report should include written text summaries and graphics of the following:

Data validation:
- Describe validation and cleaning steps for every column in the data
Exploratory Analysis:
- Include two different graphics showing single variables only to demonstrate the characteristics of data
- Include at least one graphic showing two or more variables to represent the relationship between features
- Describe your findings
Definition of a metric for the business to monitor
- How should the business use the metric to monitor the business problem
- Can you estimate initial value(s) for the metric based on the current data
Final summary including recommendations that the business should undertake

Start writing report here..

import pandas as pd
import seaborn as sns
import numpy as np
from datetime import date

# seaborn layout
sns.set_style('whitegrid')
sns.set_context('notebook')
sns.set_palette('colorblind')

df = pd.read_csv('https://s3.amazonaws.com/talent-assets.datacamp.com/product_sales.csv')

Data validation

The data consists of 15000 observations (13924 after cleaning) and 9 variables:

week: Weeks since product launch, ranging from 1 to 6. No cleaning was necessary.
sales_method: 3 different sales methods. Capitalisation was inconsistent, some terms were truncated. I have unified the sales_method categories, all categories are now complete and capitalised.
customer_id: All customer ids are unique. No cleaning was necessary.
nb_sold: There are gaps in the distribution of quantities (strikingly few sales for unit numbers of 14 and 16). I have not made any changes to this variable. Are you aware of any reasons for this uneven distribution?
revenue: A relatively large number of entries were missing. Since this is the variable of interest and I found no connection between the missing entries and other variables, I deleted all rows with missing revenue values.
years_as_customer: 2 entries exceeded the number of years the company has existed. I deleted these 2 rows.
nb_site_visits: No problems, no missing values.
state: 50 different States. No problems, no missing values.
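The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used for the report: the function name `clean_sales`, the `SALES_METHOD_MAP` dictionary and the raw spellings it contains are assumptions made for the example, and the founding year 1984 is taken from the validation code in this report.

```python
import pandas as pd
from datetime import date

# Hypothetical mapping: the raw spellings on the left are illustrative
# assumptions, not a verified list of the values found in the data.
SALES_METHOD_MAP = {
    'email': 'Email',
    'call': 'Call',
    'email + call': 'Email + Call',
    'em + call': 'Email + Call',
}

def clean_sales(df, founding_year=1984, today=None):
    """Return a cleaned copy of the sales data (sketch only)."""
    today = today or date.today()
    max_years = today.year - founding_year

    out = df.copy()
    # Unify capitalisation and truncated labels; unknown labels stay unchanged.
    key = out['sales_method'].str.strip().str.lower()
    out['sales_method'] = key.map(SALES_METHOD_MAP).fillna(out['sales_method'])
    # Drop rows without a revenue value.
    out = out.dropna(subset=['revenue'])
    # Drop rows whose tenure exceeds the age of the company.
    out = out[out['years_as_customer'] <= max_years]
    return out.reset_index(drop=True)

# Small made-up frame to show the behaviour.
demo = pd.DataFrame({
    'sales_method': ['email', 'em + call', 'Call', 'Email'],
    'revenue': [50.0, 120.0, None, 90.0],
    'years_as_customer': [3, 10, 5, 63],
})
cleaned = clean_sales(demo, today=date(2023, 1, 1))
print(cleaned)
```

In this made-up example the third row is dropped for its missing revenue and the fourth for its implausible tenure, while the two remaining labels are normalised.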

df.info()

df.isna().sum()

df.describe()

df.head()

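# Flag rows where revenue is missing (True = missing)
df = df.assign(null_revenue = df['revenue'].isnull())

# Compare the number of missing and present revenue values per week
sns.countplot(x = 'week', hue = 'null_revenue', data = df)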

sns.countplot(x = 'week', hue = 'sales_method', data = df)

sns.countplot(x = 'sales_method', data = df)
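The count plots can be complemented with a numeric check of whether missing revenue values cluster in particular groups. The following is a minimal sketch under that assumption: the helper name `missing_share_by` is hypothetical, and the small demo frame is made up for illustration only.

```python
import numpy as np
import pandas as pd

def missing_share_by(df, column, target='revenue'):
    """Share of missing `target` values within each level of `column`."""
    return df[target].isna().groupby(df[column]).mean()

# Made-up data: one of two week-1 rows lacks a revenue value.
demo = pd.DataFrame({
    'week': [1, 1, 2, 2],
    'revenue': [10.0, np.nan, 5.0, 7.0],
})
shares = missing_share_by(demo, 'week')
print(shares)
```

Applied to the real data, similar shares across all weeks or all sales methods would be consistent with the missing entries being unrelated to those variables.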

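df.columns
# Share of rows with a duplicated customer_id (0.0 means every id is unique)
np.mean(df.duplicated('customer_id'))

# Longest possible customer tenure in years, counted from 1984
max_years = date.today().year - 1984
print(max_years)
# Share of rows whose years_as_customer exceeds that maximum
np.mean(df['years_as_customer'] > max_years)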

