Data Analyst Associate Example Practical Exam

This is an example of a passing solution to the Data Analyst Associate Practical Exam. You can find the project information that accompanies this example solution in the resource center, under Practical Exam Resources.

You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

For every column in the data:

• State whether the values match the description given in the table above.

• State the number of missing values in the column.

• Describe what you did to make values match the description if they did not match.
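The per-column checks listed above (unique values and missing values) can be sketched in plain Python with only the standard library. This is a minimal illustration, not the method used in the example solution: the rows are hypothetical toy data rather than the exam dataset, missing values are assumed to be stored as `None`, and the helper name `profile` is made up for this sketch.

```python
def profile(rows, column):
    """Return (number of unique non-missing values, number of missing values)."""
    # A column absent from a row is also counted as missing, since .get returns None.
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return len(set(present)), len(values) - len(present)

# Hypothetical toy rows, not the exam data; None marks a missing value.
rows = [
    {"Place type": "Coffee shop", "Rating": 4.5, "Dine in option": True},
    {"Place type": "Cafe", "Rating": None, "Dine in option": None},
    {"Place type": "Coffee shop", "Rating": 4.0, "Dine in option": True},
]

for column in ["Place type", "Rating", "Dine in option"]:
    n_unique, n_missing = profile(rows, column)
    print(f"{column}: {n_unique} unique, {n_missing} missing")
```

Comparing the unique values against the data dictionary is then a manual step, as in the written answers that follow.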

Example Solution

Region: There were 10 unique values, which match the description given. There were no missing values. No changes were made to this column.

Place name: There were 187 unique values. There were no missing values. No changes were made to this column.

Place type: There were four unique values, which match the four given in the data dictionary. There were no missing values, so no changes were made to this column.

Rating: The values of this column were between 3 and 5, which is consistent with the description given. There were two missing values. The missing values were replaced with 0 as per the data description.

Reviews: The values of this column ranged from 3 to 17937, which is consistent with the description given. Two values were missing. The missing values were replaced with the median value of the remaining data, which was 271.5.

Price: This column had three categories, which match those in the description. There were no missing values and no changes were made to this column.

Delivery option: All of the values in this column were either TRUE or FALSE. There were no missing values. No changes were made to this column.

Dine in option: The values in this column were either TRUE or missing. There were 60 missing values. All missing values were replaced with FALSE.

Takeout option: The values in this column were either TRUE or missing. There were 56 missing values. All missing values were replaced with FALSE.
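The replacement rules described for Rating, Reviews, Dine in option and Takeout option can be sketched as follows. This assumes each row is a dict keyed by the column names used in this write-up, with `None` marking a missing value; the `clean` function and the toy rows are hypothetical. The median is computed from the non-missing Reviews values before any gaps are filled, which is what "the median value of the remaining data" describes.

```python
from statistics import median

def clean(rows):
    """Fill missing values using the rules described above; returns new row dicts."""
    # Median of the Reviews values that are present, taken before any filling.
    review_median = median(r["Reviews"] for r in rows if r["Reviews"] is not None)
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left unchanged
        if row["Rating"] is None:
            row["Rating"] = 0  # missing ratings become 0, per the data description
        if row["Reviews"] is None:
            row["Reviews"] = review_median
        for column in ("Dine in option", "Takeout option"):
            if row[column] is None:
                row[column] = False  # missing option flags become FALSE
        cleaned.append(row)
    return cleaned

# Hypothetical toy rows, not the exam data; None marks a missing value.
rows = [
    {"Rating": 4.6, "Reviews": 100, "Dine in option": True, "Takeout option": None},
    {"Rating": None, "Reviews": None, "Dine in option": None, "Takeout option": True},
    {"Rating": 3.9, "Reviews": 300, "Dine in option": True, "Takeout option": True},
]

cleaned = clean(rows)
print(cleaned[1])
```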

Create a visualization that shows which is the most common type of coffee store. Use the visualization to:

• State which category of the variable place type has the most observations

• Explain whether the observations are balanced across categories

Example Solution

There are four possible types of store included in this data. The most common type listed is Coffee shop, with Cafe second, although Cafe has only half as many locations. The categories are unbalanced, with most observations being either Coffee shop or Cafe. The team should focus on distributing their new cups in coffee shops, as they are the most common type.
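The counts behind the place-type chart, and the balance question, can be checked by tabulating each category's share of observations. This sketch uses made-up place types rather than the real data, and the text bar chart is only a stand-in for the actual visualization; `category_shares` is a hypothetical helper.

```python
from collections import Counter

def category_shares(values):
    """Return each category's share of observations, most common first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.most_common()}

# Hypothetical place types for illustration only (not the exam data).
place_types = ["Coffee shop"] * 6 + ["Cafe"] * 3 + ["Espresso bar"]

shares = category_shares(place_types)
most_common = next(iter(shares))
print("Most common place type:", most_common)

# Crude text bar chart standing in for the real visualization.
for category, share in shares.items():
    print(f"{category:<13}{'#' * round(share * 20)} {share:.0%}")

# Under perfect balance every category would hold 1/k of the observations.
even_share = 1 / len(shares)
print(f"Even share would be {even_share:.0%} per category")
```

Shares far from the even share, as here, are what "unbalanced" means in practice.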

Describe the distribution of the number of reviews across all stores. Your answer must include a visualization that shows the distribution.

Example Solution

As the marketing team thinks that the number of reviews a place gets will be important, we should look at how the number of reviews is distributed.

Looking at all reviews, we can see from the graphic below that most places have had fewer than 1000 reviews. The distribution of the number of reviews is right-skewed. There are some outliers with more than 2000 reviews, but these are very uncommon.

When looking for places with many reviews, the team should aim for locations with over 1000 reviews, but be aware that they may need to work with locations that have 500 reviews or more.
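A quick numeric check of the shape described above: for a right-skewed variable the mean is typically pulled above the median, and a histogram shows most values in the lowest bins. The review counts below are invented for illustration, and the 1000-review threshold and 500-review bin width are assumptions chosen to echo the figures in the text.

```python
from collections import Counter
from statistics import mean, median

def describe_reviews(reviews, threshold=1000):
    """Summarise review counts: mean, median and share below threshold."""
    return {
        "mean": mean(reviews),
        "median": median(reviews),
        "share_below": sum(r < threshold for r in reviews) / len(reviews),
    }

def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge."""
    counts = Counter(v // bin_width * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical review counts for illustration only.
reviews = [40, 120, 150, 300, 450, 800, 950, 1500, 2600]

summary = describe_reviews(reviews)
print(summary)
# A mean well above the median is a common sign of right skew.
print("mean > median:", summary["mean"] > summary["median"])
print(histogram(reviews, bin_width=500))
```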

Describe the relationship between type of store and number of reviews. Your answer must include a visualization to demonstrate the relationship.

Example Solution

Finally, we want to combine the two pieces of information to see how place type affects the number of reviews. So far, coffee shops with over 1000 reviews look ideal, but we need to look at the two variables together to see whether this is realistic.

When looking at the reviews alone, we excluded a single outlying value so that we could see the majority of the data. To show its impact, we can look at the range of the number of reviews by place type with this outlier included. In the graphic below, you can see that this outlier dominates the data and makes comparison difficult. To make the rest of the data easier to compare, we will remove this outlier.
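Removing a single outlying value can be sketched as dropping the one row with the largest review count. This is only one possible approach; the write-up does not say how the outlier was identified, and the rows and the `drop_largest` helper are hypothetical.

```python
def drop_largest(rows, column):
    """Return a new list without the single row holding the largest value in column."""
    if not rows:
        return []
    largest = max(range(len(rows)), key=lambda i: rows[i][column])
    return rows[:largest] + rows[largest + 1:]

# Hypothetical rows for illustration; the Cafe row plays the role of the outlier.
rows = [
    {"Place type": "Coffee shop", "Reviews": 120},
    {"Place type": "Cafe", "Reviews": 9000},
    {"Place type": "Espresso bar", "Reviews": 430},
]

trimmed = drop_largest(rows, "Reviews")
print("Range before:", min(r["Reviews"] for r in rows), "to", max(r["Reviews"] for r in rows))
print("Range after: ", min(r["Reviews"] for r in trimmed), "to", max(r["Reviews"] for r in trimmed))
```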

After removing the outlier, we can focus on the main range of the data. Although Coffee shops include the locations with the largest number of reviews, their interquartile range of reviews sits lower than those of the Cafe and Espresso bar types. This suggests that most coffee shops may have fewer reviews than the other types. However, this could also be an effect of Coffee shop having the largest number of locations: the many low-review locations bring the median down.
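The interquartile-range comparison can be reproduced numerically by computing quartiles per place type. The sketch below uses `statistics.quantiles` with `method="inclusive"`; the group values are hypothetical and were chosen only so the toy output resembles the pattern described (one type holding the largest single value while its quartile box sits lower).

```python
from statistics import quantiles

def quartile_summary(groups):
    """Compute Q1, median, Q3, IQR and max of review counts for each place type."""
    summary = {}
    for place_type, reviews in groups.items():
        # quantiles() sorts the data itself; n=4 returns the three quartile cut points.
        q1, q2, q3 = quantiles(reviews, n=4, method="inclusive")
        summary[place_type] = {
            "q1": q1, "median": q2, "q3": q3,
            "iqr": q3 - q1, "max": max(reviews),
        }
    return summary

# Hypothetical review counts per place type, outlier already removed.
groups = {
    "Coffee shop": [20, 30, 40, 60, 90, 2500],
    "Cafe": [150, 220, 300, 410, 600],
}

for place_type, stats in quartile_summary(groups).items():
    print(place_type, stats)
```

In this toy data Coffee shop has the largest single value but a lower median and quartiles than Cafe, the same kind of pattern the boxplot comparison describes.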

Based on all of the above, we recommend that the team start with coffee shops that have over 1000 reviews, but also keep an open mind about including cafes and espresso bars with many reviews. Further analysis should be done to understand whether store type really affects the number of reviews. The team should also consider placing their cups in stores with fewer reviews, so that we can further analyze whether the number of reviews has any impact on the popularity of the new cups.

✅ When you have finished...

• Publish your Workspace using the option on the left
• Check the published version of your report:
    • Can you see everything you want us to grade?
    • Are all the graphics visible?
• Review the grading rubric. Have you included everything that will be graded?