Duplicate of Sample Data Analyst Associate Solution

    Data Analyst Associate

    Example Practical Exam Solution

    You can find the project information that accompanies this example solution in the resource center, Practical Exam Resources.

    You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

    You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

    Data Validation

    Describe the validation tasks you performed and what you found. Have you made any changes to the data to enable further analysis? Remember to describe what you did for every column in the data.

    The original data has 200 rows and 9 columns. I first removed the rows that should be excluded according to the data set description, namely those where the Review value was missing. This removed 2 rows, leaving 198 rows of data. I also replaced missing values in the Dine in option and Takeaway option columns with False. No values in these columns were originally False, so this replacement correctly turned them into True/False columns. Looking at the remaining columns:

    • There were 10 unique regions, as expected
    • There were 4 place types, as expected
    • There were 185 unique place names, suggesting that some names are duplicated; this should be confirmed with the team providing the data
    • Rating values range from 3.9 to 5.0, so all are within the range expected
    • There are 3 price categories, as expected
    • There are 2 delivery options - True/False, as expected.
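The validation steps described above could be sketched as follows. This is a minimal illustration in plain Python, not the code used for the original analysis: the rows are made up, and the column names (`Reviews`, `Dine in option`, `Takeaway option`, `Place name`) are assumptions based on the description.

```python
# Made-up rows standing in for the real data set; column names are assumed.
rows = [
    {"Place name": "A", "Reviews": 120, "Dine in option": True, "Takeaway option": None},
    {"Place name": "B", "Reviews": None, "Dine in option": None, "Takeaway option": True},
    {"Place name": "A", "Reviews": 45, "Dine in option": None, "Takeaway option": True},
]

OPTION_COLUMNS = ("Dine in option", "Takeaway option")


def validate(rows):
    """Drop rows without a review count and fill missing option flags with False."""
    cleaned = []
    for row in rows:
        if row["Reviews"] is None:
            continue  # excluded, as in the data set description
        fixed = dict(row)  # copy so the original row is left untouched
        for col in OPTION_COLUMNS:
            if fixed[col] is None:
                fixed[col] = False
        cleaned.append(fixed)
    return cleaned


cleaned = validate(rows)
unique_names = {row["Place name"] for row in cleaned}
print(f"{len(cleaned)} rows kept, {len(unique_names)} unique place names")
```

Comparing the number of rows kept with the number of unique place names is one quick way to spot duplicated names like those noted in the list above.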

    Data Discovery and Visualization

    Describe what you found in the analysis and how the visualizations answer the customer questions in the project brief. In your description you should:

    • Include at least two different data visualizations to demonstrate the characteristics of variables
    • Include at least one data visualization to demonstrate the relationship between two or more variables
    • Describe how your analysis has answered the business questions in the project brief

    What is the most common place type in this local market?

    The data includes four place types. The most common is the coffee shop; cafe is second, with half as many locations. This suggests that the team should focus on distributing their new cups in coffee shops, as they are the most common.
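A count of locations per place type is enough to answer this question. The sketch below uses a made-up list of place types, so the numbers are illustrative only and do not reproduce the real counts.

```python
from collections import Counter

# Made-up place types, one entry per location (not the real data).
place_types = [
    "Coffee shop", "Cafe", "Coffee shop", "Espresso bar",
    "Coffee shop", "Cafe", "Coffee shop",
]

counts = Counter(place_types)

# List the place types from most to least common.
for place_type, n in counts.most_common():
    print(f"{place_type}: {n}")

most_common_type, most_common_count = counts.most_common(1)[0]
```

The same counts can feed a bar chart, which is a natural visualization for a single categorical variable.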

    How does the range in number of reviews differ across all shops?

    As the marketing team thinks that the number of reviews a place gets will be important, we should look at how the number of reviews is distributed. Looking at all places, we can see that most have fewer than 1000 reviews. A few outliers have more than 3000 reviews, but this is very uncommon. When looking for places with a high number of reviews, the team should aim for locations with over 1000 reviews, but be aware that they may need to work with locations that have 500 reviews or more.
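To see how many locations fall on each side of the 500 and 1000 review thresholds mentioned above, the counts can be grouped into bands. This is a rough sketch with made-up review counts; the band labels are hypothetical names chosen for this example.

```python
from collections import Counter

# Made-up review counts, one per location (not the real data).
reviews = [12, 85, 240, 510, 760, 1000, 1150, 3400]


def review_band(n):
    """Assign a review count to one of three bands."""
    if n > 1000:
        return "over 1000"
    if n >= 500:
        return "500 to 1000"
    return "under 500"


bands = Counter(review_band(n) for n in reviews)
for label in ("under 500", "500 to 1000", "over 1000"):
    print(f"{label}: {bands[label]} locations")
```

A histogram gives a finer view of the same distribution; the bands are simply a compact summary for choosing a target threshold.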

    How does the number of reviews vary across each place type?

    Finally, we want to combine the two pieces of information to see how the place type affects the number of reviews. So far, coffee shops with over 1000 reviews look ideal, but we need to look at the two variables together to see if this is realistic.

    When looking at just the reviews, we excluded a single outlying value so that we could see the majority of the data. To show its impact, we can look at the range of the number of reviews by place type with this outlier included. In the graphic below you can see that the outlier dominates the data and makes comparison difficult. To make it easier to compare the rest of the data, we will remove this outlier.
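The effect of a single extreme value on the range can be shown with a few numbers. In this sketch the review counts are made up, and dropping the single largest value is an assumption about how the outlier was identified; the report does not say which rule was used.

```python
# Made-up review counts; the last value plays the role of the outlier.
reviews = [40, 120, 310, 650, 980, 17000]


def drop_largest(values):
    """Return a sorted copy of the values without the single largest one."""
    return sorted(values)[:-1]


def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


trimmed = drop_largest(reviews)
full_range = value_range(reviews)
trimmed_range = value_range(trimmed)
print(f"range with outlier: {full_range}, without: {trimmed_range}")
```

With the outlier included, the plot axis has to stretch to cover it, which compresses everything else into a narrow band.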

    After removing the outlier, we can focus on the main range of the data. Although the coffee shop type includes the places with the largest number of reviews, its interquartile range sits lower than those of the cafe and espresso bar types. This suggests that most coffee shops have fewer reviews than places of other types. However, this could also be an effect of coffee shops having the largest number of locations: the many low-review locations bring the median down.
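The comparison that a box plot shows can also be summarized numerically by computing the quartiles, median, and maximum for each place type. The sketch below is only an illustration: the records are made up to mimic the pattern described, and the "inclusive" quartile method is an arbitrary choice for this example.

```python
from collections import defaultdict
from statistics import median, quantiles

# Made-up (place type, review count) pairs, not the real data.
records = [
    ("Coffee shop", 10), ("Coffee shop", 20), ("Coffee shop", 30),
    ("Coffee shop", 40), ("Coffee shop", 900),
    ("Cafe", 100), ("Cafe", 200), ("Cafe", 300),
    ("Cafe", 400), ("Cafe", 500),
]


def summarize(records):
    """Return quartiles, median and maximum of review counts per place type."""
    by_type = defaultdict(list)
    for place_type, n in records:
        by_type[place_type].append(n)

    summary = {}
    for place_type, values in by_type.items():
        # quantiles() with n=4 returns the three quartile cut points.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[place_type] = {
            "q1": q1,
            "median": median(values),
            "q3": q3,
            "max": max(values),
        }
    return summary


summary = summarize(records)
for place_type, stats in summary.items():
    print(place_type, stats)
```

In this toy data the coffee shops have the single largest value but a lower median and lower quartiles than the cafes, which is the same kind of pattern the paragraph above describes.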

    Based on all of the above, we recommend that the team focus on coffee shops with over 1000 reviews to start, but also keep an open mind about including cafes and espresso bars with a high number of reviews. Further analysis should be done to understand whether place type really does affect the number of reviews. The team should also consider including their cups in stores with fewer reviews so that we can further analyze whether the number of reviews has any impact on the popularity of the new cups.

    ✅ When you have finished...

    • Publish your Workspace using the option on the left
    • Check the published version of your report:
      • Can you see everything you want us to grade?
      • Are all the graphics visible?
    • Review the grading rubric. Have you included everything that will be graded?
    • Head back to the Certification Dashboard to submit your practical