
    Meeting the Data Visualization Criteria for Certification

    The most common reason for failing any of our certification practical exams (Data Analyst or Data Scientist, Associate or Professional) is not meeting the Data Visualization criteria. So let's dive in, look at what we are looking for, and make sure you don't run into the same problem.

    Let's start by reviewing the criteria. All of our certifications require that you show the same three things:

    • Has created at least two different visualizations of single variables (e.g. histogram, bar chart, single boxplot)
    • Has created at least one visualization including two or more variables (e.g. scatterplot, filled barchart, multiple boxplots)
    • Has used visualizations that support the findings being presented

    Our grading team are looking to see that you know how to create different types of visualizations and that you can pick the right data and situation for them.

    Graphics for Single Variables

    The first thing we are looking for is two graphics that each tell us about a single variable. You can pick any graphics you want: histogram, box plot, bar chart, density plot, dot plot, and so on. The only requirement is that each one includes just one variable.

    Let's take a look at some examples.

    I am going to use the Toyota used car data that is included in our Professional level sample case studies. You can see the columns that are included in the data below.

    Hidden code
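The code for this cell is not shown. As a minimal sketch of how the columns of a data frame can be listed, the example below builds a tiny stand-in called `cars_demo`. That name, its three columns (`model`, `year`, `price`, chosen because they are the variables mentioned in this post) and every value in it are made up for illustration; the real case study data has its own columns.

```r
# Hypothetical stand-in for the Toyota data; values are made up for illustration.
cars_demo <- data.frame(
  model = c("Yaris", "Yaris", "Aygo", "Corolla", "Aygo", "Yaris"),
  year  = c(2016, 2017, 2017, 2018, 2019, 2019),
  price = c(9500, 10200, 7800, 15400, 8900, 11250)
)

# List the column names, then show each column's type and first few values.
column_names <- names(cars_demo)
print(column_names)
str(cars_demo)
```

With the real data you would call `names()` or `str()` on the data frame you loaded instead of `cars_demo`.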

    Graphics that would meet the single variable criterion include a bar chart that shows counts of each category:

    Hidden code
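The code for this cell is not shown. One way such a chart of category counts might be drawn is sketched below with `geom_bar()`, which counts the rows in each category by default. The `cars_demo` data frame is a hypothetical stand-in with made-up values, not the case study data.

```r
library(ggplot2)

# Hypothetical stand-in for the Toyota data; values are made up for illustration.
cars_demo <- data.frame(
  model = c("Yaris", "Yaris", "Aygo", "Corolla", "Aygo", "Yaris"),
  year  = c(2016, 2017, 2017, 2018, 2019, 2019),
  price = c(9500, 10200, 7800, 15400, 8900, 11250)
)

# Bar chart of a single categorical variable: one bar per model,
# with bar height equal to the number of rows for that model.
bar_plot <- ggplot(cars_demo, aes(model)) +
  geom_bar() +
  labs(title = "Number of cars by model") +
  theme_classic()

print(bar_plot)
```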

    A histogram of the distribution of a numeric variable:

    Hidden code
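The code for this cell is not shown. A histogram of one numeric variable could be sketched roughly as follows. The `cars_demo` data frame and the bin width of 2000 are assumptions made for illustration only.

```r
library(ggplot2)

# Hypothetical stand-in for the Toyota data; values are made up for illustration.
cars_demo <- data.frame(
  model = c("Yaris", "Yaris", "Aygo", "Corolla", "Aygo", "Yaris"),
  year  = c(2016, 2017, 2017, 2018, 2019, 2019),
  price = c(9500, 10200, 7800, 15400, 8900, 11250)
)

# Histogram of a single numeric variable (price).
# binwidth is an illustrative choice; pick one that suits your data.
hist_plot <- ggplot(cars_demo, aes(price)) +
  geom_histogram(binwidth = 2000) +
  labs(title = "Distribution of car prices") +
  theme_classic()

print(hist_plot)
```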

    I could create a boxplot, but for just one variable:

    library(ggplot2)

    # Boxplot of a single numeric variable (price)
    ggplot(cars, aes(price)) +
      geom_boxplot() +
      labs(title = "Distribution of prices of sold cars",
           subtitle = "Showing the median and lower and upper quartiles") +
      theme_classic()

    Or an area chart, like this one, which shows the number of cars sold in each year:

    # Area chart of counts per year; binwidth = 1 gives one bin per year
    ggplot(cars, aes(year)) +
      geom_area(stat = "bin", binwidth = 1) +
      theme_classic()

    Whatever you pick, the important thing to remember is that you are using only one variable. If you add another variable to show differences between groups, that won't count.

    Remember, just one variable for this criterion. And we want to see two different types of graphics, so they can't both be bar charts.
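To make the distinction concrete, here is a small sketch. Both plots are histograms of price, but the second also maps `model` to the fill colour, which brings in a second variable, so it would no longer be a single variable graphic. The `cars_demo` data frame is a hypothetical stand-in with made-up values.

```r
library(ggplot2)

# Hypothetical stand-in for the Toyota data; values are made up for illustration.
cars_demo <- data.frame(
  model = c("Yaris", "Yaris", "Aygo", "Corolla", "Aygo", "Yaris"),
  year  = c(2016, 2017, 2017, 2018, 2019, 2019),
  price = c(9500, 10200, 7800, 15400, 8900, 11250)
)

# One variable: only price is mapped to an aesthetic.
single_var <- ggplot(cars_demo, aes(price)) +
  geom_histogram(bins = 5) +
  theme_classic()

# Two variables: mapping model to fill splits the bars by group.
two_var <- ggplot(cars_demo, aes(price, fill = model)) +
  geom_histogram(bins = 5) +
  theme_classic()

print(single_var)
print(two_var)
```

A quick check is to count the variables inside `aes()`: one for the first plot, two for the second.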

    Graphics for Multiple Variables

    The second criterion is the most fun. For this one we are looking to see that you can create graphics that include multiple variables: two, three, four, ten! Use as many as you want, but I would suggest you keep to two or three so that we can still interpret your graphic!

    This is where you can get out your filled/stacked bar charts, your scatter plots, your heatmaps, your panelled/faceted graphics. Let's take a look at a few using the Toyota data.

    Maybe I want to talk about the relationship between price and the model of car...