Model Development and the Practical Exam

If you are working towards the data scientist certifications, then in addition to the data validation and data visualization skills, you will need to demonstrate your ability to select and apply appropriate models to a given business problem.

Specifically, we are looking for:

Has correctly identified the type of problem (regression, classification or clustering).

Has selected and fitted a model for that problem to be used as a baseline.

Has selected and fitted a comparison model for the problem that they were provided.

Along with:

Has compared the performance of the two models/approaches using any method appropriate to the type of problem.

Has described what the model comparison shows about the selected approaches.

Everything here applies to both the associate and professional levels. Be aware, though, that if you are working on the professional level, there are some additional requirements related to the business application that you will also need to meet.

What type of problem is it?

The first thing we want you to demonstrate is that you can convert a business problem into an analytic one. We will give you a problem posed from the business perspective. But what type of modelling problem is it? Is it a regression problem? A classification problem? An unsupervised clustering problem?

(Hint: it will only be one of those three.)

For example, in the Associate Data Analyst sample, we give you a problem where the business wants to predict whether the number of reviews will be high or low. The outcome you are predicting is binary (high or not), so this is a binary classification problem.

And that is all we need you to tell us.

How many models?

Now that you know what type of problem you are working on, you can start fitting some models. There are two things to avoid here: fitting too many models, and not fitting enough.

As you will see from the criteria above, we are looking for you to fit two models. Any time you are developing a model, it is good practice to start with something simple to use as a baseline. You can then compare any other model you choose to fit against it.

My tip for your baseline model is to keep it really simple. Remember when you were taught about linear regression or logistic regression? Now is the time to use those methods. They are simple to fit, simple to interpret and give you a starting point to go from.

From there, you can go on and fit anything that you think is appropriate, but you only need to fit one additional model. As well as seeing people forget to fit a second model, we often see people fitting every single model they have come across before. This isn't recommended for any analysis project, and in your practical exam we would rather you focus on fitting a small number of models and demonstrating your ability to pick appropriate methods to solve the problem.
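As a rough sketch of what "a baseline plus one comparison model" can look like for a classification problem, the Python snippet below fits a logistic regression and a random forest with scikit-learn. The synthetic data from make_classification, the choice of a random forest as the second model, and all variable names are assumptions made for illustration; none of them come from the exam or the sample solutions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the exam data (assumption: a binary target).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Hold out a test set so both models can later be judged on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Baseline model: simple to fit and simple to interpret.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

# Comparison model: exactly one additional model, not a whole catalogue.
comparison = RandomForestClassifier(n_estimators=100, random_state=42)
comparison.fit(X_train, y_train)
```

For a regression problem the same pattern would apply, with a linear regression as the baseline and a single regression model of your choice as the comparison.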

Evaluating the models

Now that you have two models, you need to determine which one is doing a better job. For the evaluation criteria, we want to see that you have performed a technical evaluation of your models. Unlike in your courses, here you will have to choose for yourself which method to use to compare them.

Once you have done that, we also want to know what it tells you about the models. Which model performs better? Which would you recommend for the business problem you were given? Your answer doesn't need to be long, but we want to see that you can draw conclusions from your model development process.
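One possible way to carry out such a comparison for a classification problem is sketched below: both models are fitted on the same training rows and scored on the same held-out rows. The synthetic data, the two model choices, and the use of accuracy and F1 as metrics are all assumptions for illustration; the metric you pick should suit your own problem type (a regression problem, for instance, would call for an error measure such as RMSE instead).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), split once so that both
# models are evaluated on exactly the same held-out rows.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical pair of models: one baseline, one comparison.
models = {
    "baseline (logistic regression)": LogisticRegression(max_iter=1000),
    "comparison (random forest)": RandomForestClassifier(random_state=0),
}

# Fit each model and record the same metrics for each.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    }

# Print the metrics side by side so the comparison is easy to read.
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, f1={scores['f1']:.3f}")
```

The printed numbers are only half of the task: a sentence or two stating which model scored better and which one you would recommend covers the description part of the criteria.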

Take a look at an example...

You can find examples of what we are looking for in the modelling in our two sample solutions: