Toyota Used Car Price Prediction Project

    Toyota Case Study

    Project Brief

    You have been hired as a data scientist at Discount Motors, a used car dealership in the UK. The dealership is expanding and has hired a large number of junior salespeople. Although promising, these junior employees have difficulties pricing used cars that arrive at the dealership. Sales have declined 18% in recent months, and management would like your help designing a tool to assist these junior employees.

    To start with, they would like you to work with the Toyota specialist to test your idea(s). They have collected some data from other retailers on the price that a range of Toyota cars were listed at. It is known that cars that are more than £1500 above the estimated price will not sell. The sales team wants to know whether you can make predictions within this range.

    You will need to present your findings in two formats:

    • You must submit a written report summarising your analysis to your manager. As a data science manager, your manager has a strong technical background and wants to understand what you have done and why.

    • You will then need to share your findings with the head of sales in a 10-minute presentation. The head of sales has no data science background but is familiar with basic data-related terminology.

    The data you will use for this analysis can be accessed here: "data/toyota.csv"

    Author

    Author: Kasidis Satangmongkol (Toy)

    Date: 6 June 2022

    Business Objective

    Based on the business requirements, we will train a regression model (price prediction) to support the sales team. Our objective is to make the best possible predictions, keeping the model's absolute error within £1500.
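To make the £1500 target concrete, here is a minimal sketch of how it could be checked once a model produces predictions. The price vectors are hypothetical values invented for illustration, not figures from the Toyota dataset, and the two metrics shown (mean absolute error and the share of predictions within the threshold) are one reasonable way to read the requirement rather than the method used later in this report.

```r
# Hypothetical actual and predicted prices in GBP (illustrative values only)
actual    <- c(12000, 8500, 15250, 9990, 20500)
predicted <- c(11200, 9100, 17000, 9800, 19900)

# Absolute error of each prediction
abs_error <- abs(actual - predicted)

# Business threshold from the brief: 1500 GBP
threshold <- 1500

# Mean absolute error across all cars
mae <- mean(abs_error)

# Fraction of cars whose prediction falls within the threshold
share_within <- mean(abs_error <= threshold)

cat("MAE:", mae, "\n")
cat("Share within threshold:", share_within, "\n")
```

Reporting the share within the threshold alongside the average error is useful here because a low average can still hide individual cars that miss the £1500 range.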

    Data Science Steps

    1. Load and explore the dataset using summary statistics and visualization
    2. Data cleaning and preparation
    3. Train, test, and evaluate the model
    4. Conclusion and recommendations

    0. Setting Up

    We will install two packages for model training, caret and randomForest, and load all required libraries in the cells below.

    # install caret for model training
    install.packages("caret")
    install.packages("randomForest")
    # load library
    library(tidyverse)
    library(glue)
    library(stringr)
    library(caret)
    library(randomForest)

    1. Load and Explore Data

    Let's load the CSV file into a dataframe and explore its properties.

    # load dataset
    df <- read_csv("data/toyota.csv")
    # head data
    head(df)

    It's also good practice to review both the head and the tail of the dataframe.

    # tail data
    tail(df)
    # explore data and check missing values
    glimpse(df)
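glimpse() shows column types and the first few values, but it does not count missing values. A minimal sketch of an explicit per-column check follows. The toy_df data frame and its values are hypothetical stand-ins so the snippet runs on its own; the same colSums(is.na(...)) call could be applied to df.

```r
# Small hypothetical data frame standing in for df (not the real data)
toy_df <- data.frame(
  model        = c("Yaris", "Aygo", NA),
  transmission = c("Manual", "Automatic", "Manual"),
  price        = c(9000, NA, 7500)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy_df))
print(na_counts)
```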

    We will need to convert the character columns [model, transmission, fuelType] to factors later in this notebook.
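As a preview of that conversion, here is a minimal base R sketch. The rows in toy_df are hypothetical and only reuse the three column names listed above plus an assumed price column; the notebook may do this step differently, for example with dplyr.

```r
# Hypothetical rows using the three character columns named above
toy_df <- data.frame(
  model        = c("Yaris", "Aygo", "Yaris"),
  transmission = c("Manual", "Automatic", "Manual"),
  fuelType     = c("Petrol", "Petrol", "Hybrid"),
  price        = c(9000, 7500, 12000),
  stringsAsFactors = FALSE
)

# Columns to convert from character to factor
cat_cols <- c("model", "transmission", "fuelType")

# Apply as.factor() to each listed column and leave the others untouched
toy_df[cat_cols] <- lapply(toy_df[cat_cols], as.factor)

# Inspect the resulting column types
str(toy_df)
```

Storing these columns as factors lets modelling functions treat them as categorical predictors instead of free text.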