[R] PCA: Employee turnover (Competition)
    knitr::opts_chunk$set(echo = TRUE)
    knitr::opts_chunk$set(class.output = "code-background")
    .code-background {
      background-color: lightgreen;
      border: 3px solid brown;
      font-weight: bold;
    }
    
    
    install.packages("FactoMineR")
    install.packages("factoextra")
    
    library(tidyverse)
    library(car)
    library(scales)
    library(psych)
    library(FactoMineR)
    library(factoextra)
    
    theme_set(theme_bw())
    theme_update(plot.title = element_text(hjust = 0.5, size = 20),
                 plot.subtitle = element_text(hjust = 0.5, size = 15),
                 axis.text = element_text(size = 18),
                 axis.title = element_text(size = 18),
                 legend.position = "bottom")
    df <- readr::read_csv('./data/employee_churn_data.csv')
    
    head(df)

    1. Which department has the highest employee turnover? Which one has the lowest?

    df %>%
      mutate(left = ifelse(left == "yes", 1, 0)) %>% 
      group_by(department) %>% 
      summarize(turnover_rate = mean(left)) %>% 
      arrange(desc(turnover_rate)) %>% 
      ggplot(aes(fct_reorder(department, turnover_rate, .desc = TRUE), turnover_rate)) +
      geom_segment(aes(col = department, x = fct_reorder(department, turnover_rate, .desc = TRUE), xend = fct_reorder(department, turnover_rate, .desc = TRUE), y = 0, yend = turnover_rate), show.legend = FALSE) +
      geom_point(show.legend = FALSE, pch = 21, size = 10, aes(fill = department)) +
      geom_text(col = "black", aes(x = fct_reorder(department, turnover_rate, .desc = TRUE), y = turnover_rate, label = 100 * round(turnover_rate, 3))) +
      scale_color_brewer(palette = "Set3") +
      scale_fill_brewer(palette = "Set3") +
      scale_y_continuous(labels = label_percent()) +
      labs(x = "Department", y = "Turnover",
           title = "Turnover percentage per department",
           subtitle = "n = 9540") +
      coord_flip() +
      theme(legend.position = "none",
            plot.title = element_text(hjust = 0.5),
            plot.subtitle = element_text(hjust = 0.5)) 
    • The IT department has the highest turnover percentage (30.9 %), followed by logistics (30.8 %), retail (30.6 %) and marketing (30.3 %)

    • Support (28.8 %), engineering (28.8 %), operations (28.6 %), sales (28.5 %) and administration (28.1 %) are in the mid-range

    • The finance department has the lowest turnover percentage (26.9 %), 1.2 percentage points below administration

    • Looks like money - even if one is just working with it - does indeed buy happiness

    2. Predictor variables for employee turnover

    • We can use an ordination method such as PCA to answer this question
    • PCA is designed for numeric variables, though, and many of our variables are categorical
    • But if we dummy-code these categorical variables (and leave the resulting 0/1 columns unscaled), PCA should work fine
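To make the dummy-coding step concrete, here is a minimal sketch on made-up toy data. It uses base R's `model.matrix()` rather than the `psych::dummy.code()` call used in this notebook, so the column names it produces will differ; the `toy` data frame and its values are hypothetical and only mirror two column names from the data set.

```r
# Hypothetical toy data: column names mirror the data set, values are made up
toy <- data.frame(
  satisfaction = c(0.2, 0.5, 0.9, 0.4),
  salary = c("low", "medium", "high", "low")
)

# Standardize the numeric column (center to mean 0, scale to sd 1)
toy_num <- data.frame(satisfaction = as.numeric(scale(toy$satisfaction)))

# One 0/1 indicator column per salary level; "- 1" drops the intercept
# so that every level gets its own column
toy_dummies <- as.data.frame(model.matrix(~ salary - 1, data = toy))

# Standardized numeric column next to the unscaled indicator columns
toy_trans <- cbind(toy_num, toy_dummies)
print(toy_trans)
```

Each row has exactly one 1 among the indicator columns of a given variable, which is why those columns are left on their 0/1 scale here.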
    
    df_trans <- df %>% 
      select(review, projects, tenure, satisfaction, avg_hrs_month) %>% 
      map_df(~ .x - mean(.x)) %>%  # center the numeric variables
      map_df(~ .x / sd(.x)) %>%    # scale the numeric variables to unit variance
      cbind(
        # dummy-code the categorical variables: one 0/1 column per level
        df %>% 
          select(department, promoted, salary, bonus, left) %>% 
          map(~ psych::dummy.code(.x)) %>% 
          as.data.frame()
      )
    
    head(df_trans)
    
    df_pca <- PCA(df_trans, graph = FALSE, scale.unit = FALSE)
    # scale.unit = FALSE: the continuous variables were already centered and scaled above,
    # and the dummy columns are deliberately left unscaled
    
    
    plot.PCA(df_pca, choix = "var", alpha.var = "contrib") +
      theme_bw()
    
    • Arrows (variables) that point in the same direction are positively correlated with each other
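    • Arrows (variables) that point in opposite directions are negatively correlated with each other
    • Arrows (variables) that are at a 90° angle are not correlated
    • These readings are most reliable for long arrows, i.e. variables that are well represented on the plotted plane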
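The link between arrow angles and correlations can be checked on simulated data. The sketch below is an illustration only: it uses base R's `prcomp()` on standardized toy variables (`x1` to `x4` are made up), not the `FactoMineR::PCA()` call or the mixed scaling used in this notebook. With all components, the inner products of the arrows reproduce the correlation matrix; on the first two components, the cosine of the angle between two arrows only approximates it.

```r
set.seed(1)  # simulated toy data, not the employee data set
n <- 200
a <- rnorm(n)
X <- data.frame(
  x1 = a + rnorm(n, sd = 0.3),   # moves with a
  x2 = a + rnorm(n, sd = 0.3),   # moves with a, so positively correlated with x1
  x3 = -a + rnorm(n, sd = 0.3),  # moves against a, so negatively correlated with x1
  x4 = rnorm(n)                  # independent noise
)

pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Variable coordinates (the arrow tips): eigenvectors scaled by the
# standard deviation of each component
coords <- pca$rotation %*% diag(pca$sdev)

# Using all components, the inner products of the arrows reproduce
# the correlation matrix
full <- coords %*% t(coords)

# On the first two components only, the cosine of the angle between
# two arrows approximates their correlation
plane <- coords[, 1:2]
lens <- sqrt(rowSums(plane^2))
cosines <- (plane %*% t(plane)) / (lens %o% lens)

print(round(cor(X), 2))
print(round(cosines, 2))
```

Comparing the two printed matrices shows where the two-dimensional reading is close to the actual correlations and where it is not.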

    3. Recommendations to reduce employee turnover

    • From the PCA plot we can infer:
      • Low job satisfaction is associated with a higher chance of quitting (not very surprising)
      • High review scores are associated with a higher chance of quitting
      • The following variables do not seem to be correlated with quitting, although they played a role in constructing the PCA plane:
        • The average working hours per month
        • The person's tenure in the company
      • The other variables don't seem to have much effect on the likelihood of quitting
    • Recommendations:
      • It looks as if people who get high review scores are more likely to quit, because:
        • A) They now know their worth
        • B) It's probably easier to get a new job with a strong review score
        • Therefore, we recommend rewarding people who get good reviews with some form of compensation (e.g. a raise)
      • Make efforts to improve job satisfaction (new chairs, tables, maybe a ping-pong room, a PS5 in the break room, a movie room, ...), prioritizing the departments with the highest turnover rates (IT, logistics, retail, marketing)
        • Or even offer stock shares in the company, like the cool kids in Silicon Valley do