employee turnover - ef

    Can you help reduce employee turnover?

    📖 Background

    You work for the human capital department of a large corporation. The Board is worried about the relatively high turnover, and your team must look into ways to reduce the number of employees leaving the company.

    The team needs to better understand the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, you can present your findings along with your ideas on how to address the problem.

    💾 The data

    The department has assembled data on almost 10,000 employees. The team used information from exit interviews, performance reviews, and employee records.

    • "department" - the department the employee belongs to.
    • "promoted" - 1 if the employee was promoted in the previous 24 months, 0 otherwise.
    • "review" - the composite score the employee received in their last evaluation.
    • "projects" - how many projects the employee is involved in.
    • "salary" - for confidentiality reasons, salary comes in three tiers: low, medium, high.
    • "tenure" - how many years the employee has been at the company.
    • "satisfaction" - a measure of employee satisfaction from surveys.
    • "avg_hrs_month" - the average hours the employee worked in a month.
    • "left" - "yes" if the employee ended up leaving, "no" otherwise.
    import pandas as pd
    import seaborn as sns
    from matplotlib import pyplot as plt, cm
    import matplotlib.ticker as mtick
    from statsmodels.graphics.mosaicplot import mosaic
    
    df = pd.read_csv('./data/employee_churn_data.csv')
    df.head()

    ⌛️ Time is ticking. Good luck!

    Which department has the highest employee turnover? Which one has the lowest?

    df.info()
    print("There are no missing values in the dataset")
    df.describe()
    # encode the target as 1 (left) / 0 (stayed)
    df['left'] = (df['left'] == 'yes').astype(int)

    Calculation of turnover rate

    # turnover rate = number of employees who left / total number of employees
    tot_emp = df['left'].count()
    tot_left = df['left'].sum()
    print("Turnover rate of the company is:", "{0:.1%}".format(tot_left/tot_emp))

    # turnover rate by department (named aggregation keeps the columns flat)
    df_dep = df.groupby('department', as_index=False).agg(n_emp=('left', 'count'), n_left=('left', 'sum'))
    df_dep['t_rate'] = df_dep['n_left'] / df_dep['n_emp']
    #df_dep.set_index('department',inplace=True)

    # read the extremes from the computed rates; in this dataset they are IT (highest) and Finance (lowest)
    highest = df_dep.loc[df_dep['t_rate'].idxmax(), 'department']
    lowest = df_dep.loc[df_dep['t_rate'].idxmin(), 'department']
    print(f"The department with the highest turnover rate is {highest}\nThe department with the lowest turnover rate is {lowest}")

    
    plt.figure(figsize=(12,5))
    # make barplot and sort bars
    bp=sns.barplot(x=df_dep['department'],
                y=df_dep["t_rate"], 
                data=df_dep, 
                order=df_dep.sort_values(by='t_rate', ascending=False).department,
                palette="winter"
               )
    # set labels
    plt.xlabel("Department", size=12)
    plt.ylabel("Employee turnover percentage", size=12)
    plt.title("Turnover rate by department", size=12)
    bp.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1, decimals=None, symbol='%'))
    plt.show()
    plt.rcParams["figure.figsize"]=(15,5)
    df_1=df[df['left']==1]
    df_0=df[df['left']==0]
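    lista=df_dep['department'].unique().tolist()
    # head counts per department, computed once instead of inside the loop
    left_counts = df_1.groupby('department')['left'].count()
    stay_counts = df_0.groupby('department')['left'].count()
    props = {}
    label = {}
    for dep in lista:
        # keys are (department, left), with left written as the string '1' or '0'
        props[(dep,'1')] = {'color': 'orange'}
        props[(dep,'0')] = {'color': 'aquamarine'}
        label[(dep,'1')] = left_counts[dep]
        label[(dep,'0')] = stay_counts[dep]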
    mosaic(df,['department','left'], properties=props, labelizer=lambda k:label[k])
    plt.show()
    print("The share of employees who leave appears to vary little across departments")
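One way to put a number on how weakly department and leaving are related is Cramér's V, an effect size derived from the chi-square statistic of the department-by-left contingency table. The sketch below is an illustration, not part of the original analysis: the helper name `cramers_v` and the toy frames are assumptions, and only pandas and the standard library are used.

```python
import math
import pandas as pd

def cramers_v(frame: pd.DataFrame, row: str, col: str) -> float:
    """Cramér's V for two categorical columns (0 = no association, 1 = perfect)."""
    observed = pd.crosstab(frame[row], frame[col]).to_numpy()
    n = observed.sum()
    # expected counts under independence: row total * column total / n
    row_tot = observed.sum(axis=1)
    col_tot = observed.sum(axis=0)
    expected = row_tot[:, None] * col_tot[None, :] / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    # assumes both columns have at least two distinct values, so k >= 1
    k = min(observed.shape) - 1
    return math.sqrt(chi2 / (n * k))

# Toy data (hypothetical): both departments lose 3 of 10 employees
same_rate = pd.DataFrame({
    "department": ["IT"] * 10 + ["finance"] * 10,
    "left": [1] * 3 + [0] * 7 + [1] * 3 + [0] * 7,
})
# Toy data (hypothetical): everyone in one department leaves, nobody in the other
split = pd.DataFrame({
    "department": ["IT"] * 5 + ["finance"] * 5,
    "left": [1] * 5 + [0] * 5,
})
v_same = cramers_v(same_rate, "department", "left")
v_split = cramers_v(split, "department", "left")
```

On the notebook's data, the call would be `cramers_v(df, 'department', 'left')`. A value close to 0 would support the reading that department alone explains little of the turnover.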
    # check the class distribution of the target
    print(df['left'].value_counts(normalize=True))
    print("The dataset is quite imbalanced")
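A practical consequence of class imbalance is that a model which always predicts the most common class already reaches an accuracy equal to that class's share. The sketch below shows the calculation on hypothetical toy labels. The helper `class_balance` and the 70/30 split are assumptions for illustration, not figures from the dataset.

```python
import pandas as pd

def class_balance(labels: pd.Series) -> pd.Series:
    """Share of each class, most frequent first."""
    return labels.value_counts(normalize=True)

# Toy labels (hypothetical): 7 employees stayed, 3 left
toy_left = pd.Series([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], name="left")
shares = class_balance(toy_left)

# accuracy of always predicting the most common class
majority_baseline = shares.iloc[0]
```

Applied to the notebook's target with `class_balance(df['left'])`, the first value gives the accuracy any later model would need to beat.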
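    # one-hot encode department; drop_first=True drops one level to avoid redundant columns
    dep_d=pd.get_dummies(df['department'], prefix='Dep', drop_first=True)
    df = pd.concat([df, dep_d], axis=1)
    # encode the salary tier as an ordered integer: low=0, medium=1, high=2
    df['salary'] = df['salary'].map({'low': 0, 'medium': 1, 'high': 2})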
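To make the effect of the two encodings concrete, here is a small sketch on hypothetical toy rows (the department and salary values are assumptions chosen for illustration). It shows that `drop_first=True` drops the first category in sorted order, which then acts as the reference level, and that the salary tiers become ordered integers.

```python
import pandas as pd

# Toy rows (hypothetical, not taken from the real dataset)
toy = pd.DataFrame({
    "department": ["admin", "finance", "sales", "admin"],
    "salary": ["low", "high", "medium", "low"],
})

# one-hot encode department; 'admin' sorts first, so it becomes the dropped reference level
dep_d = pd.get_dummies(toy["department"], prefix="Dep", drop_first=True)
encoded = pd.concat([toy, dep_d], axis=1)

# ordinal encoding of the salary tier
encoded["salary"] = encoded["salary"].map({"low": 0, "medium": 1, "high": 2})
```

Rows from the reference department have zeros in every `Dep_` column, so any later model interprets the remaining dummies relative to that department.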