Daniel Wissa/

Employee Network Analysis



Executive Summary

The company's analytics department was asked to map out the company's employee network using message data. The plan was to use the network map to better understand interdepartmental dynamics and to explore how the company shares information. The ultimate goal of this project is to find ways to improve collaboration throughout the company.

While analyzing the data, a list of key questions was drawn up to better understand the data and arrive at better recommendations.
This report covers the following questions:

  1. Which departments are the most/least active?
  2. Which employee has the most connections?
  3. Which departments and employees are the most influential?
  4. In which departments should collaboration be boosted?

The analysis found that:
  1. The most active department is Sales, while the least active is Marketing.

  2. The employee with the most connections is the employee with ID 598, with a total of 81 connections.

  3. The IDs of the 5 most influential employees, in descending order, are:
      a. 598
      b. 144
      c. 128
      d. 605
      e. 586
    The most influential departments, also in descending order, are:
      a. Sales
      b. Operations
      c. Admin
      d. IT
      e. Engineering
      f. Marketing

  4. Finally, boosting collaboration is recommended in the Engineering and Marketing departments because of their low activity.


Set Up the Environment

Here we load all the modules used in the notebook. These modules are:
  - Pandas (For data manipulation and storage)
  - NumPy (For data manipulation)
  - Seaborn (For visualizations)
  - Matplotlib (For visualizations)
  - NetworkX (For modeling the network)
  - NxViz (For visualizing the network)
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
import networkx as nx

#Importing nxviz, installing it first if it is not available
try:
    import nxviz
    from nxviz import annotate
except ModuleNotFoundError:
    ! pip install nxviz
    import nxviz
    from nxviz import annotate

Loading the Data

Here we load our data from the CSV files into two dataframes.
Then we look up the sender and receiver details for each message and combine everything into a single dataframe.
#Loading the messages and employees from their csv files
messages = pd.read_csv('data/messages.csv', parse_dates= ['timestamp'])
employees = pd.read_csv('data/employees.csv')

#Creating a dataframe for the senders data
senders = messages[['sender']]
senders = senders.merge(employees, left_on='sender', right_on='id', how='left')
senders = senders.drop(columns=['sender'])
senders.columns = senders.columns.map(lambda x: 'sender_' + str(x))

#Creating a dataframe for the receivers data
receivers = messages[['receiver']]
receivers = receivers.merge(employees, left_on='receiver', right_on='id', how='left')
receivers = receivers.drop(columns=['receiver'])
receivers.columns = receivers.columns.map(lambda x: 'receiver_' + str(x))

#Merging the senders and receivers data
data = messages.drop(columns=['sender','receiver'])

Data = pd.concat([senders, receivers, data], axis=1)
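
The lookup-and-prefix step above can be sketched on a toy dataset. This is a minimal illustration, not the notebook's own code: the values are made up, the `message_length` column and the `attach_role` helper are hypothetical, and `add_prefix` is swapped in as a shorter equivalent of renaming the columns one by one.

```python
import pandas as pd

# Toy stand-ins for data/messages.csv and data/employees.csv (made-up values;
# the message_length column is a hypothetical payload column)
messages = pd.DataFrame({
    "sender": [1, 1, 2],
    "receiver": [2, 3, 3],
    "message_length": [10, 25, 7],
})
employees = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["Sales", "IT", "Admin"],
})

def attach_role(messages, employees, role):
    """Hypothetical helper: look up employee details for one side of each
    message ('sender' or 'receiver') and prefix the resulting columns."""
    side = messages[[role]].merge(employees, left_on=role, right_on="id", how="left")
    return side.drop(columns=[role]).add_prefix(role + "_")

# One row per message: sender details, receiver details, remaining message columns
toy = pd.concat(
    [
        attach_role(messages, employees, "sender"),
        attach_role(messages, employees, "receiver"),
        messages.drop(columns=["sender", "receiver"]),
    ],
    axis=1,
)
print(toy)
```

A left merge keeps one row per message in the original order, which is what lets the three pieces be concatenated side by side on their default index.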

What are the Most/Least Active Departments?

Now we start analyzing the data.
Figure 1 shows the total message count for each department, sorted in descending order of total messages.
The total is split into two sub-categories: messages sent on the left and messages received on the right.
The Sales department is clearly the most active, while the Marketing department is the least active.
#Counting the total messages sent by each department
active_senders_department = Data.groupby('sender_department').size().to_frame()
active_senders_department.index = active_senders_department.index.rename('Department')
active_senders_department.columns = ["Senders"]

#Counting the total messages received by each department
active_receivers_department = Data.groupby('receiver_department').size().to_frame()
active_receivers_department.index = active_receivers_department.index.rename('Department')
active_receivers_department.columns = ["Receivers"]

#Counting the total messages activity by merging the two dataframes and summing each row
activity = pd.concat([active_senders_department, active_receivers_department], axis=1)
activity['Total'] = activity.sum(axis=1)
activity = activity.sort_values(by='Total', ascending=False)


hfont = {'fontfamily':'cursive'}
font_color = '#525252'
color1 = '#ed6a5a' #RED
color2 = '#00CFC1' #TEAL

#Creating two side-by-side charts with no gap between them so they read as one big chart
fig, axes = plt.subplots(figsize=(20,6), ncols=2, sharey=True)
plt.subplots_adjust(wspace=0, top=0.85, bottom=0.1, left=0.18, right=0.95)

#Plotting the data on each chart
sns.barplot(data=activity, x='Senders', y=activity.index.values, ax=axes[0], color=color1).set(xlabel=None)
sns.barplot(data=activity, x='Receivers', y=activity.index.values, ax=axes[1], color=color2).set(xlabel=None)

#Inverting the left chart's x-axis so its bars grow to the left
axes[0].invert_xaxis()

#Setting each chart title
axes[0].set_title("Sent", fontsize=20, pad=15, color=color1, **hfont, weight="bold")
axes[1].set_title("Received", fontsize=20, pad=15, color=color2, **hfont, weight="bold")

#Setting the x-axis tick values
ticks = np.arange(0,1900,200)
axes[0].set_xticks(ticks)
axes[1].set_xticks(ticks)

#Setting the x and y labels colors
for label in (axes[0].get_xticklabels() + axes[0].get_yticklabels()):
    label.set(fontsize=14, color=font_color, **hfont)
for label in (axes[1].get_xticklabels() + axes[1].get_yticklabels()):
    label.set(fontsize=14, color=font_color, **hfont)

#Annotating the bar values
for p in axes[0].patches:
    axes[0].annotate(int(p.get_width()) , (p.get_width()+110, p.get_y()+0.5), fontsize=14, weight="bold")
for p in axes[1].patches:
    axes[1].annotate(int(p.get_width()) , (p.get_width()+20, p.get_y()+0.5), fontsize=14, weight="bold")

#Setting the plot title
plt.suptitle("Total Messages Activity by Departments", fontsize=30, weight="bold",x=0.56, y=1.05, **hfont)
plt.figtext(0.568, 0, "Figure 1", ha="center", fontsize=10)
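
The counting behind Figure 1 can be checked on a small example. The sketch below uses made-up department values and `value_counts` in place of `groupby(...).size()`; it also fills missing counts with zero, which is an added assumption for departments that only send or only receive.

```python
import pandas as pd

# Toy message log with one department per side of each message (made-up values)
toy = pd.DataFrame({
    "sender_department": ["Sales", "Sales", "IT", "Admin"],
    "receiver_department": ["IT", "Admin", "Sales", "Marketing"],
})

# Count messages sent and received by each department
sent = toy["sender_department"].value_counts().rename("Senders")
received = toy["receiver_department"].value_counts().rename("Receivers")

# Align the two counts on department; a department missing from one side gets 0
activity = pd.concat([sent, received], axis=1).fillna(0).astype(int)
activity["Total"] = activity["Senders"] + activity["Receivers"]
activity = activity.sort_values("Total", ascending=False)
print(activity)
```

Here Marketing never sends a message, so its sent count is filled with zero and it lands at the bottom of the ranking.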

Which Employee Has the Most Connections?
