Engineering and IT Departments Need Attention

You work in the analytics department of a multinational company, and the head of HR wants your help mapping out the company's employee network using message data.

They plan to use the network map to understand interdepartmental dynamics better and explore how the company shares information. The ultimate goal of this project is to think of ways to improve collaboration throughout the company.

💾 The data

The company has six months of information on inter-employee communication. For privacy reasons, only sender, receiver, and message length information are available (source).

Messages has information on the sender, receiver, and time.

"sender" - represents the employee id of the employee sending the message.
"receiver" - represents the employee id of the employee receiving the message.
"timestamp" - the date of the message.
"message_length" - the length in words of the message.

Employees has information on each employee:

"id" - represents the employee id of the employee.
"department" - is the department within the company.
"location" - is the country where the employee lives.
"age" - is the age of the employee.

Acknowledgments: Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. "Patterns and dynamics of users' behavior and interaction: Network analysis of an online community." Journal of the American Society for Information Science and Technology 60.5 (2009): 911-932.

Executive Summary

Although the company has people in the Engineering and IT departments, they do not seem to use the platform at all to communicate with other members of staff. There could be various reasons for this, but we are confident that a lack of familiarity with the platform is not one of them. Let us take a look at how their communication with other departments compares. Some countries also need closer attention.

We might want to take a look at our performance in countries like Brazil, Germany, and the UK.

import pandas as pd

messages = pd.read_csv('data/messages.csv', parse_dates= ['timestamp'])

messages

employees = pd.read_csv('data/employees.csv')
employees

Exploratory Data Analysis

A quick look at what the data contains is a useful starting point.

print(employees.columns)

print(messages.columns)
print(messages.message_length.describe())
print(messages.sender.describe())

print(messages.info())
print(employees.info())
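
Since `info()` only reports non-null counts, it can help to make the data-quality checks explicit. The following is a minimal sketch, not the check actually run on the company data: it counts missing values per column and looks for sender ids that have no matching employee record. The frames `toy_employees` and `toy_messages` and all of their values are made up for illustration; only the column names come from the data description above.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real tables.
toy_employees = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["Sales", "IT", None],  # one missing department on purpose
})
toy_messages = pd.DataFrame({
    "sender": [1, 2, 9],                  # id 9 has no employee record
    "receiver": [2, 1, 3],
    "message_length": [12, 8, 30],
})

# Number of missing values in each column of the employees table.
missing_per_column = toy_employees.isna().sum()
print(missing_per_column)

# Sender ids that do not appear in the employees table.
known_ids = set(toy_employees["id"])
unknown_senders = toy_messages.loc[
    ~toy_messages["sender"].isin(known_ids), "sender"
].tolist()
print(unknown_senders)
```

The same two checks could be run against `employees` and `messages`, and repeated for the `receiver` column.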

The id column is stored as an integer. We do not want to carry out numerical or statistical calculations on it; we want to use it as a label for the person who holds that id. The same applies to the sender and receiver columns, so we convert all three to the category data type.

# Convert the id columns to the category data type
employees['id'] = employees['id'].astype('category')
messages['sender'] = messages['sender'].astype('category')
messages['receiver'] = messages['receiver'].astype('category')
print(employees.dtypes)

print(messages.dtypes)

Who Is Talking to Whom

After a quick look at the structure of the data, and noticing that there are no missing values, it seems like everyone is interested in communicating. The question is: who are they talking to? Does the number of messages depend on the department, do people tend to send messages of a particular length, and how are the senders related to the receivers?
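
One possible way to approach the question of how senders relate to receivers is to label both ends of every message with a department and cross-tabulate the result. The sketch below uses a small made-up sample rather than the real files; `toy_employees`, `toy_messages`, and their values are hypothetical, and only the column names are taken from the data description.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real tables.
toy_employees = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "department": ["Sales", "Sales", "IT", "Admin"],
})
toy_messages = pd.DataFrame({
    "sender": [1, 1, 2, 4, 4],
    "receiver": [2, 3, 4, 1, 3],
    "message_length": [10, 25, 7, 40, 12],
})

# Lookup from employee id to department.
dept_by_id = toy_employees.set_index("id")["department"]

# Attach the department of the sender and of the receiver to each message.
labelled = toy_messages.assign(
    sender_dept=toy_messages["sender"].map(dept_by_id),
    receiver_dept=toy_messages["receiver"].map(dept_by_id),
)

# Rows are sending departments, columns are receiving departments,
# and each cell counts the messages sent along that route.
dept_flow = pd.crosstab(labelled["sender_dept"], labelled["receiver_dept"])
print(dept_flow)
```

A department that never sends a message has no row in this table, which is one way a silent department would show up.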


# Distribution of message lengths (in words)
import matplotlib.pyplot as plt
import seaborn as sns
plt.hist(messages['message_length'], bins = 40)
plt.xlabel('Length of Messages (words)')
plt.ylabel('Number of Messages')
plt.title('Distribution of Length of Messages')
plt.show()

We can all use the same number of words

With the histogram of message length roughly evenly distributed, it seems that people in the company do not have a specific message length they aim for. Rather, a message of almost any length is about equally likely. Even so, we might want to find out which category of people communicates the most.
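
To find out which category of people communicates the most, one option is to join each message to its sender's department and summarise per department. This is a minimal sketch under stated assumptions: `toy_employees` and `toy_messages` are hypothetical stand-ins for the real tables, with invented values, and the summary column names (`messages_sent`, `median_length`, `total_words`) are chosen here for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real tables.
toy_employees = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["Sales", "Operations", "IT"],
})
toy_messages = pd.DataFrame({
    "sender": [1, 1, 2, 2, 2],
    "receiver": [2, 3, 1, 3, 1],
    "message_length": [10, 30, 5, 15, 40],
})

# Attach the sender's department to every message.
sent = toy_messages.merge(
    toy_employees, left_on="sender", right_on="id", how="left"
)

# Per sending department: message count, median length, total words.
dept_activity = (
    sent.groupby("department")["message_length"]
    .agg(messages_sent="count", median_length="median", total_words="sum")
    .sort_values("messages_sent", ascending=False)
)
print(dept_activity)

# Departments that have employees but never send a message.
silent = set(toy_employees["department"]) - set(dept_activity.index)
print(silent)
```

In this toy sample the IT department has an employee but no outgoing messages, so it ends up in `silent`; the same comparison on the real tables would show whether that pattern holds for Engineering and IT.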
