An Introduction to SHAP Values and Machine Learning Interpretability


Telecom Customer Churn Dataset

This dataset comes from an Iranian telecom company; each row represents one customer over a one-year period. Along with a churn label, it records each customer's activity, such as call failures and subscription length.

import shap
import pandas as pd
import numpy as np
shap.initjs()

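customer = pd.read_csv("data/customer_churn.csv")  # load the dataset
customer.head()  # preview the first five rows

customer.Churn.value_counts()  # number of customers in each class (churned vs. not churned)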

X = customer.drop("Churn", axis=1) # Independent variables
y = customer.Churn # Dependent variable

Training a machine learning model

# Split into train and test 
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
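Churn labels are often imbalanced (check the `value_counts()` output above for this dataset). The split above is not stratified, so the churn rate in the test set can drift from the overall rate. A minimal sketch, on synthetic labels with an assumed 15% churn rate rather than the real file, of how passing `stratify=` to `train_test_split` keeps the class proportions the same in both splits. Whether to stratify the real split is a judgment call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data: 1,000 customers, 15% churners (assumed rate)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 150 + [0] * 850)

# stratify=y_demo keeps the class proportions equal (up to rounding) in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

print(f"train churn rate: {y_tr.mean():.3f}")
print(f"test churn rate:  {y_te.mean():.3f}")
```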

# Train a machine learning model (example: Random Forest)
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=1)  # fixed seed so results and SHAP plots are reproducible
clf.fit(X_train, y_train)

Model Evaluation

from sklearn.metrics import classification_report

# Make prediction on the testing data
y_pred = clf.predict(X_test)

# Classification report: true labels first, predictions second
print(classification_report(y_test, y_pred))
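`classification_report`, like most scikit-learn metrics, takes the true labels first and the predictions second. Swapping them does not raise an error; it silently exchanges precision and recall. A small sketch with made-up labels:

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels: 1 = churn, 0 = stay
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Correct order: (y_true, y_pred)
p = precision_score(y_true, y_hat)  # 2 true positives / 3 predicted churners
r = recall_score(y_true, y_hat)     # 2 true positives / 4 actual churners

# Swapped order: the two metrics trade places
p_swapped = precision_score(y_hat, y_true)
r_swapped = recall_score(y_hat, y_true)

print(f"precision={p:.3f}, recall={r:.3f}")
print(f"swapped:  precision={p_swapped:.3f}, recall={r_swapped:.3f}")
```

On an imbalanced churn dataset this matters: precision and recall for the churn class can differ a lot, so the swapped report can look misleadingly good or bad.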

Explain the model's predictions using SHAP

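explainer = shap.TreeExplainer(clf)  # tree-specific explainer for the random forest
shap_values = explainer.shap_values(X_test)
if not isinstance(shap_values, list):  # newer SHAP returns one (n_samples, n_features, n_classes) array
    shap_values = [shap_values[..., k] for k in range(shap_values.shape[-1])]  # split into a per-class list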
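Each SHAP value is one feature's contribution to moving one customer's prediction away from a baseline. A minimal from-scratch sketch of the underlying Shapley formula, computing exact values for a hypothetical three-feature toy function by enumerating every feature subset and weighting each marginal gain by |S|!(n-|S|-1)!/n!. Absent features are filled in from a single baseline point, which is a simplifying assumption; SHAP's tree explainer works from the trees' structure instead and is far faster. The final line checks the additivity property: the values sum to f(x) minus f(baseline).

```python
from itertools import combinations
from math import factorial

import numpy as np


def model(z):
    """Hypothetical toy model standing in for the classifier (one interaction term)."""
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[2]


def value(subset, x, baseline):
    """Model output when only the features in `subset` take their values from x."""
    z = baseline.copy()
    for j in subset:
        z[j] = x[j]
    return model(z)


def shapley_values(x, baseline):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(subset + (i,), x, baseline) - value(subset, x, baseline)
                phi[i] += weight * gain
    return phi


x = np.array([1.0, 2.0, 1.0])         # the "customer" being explained
baseline = np.array([0.0, 0.0, 0.0])  # reference point (assumed for illustration)

phi = shapley_values(x, baseline)
print("Shapley values:", phi)
print("sum:", phi.sum(), " f(x) - f(baseline):", model(x) - model(baseline))
```

The interaction term 3·z0·z2 is split evenly between features 0 and 2, while feature 1, which acts alone, receives exactly its own term. Enumerating subsets costs 2^n model calls per feature, which is why SHAP relies on model-specific algorithms for real models.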

Summarize feature importances

# Summarize feature importances 
shap.summary_plot(shap_values, X_test)
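When `shap_values` is a per-class list, `summary_plot` draws a bar chart in which each bar's length is a feature's mean absolute SHAP value, stacked by class. A sketch of the same ranking computed as a table for one class, using a made-up SHAP matrix and hypothetical column names rather than the real output. On the real data, assuming `shap_values` is the per-class list used in these plots, you would pass `shap_values[1]` and `X_test.columns` instead.

```python
import numpy as np
import pandas as pd

# Made-up SHAP values for 4 customers x 3 features (one class); hypothetical column names
feature_names = ["call_failures", "subscription_length", "charge"]
class_shap = np.array([
    [0.10, -0.30, 0.02],
    [-0.20, 0.10, -0.01],
    [0.05, -0.25, 0.03],
    [-0.15, 0.35, 0.00],
])

# Global importance = average magnitude of each feature's contribution across customers
importance = (
    pd.Series(np.abs(class_shap).mean(axis=0), index=feature_names)
    .sort_values(ascending=False)
)
print(importance)
```

Taking absolute values discards direction: `subscription_length` ranks first here even though its contributions point both ways. The per-class beeswarm plots that follow restore that information by coloring each point with the feature's value.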

Feature importances of Label "0"

shap.summary_plot(shap_values[0], X_test)

Feature importances of Label "1"

shap.summary_plot(shap_values[1], X_test)
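For a binary classifier explained on predicted probabilities, the two plots should be mirror images. P(class 0) = 1 − P(class 1), Shapley values are linear in the model output, and a constant receives zero attribution, so each class-0 SHAP value should equal the negative of the matching class-1 value. A self-contained check of that identity on a hypothetical two-feature logistic toy model, with exact Shapley values computed by averaging each feature's marginal gain over both orderings (the model and baseline are assumptions for illustration). On the real output, `np.allclose(shap_values[0], -shap_values[1])` should likewise hold up to floating-point error.

```python
from math import exp

import numpy as np


def p_churn(z1, z2):
    """Hypothetical probability of class 1 (logistic model) standing in for the forest."""
    return 1.0 / (1.0 + exp(-(0.8 * z1 - 1.2 * z2 + 0.5 * z1 * z2)))


def p_stay(z1, z2):
    """Probability of class 0 is the complement of class 1."""
    return 1.0 - p_churn(z1, z2)


def shapley_two_features(f, x, base):
    """Exact Shapley values for a two-feature function: average each feature's
    marginal gain over the two possible orderings."""
    (x1, x2), (b1, b2) = x, base
    phi1 = 0.5 * ((f(x1, b2) - f(b1, b2)) + (f(x1, x2) - f(b1, x2)))
    phi2 = 0.5 * ((f(b1, x2) - f(b1, b2)) + (f(x1, x2) - f(x1, b2)))
    return np.array([phi1, phi2])


x, base = (2.0, 1.0), (0.0, 0.0)
phi_churn = shapley_two_features(p_churn, x, base)
phi_stay = shapley_two_features(p_stay, x, base)

print("class 1 (churn):", phi_churn)
print("class 0 (stay): ", phi_stay)
print("mirror images:", np.allclose(phi_stay, -phi_churn))
```

In practice this means that for binary churn, the label "1" plot alone carries all the information; the label "0" plot is the same picture with every sign flipped.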
