An Introduction to SHAP Values and Machine Learning Interpretability

    Telecom Customer Churn Dataset

    This dataset comes from an Iranian telecom company. Each row represents one customer over a one-year period. Along with a churn label, it records customer activity such as call failures and subscription length.

    import shap
    import pandas as pd
    import numpy as np
    shap.initjs()
    
    customer = pd.read_csv("data/customer_churn.csv")
    customer.head()
    customer.Churn.value_counts()
    X = customer.drop("Churn", axis=1) # Independent variables
    y = customer.Churn # Dependent variable

    Training a machine learning model

    # Split into train and test 
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
    
    # Train a machine learning model (example: Random Forest)
    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)

    Model Evaluation

    from sklearn.metrics import classification_report
    
    # Make prediction on the testing data
    y_pred = clf.predict(X_test)
    
    # Classification report (true labels first, then predictions)
    print(classification_report(y_test, y_pred))
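
The order of the arguments to classification_report matters: scikit-learn expects the true labels first and the predictions second. A minimal pure-Python sketch shows that swapping the two exchanges precision and recall. The labels are made up (not taken from the churn data) and precision_recall is a hypothetical helper written only for this illustration:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, computed from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels, for illustration only
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

correct_order = precision_recall(y_true_demo, y_pred_demo)
swapped_order = precision_recall(y_pred_demo, y_true_demo)

print("true labels first:", correct_order)
print("predictions first:", swapped_order)
```

Accuracy is unaffected by the swap, which is why the mistake is easy to miss; the per-class precision and recall columns are the ones that change places.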

    Explain the model's predictions using SHAP

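    # shap.Explainer picks a tree-based explainer for tree ensembles such as a random forest
    explainer = shap.Explainer(clf)
    # One SHAP value per sample, feature and class
    shap_values = explainer.shap_values(X_test)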
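
SHAP values build on Shapley values from cooperative game theory: each feature is credited with its average marginal contribution over all subsets of the other features. The sketch below is not how shap's tree explainer computes them (it uses a faster, tree-specific algorithm). It is a brute-force enumeration on a made-up three-feature function, toy_model, and it assumes that an "absent" feature is simply replaced by a baseline value:

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    """Hypothetical scoring function with an interaction between x[0] and x[2]."""
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[2]

def value(model, x, baseline, subset):
    """Model output when only the features in `subset` take their values from x."""
    mixed = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(mixed)

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight for a coalition of this size
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = value(model, x, baseline, set(subset) | {i})
                without_i = value(model, x, baseline, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

print("attributions:", phi)
print("sum of attributions:", sum(phi))
print("f(x) - f(baseline):", toy_model(x) - toy_model(baseline))
```

The attributions add up to the difference between the prediction and the baseline output, and the interaction term is split evenly between the two features involved. The enumeration grows exponentially with the number of features, which is why practical explainers rely on model-specific shortcuts or sampling.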

    Summarize feature importances

    # Summarize feature importances 
    shap.summary_plot(shap_values, X_test)
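
A common way to turn per-sample SHAP values into a global ranking is the mean absolute SHAP value per feature, which is what the bar-style summary plot displays. A small sketch of that calculation follows; the matrix and the feature names are hypothetical stand-ins, not output from the churn model:

```python
import numpy as np

# Hypothetical SHAP matrix: 4 samples (rows) x 3 features (columns)
feature_names = ["feature_a", "feature_b", "feature_c"]
shap_matrix = np.array([
    [0.10, -0.40, 0.02],
    [-0.20, 0.30, 0.01],
    [0.15, -0.50, -0.03],
    [-0.05, 0.20, 0.02],
])

# Mean absolute SHAP value per feature: the sign is dropped, the magnitude kept
mean_abs = np.abs(shap_matrix).mean(axis=0)

# Rank features from most to least influential
order = np.argsort(mean_abs)[::-1]
ranking = [feature_names[i] for i in order]

for i in order:
    print(f"{feature_names[i]}: {mean_abs[i]:.3f}")
```

Taking the absolute value first matters: a feature that pushes some predictions up and others down would otherwise average out to near zero and look unimportant.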

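    Feature importances of Labels "0" and "1"

    # The indexing below assumes shap_values is a list with one array per class.
    # Newer shap releases return a single (samples, features, classes) array
    # instead; in that case use shap_values[:, :, 0] and shap_values[:, :, 1].
    shap.summary_plot(shap_values[0], X_test)  # Label "0"
    shap.summary_plot(shap_values[1], X_test)  # Label "1"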
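
Depending on the installed shap release, the per-class SHAP values for a classifier may come back either as a list with one array per class or as a single three-dimensional array. The helper below, select_class, is a hypothetical convenience function (not part of shap) that returns the per-class matrix in either case. It is exercised here on random stand-in arrays rather than real SHAP output:

```python
import numpy as np

def select_class(shap_values, class_index):
    """Return the (n_samples, n_features) SHAP matrix for one class.

    Handles two layouts: a list with one array per class, or a single
    array of shape (n_samples, n_features, n_classes).
    """
    if isinstance(shap_values, list):
        return np.asarray(shap_values[class_index])
    values = np.asarray(shap_values)
    if values.ndim == 3:
        return values[:, :, class_index]
    if values.ndim == 2:
        return values  # single-output layout: nothing to select
    raise ValueError(f"unexpected SHAP array shape: {values.shape}")

# Stand-in data (random numbers): 5 samples, 3 features, 2 classes
rng = np.random.default_rng(0)
as_array = rng.normal(size=(5, 3, 2))
as_list = [as_array[:, :, 0], as_array[:, :, 1]]

print(select_class(as_list, 1).shape)
print(select_class(as_array, 1).shape)
```

With such a helper, a call like shap.summary_plot(select_class(shap_values, 1), X_test) would plot Label "1" under either layout.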