Competition - Improving Customer Segmentation

Can you find a better way to segment your customers?

📖 Background

You work for a medical device manufacturer in Switzerland. Your company manufactures orthopedic devices and sells them worldwide. The company sells directly to individual doctors who use them on rehabilitation and physical therapy patients.

Historically, the sales and customer support departments have grouped doctors by geography. However, region is not a good predictor of how many purchases a doctor will make or of their support needs.

Your team wants to use a data-centric approach to segmenting doctors to improve marketing, customer service, and product planning.

💾 The data

The company stores the information you need in the following four tables. Some of the fields are anonymized to comply with privacy regulations.

Doctors contains information on doctors. Each row represents one doctor.

"DoctorID" - a unique identifier for each doctor.
"Region" - the current geographical region of the doctor.
"Category" - the type of doctor, either 'Specialist' or 'General Practitioner'.
"Rank" - an internal ranking system. It is an ordered variable: the highest level is Ambassadors, followed by Titanium Plus, Titanium, Platinum Plus, Platinum, Gold Plus, Gold, Silver Plus, and Silver, the lowest level.
"Incidence rate" and "R rate" - relate to the amount of re-work each doctor generates.
"Satisfaction" - measures doctors' satisfaction with the company.
"Experience" - relates to the doctor's experience with the company.
"Purchases" - purchases over the last year.
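
Because "Rank" is described as an ordered variable, it may help to encode it with that explicit order rather than alphabetically. The following is a minimal sketch under stated assumptions: the toy rows and the `RankCode` column name are hypothetical, and only the level names come from the description above.

```python
import pandas as pd

# Rank levels from lowest to highest, as listed in the data description.
RANK_ORDER = [
    "Silver", "Silver Plus", "Gold", "Gold Plus", "Platinum",
    "Platinum Plus", "Titanium", "Titanium Plus", "Ambassadors",
]

# Hypothetical toy rows standing in for data/doctors.csv.
toy_doctors = pd.DataFrame({
    "DoctorID": ["D1", "D2", "D3"],
    "Rank": ["Gold Plus", "Ambassadors", "Silver"],
})

# An ordered categorical keeps the business ordering instead of alphabetical order.
rank_cat = pd.Categorical(toy_doctors["Rank"], categories=RANK_ORDER, ordered=True)

# Integer codes: 0 = Silver ... 8 = Ambassadors; -1 marks missing or unknown levels.
toy_doctors["RankCode"] = rank_cat.codes

print(toy_doctors)
```

Any value that is not in `RANK_ORDER` receives the code -1, so it is worth checking for that value before clustering.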

Orders contains details on orders. Each row represents one order; a doctor can place multiple orders.

"DoctorID" - doctor id (matches the other tables).
"OrderID" - order identifier.
"OrderNum" - order number.
"Conditions A through J" - map the different settings of the devices in each order. Each order goes to an individual patient.

Complaints collects information on doctor complaints.

"DoctorID" - doctor id (matches the other tables).
"Complaint Type" - the company's classification of the complaints.
"Qty" - number of complaints per complaint type per doctor.

Instructions has information on whether the doctor includes special instructions on their orders.

"DoctorID" - doctor id (matches the other tables).
"Instructions" - 'Yes' when the doctor includes special instructions, 'No' when they do not.

import pandas as pd
doctors = pd.read_csv('data/doctors.csv')
doctors

orders = pd.read_csv('data/orders.csv')
orders

complaints = pd.read_csv('data/complaints.csv')
complaints

instructions = pd.read_csv('data/instructions.csv')
instructions
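
Since Orders and Complaints can hold several rows per doctor, one possible next step is to collapse them to one row per doctor before segmenting. This is only a sketch on hypothetical toy rows: the `*_toy` frames, the complaint type labels, and the `n_orders` and `n_complaints` names are assumptions, as is treating a doctor absent from a table as having zero orders or complaints.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the data description.
doctors_toy = pd.DataFrame({"DoctorID": ["D1", "D2", "D3"], "Purchases": [10, 4, 7]})
orders_toy = pd.DataFrame({"DoctorID": ["D1", "D1", "D2"], "OrderID": ["O1", "O2", "O3"]})
complaints_toy = pd.DataFrame({
    "DoctorID": ["D1", "D1", "D3"],
    "Complaint Type": ["Type A", "Type B", "Type A"],  # made-up labels
    "Qty": [2, 1, 5],
})
instructions_toy = pd.DataFrame({"DoctorID": ["D1", "D2"], "Instructions": ["Yes", "No"]})

# Collapse the one-to-many tables to one row per doctor.
n_orders = orders_toy.groupby("DoctorID").size().reset_index(name="n_orders")
n_complaints = (
    complaints_toy.groupby("DoctorID", as_index=False)["Qty"].sum()
    .rename(columns={"Qty": "n_complaints"})
)

# Left joins keep every doctor, including those with no orders or complaints.
features = (
    doctors_toy
    .merge(n_orders, on="DoctorID", how="left")
    .merge(n_complaints, on="DoctorID", how="left")
    .merge(instructions_toy, on="DoctorID", how="left")
)

# Assumption: a doctor missing from a table has zero orders or complaints.
count_cols = ["n_orders", "n_complaints"]
features[count_cols] = features[count_cols].fillna(0).astype(int)

print(features)
```

Doctors without a row in Instructions keep a missing value in that column here; whether to treat that as 'No' is a separate modeling decision.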

💪 Competition challenge

Create a report that covers the following:

How many doctors are there in each region? What is the average number of purchases per region?
Can you find a relationship between purchases and complaints?
Define new doctor segments that help the company improve marketing efforts and customer service.
Identify which features impact the new segmentation strategy the most.
Your team will need to explain the new segments to the rest of the company. Describe which characteristics distinguish the newly defined segments.
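
The first two questions could be approached along these lines. This is a sketch on hypothetical toy rows, not a result from the competition data: the region labels, the numbers, and the `Complaints` column name are made up, and a rank-based (Spearman) correlation is just one reasonable choice for count data.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the data description.
doctors_toy = pd.DataFrame({
    "DoctorID": ["D1", "D2", "D3", "D4"],
    "Region": ["R1", "R1", "R2", "R2"],
    "Purchases": [10, 4, 7, 3],
})
complaints_toy = pd.DataFrame({"DoctorID": ["D1", "D1", "D3"], "Qty": [2, 1, 1]})

# Doctors per region and average purchases per region.
region_summary = doctors_toy.groupby("Region").agg(
    n_doctors=("DoctorID", "nunique"),
    avg_purchases=("Purchases", "mean"),
)

# Total complaints per doctor; doctors with no complaint rows are assumed to have 0.
complaints_per_doctor = complaints_toy.groupby("DoctorID")["Qty"].sum().rename("Complaints")
merged = doctors_toy.set_index("DoctorID").join(complaints_per_doctor)
merged["Complaints"] = merged["Complaints"].fillna(0)

# Spearman correlation computed as the Pearson correlation of the ranks.
spearman = merged["Purchases"].rank().corr(merged["Complaints"].rank())

print(region_summary)
print(f"Spearman correlation (toy data): {spearman:.2f}")
```

A correlation coefficient only summarizes a monotonic trend, so a scatter plot of purchases against complaints is a useful companion check.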

# StandardScaler standardizes features to zero mean and unit variance.
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
# Hierarchical clustering with SciPy: the dendrogram and linkage functions.
from scipy.cluster.hierarchy import dendrogram, linkage
# scikit-learn's KMeans and PCA classes for clustering and dimensionality reduction.
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Encode every column as ordinal integers.
# Caveat: this also encodes DoctorID and the numeric columns, and it sorts each
# column's values, so Rank ends up in alphabetical order rather than its business order.
enc_i = OrdinalEncoder()
doctors_transformed = enc_i.fit_transform(doctors)

# Standardize the encoded features before clustering.
scaler = StandardScaler()
segmentation_std = scaler.fit_transform(doctors_transformed)
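
One way the standardized features might then be used is sketched below: pick a cluster count by silhouette score, fit k-means, profile the segments, and project to two principal components for plotting. Everything here runs on synthetic data; the feature names, the candidate range of k, and the centroid-spread heuristic for feature impact are assumptions rather than the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features: two well-separated synthetic groups of 30 doctors.
rng = np.random.default_rng(0)
feature_cols = ["Purchases", "Complaints", "Experience"]
toy = pd.DataFrame(
    np.vstack([rng.normal(0, 1, size=(30, 3)), rng.normal(6, 1, size=(30, 3))]),
    columns=feature_cols,
)

X = StandardScaler().fit_transform(toy[feature_cols])

# Compare a few candidate cluster counts by silhouette score (higher is better).
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)

# Fit the final model and attach the segment label to each row.
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
toy["Segment"] = model.labels_

# Segment profile: mean of each original feature per segment.
profile = toy.groupby("Segment")[feature_cols].mean()

# Heuristic for feature impact: features whose standardized centroids
# differ most across segments contribute most to separating them.
centroid_spread = pd.Series(
    model.cluster_centers_.std(axis=0), index=feature_cols
).sort_values(ascending=False)

# Two principal components for plotting the segments in 2D.
coords = PCA(n_components=2).fit_transform(X)

print(profile)
print(centroid_spread)
```

The `profile` table is the kind of summary that can be used to describe each segment to the rest of the company, while `coords` can be passed to a scatter plot colored by `Segment`.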
