Competition - Improving Customer Segmentation

    Can you find a better way to segment your customers?

    📖 Background

    You work for a medical device manufacturer in Switzerland. Your company manufactures orthopedic devices and sells them worldwide. The company sells directly to individual doctors who use them on rehabilitation and physical therapy patients.

    Historically, the sales and customer support departments have grouped doctors by geography. However, the region is not a good predictor of the number of purchases a doctor will make or their support needs.

    Your team wants to use a data-centric approach to segmenting doctors to improve marketing, customer service, and product planning.

    💾 The data

    The company stores the information you need in the following four tables. Some of the fields are anonymized to comply with privacy regulations.

    Doctors contains information on doctors. Each row represents one doctor.
    • "DoctorID" - a unique identifier for each doctor.
    • "Region" - the current geographical region of the doctor.
    • "Category" - the type of doctor, either 'Specialist' or 'General Practitioner'.
    • "Rank" - an internal ranking system. It is an ordered variable: the highest level is Ambassadors, followed by Titanium Plus, Titanium, Platinum Plus, Platinum, Gold Plus, Gold, Silver Plus, and the lowest level is Silver.
    • "Incidence rate" and "R rate" - relate to the amount of re-work each doctor generates.
    • "Satisfaction" - measures doctors' satisfaction with the company.
    • "Experience" - relates to the doctor's experience with the company.
    • "Purchases" - purchases over the last year.
    Orders contains details on orders. Each row represents one order; a doctor can place multiple orders.
    • "DoctorID" - doctor id (matches the other tables).
    • "OrderID" - order identifier.
    • "OrderNum" - order number.
    • "Conditions A through J" - map the different settings of the devices in each order. Each order goes to an individual patient.
    Complaints collects information on doctor complaints.
    • "DoctorID" - doctor id (matches the other tables).
    • "Complaint Type" - the company's classification of the complaints.
    • "Qty" - number of complaints per complaint type per doctor.
    Instructions has information on whether the doctor includes special instructions on their orders.
    • "DoctorID" - doctor id (matches the other tables).
    • "Instructions" - 'Yes' when the doctor includes special instructions, 'No' when they do not.
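
    Because "Rank" is described as an ordered variable, it can help to make that order explicit before any modeling. The following is a minimal sketch of one way to do this with a pandas ordered categorical. The rows are made up for illustration, and the names `RANK_ORDER`, `toy_doctors`, and `RankCode` are hypothetical, not part of the dataset.

```python
import pandas as pd

# Rank levels from lowest to highest, as listed in the data description.
RANK_ORDER = [
    "Silver", "Silver Plus", "Gold", "Gold Plus", "Platinum",
    "Platinum Plus", "Titanium", "Titanium Plus", "Ambassadors",
]

# Made-up rows for illustration only; real values come from doctors.csv.
toy_doctors = pd.DataFrame({
    "DoctorID": ["D1", "D2", "D3"],
    "Rank": ["Gold", "Ambassadors", "Silver Plus"],
})

# An ordered categorical keeps the business ordering instead of alphabetical order.
rank_dtype = pd.CategoricalDtype(categories=RANK_ORDER, ordered=True)
toy_doctors["Rank"] = toy_doctors["Rank"].astype(rank_dtype)

# Integer codes run from 0 (Silver) to 8 (Ambassadors); -1 marks labels not in RANK_ORDER.
toy_doctors["RankCode"] = toy_doctors["Rank"].cat.codes
```

    With the order in place, comparisons such as `toy_doctors["Rank"] >= "Gold"` follow the ranking rather than the spelling of the labels.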
    import pandas as pd
    # Load the four tables. A bare variable name displays the table in the notebook.
    doctors = pd.read_csv('data/doctors.csv')
    doctors
    orders = pd.read_csv('data/orders.csv')
    orders
    complaints = pd.read_csv('data/complaints.csv')
    complaints
    instructions = pd.read_csv('data/instructions.csv')
    instructions
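
    Since the four tables share "DoctorID", one possible next step is to collapse them into a single table with one row per doctor. The sketch below uses small made-up frames in place of the loaded CSV files so that it runs on its own. The complaint labels, the values, and the names `complaints_per_doctor`, `TotalComplaints`, and `doctor_features` are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-ins for the loaded tables (hypothetical values).
doctors = pd.DataFrame({
    "DoctorID": ["D1", "D2", "D3"],
    "Region": ["R1", "R1", "R2"],
    "Purchases": [10.0, 4.0, 25.0],
})
complaints = pd.DataFrame({
    "DoctorID": ["D1", "D1", "D3"],
    "Complaint Type": ["Type A", "Type B", "Type A"],
    "Qty": [2, 1, 5],
})
instructions = pd.DataFrame({
    "DoctorID": ["D1", "D2"],
    "Instructions": ["Yes", "No"],
})

# Collapse complaints to one row per doctor by summing Qty over complaint types.
complaints_per_doctor = (
    complaints.groupby("DoctorID", as_index=False)["Qty"].sum()
    .rename(columns={"Qty": "TotalComplaints"})
)

# Left joins keep every doctor, even those absent from the other tables.
doctor_features = (
    doctors
    .merge(complaints_per_doctor, on="DoctorID", how="left")
    .merge(instructions, on="DoctorID", how="left")
)

# A doctor with no complaint rows is treated here as having zero complaints.
doctor_features["TotalComplaints"] = doctor_features["TotalComplaints"].fillna(0)
```

    Treating a missing complaint row as zero complaints is itself an assumption. Doctors missing from the instructions table are left as missing rather than guessed.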

    💪 Competition challenge

    Create a report that covers the following:

    1. How many doctors are there in each region? What is the average number of purchases per region?
    2. Can you find a relationship between purchases and complaints?
    3. Define new doctor segments that help the company improve marketing efforts and customer service.
    4. Identify which features impact the new segmentation strategy the most.
    5. Your team will need to explain the new segments to the rest of the company. Describe which characteristics distinguish the newly defined segments.
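
    Questions 1 and 2 can be approached with a group-by summary and a rank correlation. The sketch below is one hedged way to do it, run on a made-up per-doctor table. `doctor_features` and `TotalComplaints` are hypothetical names, and the rank-based (Spearman-style) correlation is a choice made here, not something the brief prescribes.

```python
import pandas as pd

# Made-up per-doctor table (hypothetical values), one row per doctor.
doctor_features = pd.DataFrame({
    "DoctorID": ["D1", "D2", "D3", "D4"],
    "Region": ["R1", "R1", "R2", "R2"],
    "Purchases": [10.0, 4.0, 25.0, 5.0],
    "TotalComplaints": [3, 0, 1, 5],
})

# Question 1: number of doctors and average purchases per region.
region_summary = doctor_features.groupby("Region").agg(
    n_doctors=("DoctorID", "nunique"),
    avg_purchases=("Purchases", "mean"),
)

# Question 2: correlate the ranks of purchases and complaints.
# Pearson correlation of ranks equals Spearman's rho, which is less
# sensitive to a few very large buyers than plain Pearson correlation.
purchase_complaint_corr = doctor_features["Purchases"].rank().corr(
    doctor_features["TotalComplaints"].rank()
)
```

    A scatter plot of the two columns is a useful companion to the single correlation number, since a weak coefficient can hide a relationship that only holds for part of the range.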
    # Standardize features with StandardScaler.
    from sklearn.preprocessing import StandardScaler
    # Hierarchical clustering with SciPy: the dendrogram and linkage functions.
    from scipy.cluster.hierarchy import dendrogram, linkage
    # scikit-learn's KMeans (clustering) and PCA (dimensionality reduction) classes.
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import OrdinalEncoder
    # Encode each column's values as ordinal integer codes.
    # Caveat: fitting on the whole table also encodes DoctorID and the numeric columns,
    # and orders Rank by sorted label rather than by its business ranking.
    enc_i = OrdinalEncoder()
    doctors_transformed = enc_i.fit_transform(doctors)
    # Scale every encoded column to zero mean and unit variance.
    scaler = StandardScaler()
    segmentation_std = scaler.fit_transform(doctors_transformed)
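
    As an alternative to encoding the whole table, the sketch below encodes only the categorical columns, gives "Rank" its documented order, leaves the identifier out, and then runs KMeans and PCA on the scaled result. The rows are made up. The choice of two clusters and the names `toy`, `RANK_ORDER`, `X`, and `Segment` are assumptions for illustration, not the notebook's actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Rank levels from lowest to highest, as listed in the data description.
RANK_ORDER = [
    "Silver", "Silver Plus", "Gold", "Gold Plus", "Platinum",
    "Platinum Plus", "Titanium", "Titanium Plus", "Ambassadors",
]

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "DoctorID": ["D1", "D2", "D3", "D4", "D5", "D6"],
    "Category": ["General Practitioner", "General Practitioner", "Specialist",
                 "Specialist", "Specialist", "General Practitioner"],
    "Rank": ["Silver", "Gold", "Silver Plus",
             "Ambassadors", "Titanium", "Platinum Plus"],
    "Purchases": [3.0, 5.0, 4.0, 40.0, 35.0, 30.0],
})

# Encode only the categorical columns, with an explicit category order.
encoder = OrdinalEncoder(
    categories=[["General Practitioner", "Specialist"], RANK_ORDER]
)
encoded = encoder.fit_transform(toy[["Category", "Rank"]])

# Numeric columns keep their values; DoctorID is an identifier and is left out.
X = np.hstack([encoded, toy[["Purchases"]].to_numpy()])

# Standardize so no single feature dominates the distance computation.
scaled = StandardScaler().fit_transform(X)

# Two clusters is an arbitrary choice for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
toy["Segment"] = kmeans.fit_predict(scaled)

# Project to two principal components, for example for a scatter plot.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)
```

    After fitting, `kmeans.cluster_centers_` holds the segment centroids in scaled units, which is one starting point for describing how the segments differ.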