Competition - Improving Customer Segmentation - Diagnosing doctors

Diagnosing doctors

-- "There are four types of doctors:
1) The first can do nothing, but they know everything. These are therapists.
2) The second know nothing, but they can do everything. These are surgeons.
3) The third know nothing and can do nothing. These are psychiatrists.
4) And there are doctors who know everything and can do everything, but people get to them too late..."

-- "In psychiatry, after all, whoever puts on the white coat first is the doctor."

Recommendations.

1. Increase the number of doctors in Clusters 0, 10, 22, 28 and 30 to raise Purchases.
2. Avoid doing business with Cluster 5.
3. Name the clusters as you wish, so that the labels fit the in-company slang.

#dependencies
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.lines import Line2D

from sklearn.preprocessing import LabelEncoder

from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score
from sklearn.metrics import silhouette_score

💾 The data

The company stores the information in the following four tables. Some of the fields are anonymized to comply with privacy regulations.

Doctors contains information on doctors. Each row represents one doctor.

"DoctorID" - is a unique identifier for each doctor.
"Region" - the current geographical region of the doctor.
"Category" - the type of doctor, either 'Specialist' or 'General Practitioner.'
"Rank" - is an internal ranking system. It is an ordered variable: The highest level is Ambassadors, followed by Titanium Plus, Titanium, Platinum Plus, Platinum, Gold Plus, Gold, Silver Plus, and the lowest level is Silver.
"Incidence rate" and "R rate" - relate to the amount of re-work each doctor generates.
"Satisfaction" - measures doctors' satisfaction with the company.
"Experience" - relates to the doctor's experience with the company.
"Purchases" - purchases over the last year.
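
Because "Rank" is described as an ordered variable, it may help to encode it so that the order survives. The following is a minimal sketch, assuming the level names appear in the data exactly as spelled above; the toy rows and the names RANK_ORDER, toy and RankCode are illustrative and not part of the original notebook.

```python
import pandas as pd

# Rank levels from lowest to highest, as listed in the table description.
RANK_ORDER = [
    "Silver", "Silver Plus", "Gold", "Gold Plus",
    "Platinum", "Platinum Plus", "Titanium", "Titanium Plus", "Ambassadors",
]

# Toy rows for illustration only; they are not taken from doctors.csv.
toy = pd.DataFrame({
    "DoctorID": ["A", "B", "C"],
    "Rank": ["Gold", "Ambassadors", "Silver Plus"],
})

# An ordered categorical keeps the business ordering of the levels.
rank_dtype = pd.CategoricalDtype(categories=RANK_ORDER, ordered=True)
toy["Rank"] = toy["Rank"].astype(rank_dtype)

# Integer codes follow that ordering: 0 = Silver, ..., 8 = Ambassadors.
toy["RankCode"] = toy["Rank"].cat.codes
print(toy)
```

An ordered categorical is used here rather than LabelEncoder because LabelEncoder assigns codes in sorted (alphabetical) order of the labels, which would not match the ranking.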

Orders contains details on orders. Each row represents one order; a doctor can place multiple orders.

"DoctorID" - doctor id (matches the other tables).
"OrderID" - order identifier.
"OrderNum" - order number.
"Conditions A through J" - map the different settings of the devices in each order. Each order goes to an individual patient.

Complaints collects information on doctor complaints.

"DoctorID" - doctor id (matches the other tables).
"Complaint Type" - the company's classification of the complaints.
"Qty" - number of complaints per complaint type per doctor.

Instructions has information on whether the doctor includes special instructions on their orders.

"DoctorID" - doctor id (matches the other tables).
"Instructions" - 'Yes' when the doctor includes special instructions, 'No' when they do not.

doctors = pd.read_csv('data/doctors.csv')
display(doctors.head(), doctors.shape)

orders = pd.read_csv('data/orders.csv')
display(orders.head(), orders.shape)

complaints = pd.read_csv('data/complaints.csv')
display(complaints.head(), complaints.shape)

instructions = pd.read_csv('data/instructions.csv')
display(instructions.head(), instructions.shape)
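
Since all four tables share "DoctorID", one possible way to bring them together is to aggregate the multi-row tables to one row per doctor and then left-join onto the doctors table. This is only a sketch on invented toy data; the frames doctors_toy, complaints_toy and instructions_toy, the complaint type values, and the column TotalComplaints are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the real tables; all values are invented for illustration.
doctors_toy = pd.DataFrame({
    "DoctorID": ["A", "B", "C"],
    "Purchases": [10.0, 4.0, 7.0],
})
complaints_toy = pd.DataFrame({
    "DoctorID": ["A", "A", "C"],
    "Complaint Type": ["Correct", "Incorrect", "Correct"],
    "Qty": [2, 1, 5],
})
instructions_toy = pd.DataFrame({
    "DoctorID": ["A", "B"],
    "Instructions": ["Yes", "No"],
})

# Collapse complaints to one row per doctor before joining.
complaints_per_doctor = (
    complaints_toy.groupby("DoctorID", as_index=False)["Qty"].sum()
    .rename(columns={"Qty": "TotalComplaints"})
)

# Left joins keep every doctor, even those without complaints or instructions.
merged = (
    doctors_toy
    .merge(complaints_per_doctor, on="DoctorID", how="left")
    .merge(instructions_toy, on="DoctorID", how="left")
)

# A doctor absent from the complaints table has zero recorded complaints.
merged["TotalComplaints"] = merged["TotalComplaints"].fillna(0)
print(merged)
```

Missing "Instructions" values are left as NaN in this sketch, since the data description does not say how a doctor absent from that table should be treated.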

0. Exploratory Data Analysis.

Let's look at what we have.

#data info and missing data detection
#info() prints its own report and returns None, so call it outside print()
for name, df in [('doctors', doctors), ('orders', orders),
                 ('complaints', complaints), ('instructions', instructions)]:
    print(f'--- {name} ---')
    df.info()
    print(df.isna().sum())

1. How many doctors are there in each region? What is the average number of purchases per region?
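
Both questions can be answered with a single grouped aggregation. Below is a minimal sketch on toy data, assuming the "Region", "DoctorID" and "Purchases" columns described earlier; the region names, the values, and the names region_summary, n_doctors and avg_purchases are invented for illustration.

```python
import pandas as pd

# Toy rows for illustration; region names and values are invented.
toy = pd.DataFrame({
    "DoctorID": ["A", "B", "C", "D"],
    "Region": ["North", "North", "South", "South"],
    "Purchases": [10.0, 20.0, 5.0, 7.0],
})

# Count distinct doctors and average their purchases within each region.
region_summary = (
    toy.groupby("Region")
    .agg(n_doctors=("DoctorID", "nunique"), avg_purchases=("Purchases", "mean"))
    .sort_values("avg_purchases", ascending=False)
)
print(region_summary)
```

On the real table, the same call on doctors would replace toy, provided the column names match.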
