Beta
Customer Lifetime Value Prediction
CLTV = Conditional Expected Number of Transaction * Conditional Expected Average Profit
First, the whole customers' behaviours are applied to a model and then make prediction the expected transaction for each customer.
CLTV = BG/NBD MODEL * GAMMA GAMMA SUBMODEL
BG/NBD MODEL for expected number of transaction GAMMA GAMMA SUBMODEL for expected average profit
BG/NBD MODEL : Beta Geometric / Negative Binomial Distribution
Transaction Process(buy)
- Possion distribution for the expected number of transaction and transaction rate
- Gamma distribution in whole customers
Dropout Process(till you die)
- All customers have their dropout probability as p.
- Beta distribution for dropout rates
Importing Modules & Dataset
pip install lifetimes
Hidden output
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter
from lifetimes.plotting import plot_period_transactions
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
pd.set_option('display.float_format', lambda x: '%.4f' % x)
from sklearn.preprocessing import MinMaxScaler
data2010 = pd.read_excel("online_retail_II.xlsx", sheet_name="Year 2009-2010")
data2011 = pd.read_excel("online_retail_II.xlsx", sheet_name="Year 2010-2011")
data = data2010.append(data2011)
data.reset_index(drop = True, inplace=True)
df = data.copy()
Exploratory Data Analysis
Check Data
def check(df, head = 5):
print("**********************************HEAD**********************************")
print(df.head(head))
print("**********************************TAIL**********************************")
print(df.tail(head))
print("**********************************TYPES**********************************")
print(df.dtypes)
print("**********************************SHAPE**********************************")
print(df.shape)
print("**********************************NA**********************************")
print(df.isnull().sum())
print("**********************************QUANTILES**********************************")
print(df.describe([0, 0.05, 0.5, 0.95]))
check(df)
Grab Columns
def grab_col_names(dataframe, cat_th=10, car_th=20):
"""
Veri setindeki kategorik, numerik ve kategorik fakat kardinal değişkenlerin isimlerini verir.
Not: Kategorik değişkenlerin içerisine numerik görünümlü kategorik değişkenler de dahildir.
Parameters
------
dataframe: dataframe
Değişken isimleri alınmak istenilen dataframe
cat_th: int, optional
numerik fakat kategorik olan değişkenler için sınıf eşik değeri
car_th: int, optinal
kategorik fakat kardinal değişkenler için sınıf eşik değeri
Returns
------
cat_cols: list
Kategorik değişken listesi
num_cols: list
Numerik değişken listesi
cat_but_car: list
Kategorik görünümlü kardinal değişken listesi
Examples
------
import seaborn as sns
df = sns.load_dataset("iris")
print(grab_col_names(df))
Notes
------
cat_cols + num_cols + cat_but_car = toplam değişken sayısı
num_but_cat cat_cols'un içerisinde.
Return olan 3 liste toplamı toplam değişken sayısına eşittir: cat_cols + num_cols + cat_but_car = değişken sayısı
"""
# cat_cols, cat_but_car
cat_cols = [col for col in dataframe.columns if dataframe[col].dtypes == "O"]
num_but_cat = [col for col in dataframe.columns if dataframe[col].nunique() < cat_th and
dataframe[col].dtypes != "O"]
cat_but_car = [col for col in dataframe.columns if dataframe[col].nunique() > car_th and
dataframe[col].dtypes == "O"]
cat_cols = cat_cols + num_but_cat
cat_cols = [col for col in cat_cols if col not in cat_but_car]
# num_cols
num_cols = [col for col in dataframe.columns if dataframe[col].dtypes != "O"]
num_cols = [col for col in num_cols if col not in num_but_cat]
print(f"Observations: {dataframe.shape[0]}")
print(f"Variables: {dataframe.shape[1]}")
print(f'cat_cols: {len(cat_cols)}')
print(f'num_cols: {len(num_cols)}')
print(f'cat_but_car: {len(cat_but_car)}')
print(f'num_but_cat: {len(num_but_cat)}')
return cat_cols, num_cols, cat_but_car
cats,nums,cards = grab_col_names(df)
def num_summary(df, col, plot=False):
quantiles = [0.01, 0.05, 0.1, 0.25, 0.50, 0.75, 0.95, 0.99]
print(df[col].describe(quantiles).T)
print("*****************************************************")
if plot:
df[col].hist()
plt.xlabel(col)
plt.title(col)
plt.show(block=True)