CRM Analytics - Customer Lifetime Value Prediction

    Customer Lifetime Value Prediction

    CLTV = Conditional Expected Number of Transactions * Conditional Expected Average Profit

    First, a model is fitted to the purchase behaviour of the whole customer base. The fitted model is then used to predict the expected number of transactions for each individual customer.

    CLTV = BG/NBD MODEL * GAMMA GAMMA SUBMODEL

    • BG/NBD MODEL for the expected number of transactions
    • GAMMA GAMMA SUBMODEL for the expected average profit
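To make the product concrete, here is a minimal sketch with made-up numbers. The customer names, the two input values per customer, and the helper `cltv` are all hypothetical; in practice the inputs would come from the fitted BG/NBD and Gamma-Gamma models rather than being typed in by hand.

```python
# Hypothetical per-customer model outputs (made-up numbers, not from the dataset).
customers = {
    "customer_a": {"expected_transactions": 2.0, "expected_avg_profit": 30.0},
    "customer_b": {"expected_transactions": 0.5, "expected_avg_profit": 100.0},
    "customer_c": {"expected_transactions": 4.0, "expected_avg_profit": 12.5},
}


def cltv(expected_transactions, expected_avg_profit):
    """CLTV = expected number of transactions * expected average profit."""
    return expected_transactions * expected_avg_profit


# Apply the formula to every customer.
cltv_by_customer = {name: cltv(**values) for name, values in customers.items()}

# Print customers from highest to lowest CLTV.
for name, value in sorted(cltv_by_customer.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

Note that a customer who buys rarely but spends a lot per purchase can end up with the same CLTV as one who buys often but spends little, which is why both components are modelled.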

    BG/NBD MODEL : Beta Geometric / Negative Binomial Distribution

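    Transaction Process (buy)

    • While a customer is active, the number of transactions they make in a given period follows a Poisson distribution governed by that customer's transaction rate.
    • Transaction rates vary from customer to customer and follow a gamma distribution across the whole customer base.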

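    Dropout Process (till you die)

    • Each customer has their own dropout probability p: after any transaction, the customer becomes inactive with probability p.
    • Dropout probabilities vary from customer to customer and follow a beta distribution across the whole customer base.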
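The two processes above can be sketched as a small simulation using only the standard library. This is an illustration of the model's assumptions, not the fitting procedure used by `lifetimes`. The function name `simulate_repeat_purchases`, the parameter values (`r`, `alpha`, `a`, `b`), and the 52-week horizon are all assumptions chosen for the example.

```python
import random


def simulate_repeat_purchases(rng, r, alpha, a, b, horizon):
    """Simulate one customer's repeat purchases under BG/NBD-style assumptions.

    r, alpha: shape and rate of the gamma distribution of transaction rates
    a, b:     parameters of the beta distribution of dropout probabilities
    horizon:  length of the observation window (same time unit as the rate)
    """
    # Each customer gets an individual transaction rate and dropout probability.
    lam = rng.gammavariate(r, 1.0 / alpha)  # gammavariate takes shape and scale
    p = rng.betavariate(a, b)

    t, repeats = 0.0, 0  # the first purchase happens at t = 0 and is not a repeat
    while True:
        # After every transaction the customer may drop out for good.
        if rng.random() < p:
            break
        # While active, waiting times between purchases are exponential,
        # which is equivalent to Poisson-distributed purchase counts.
        t += rng.expovariate(lam)
        if t > horizon:
            break
        repeats += 1
    return repeats


# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(42)
counts = [
    simulate_repeat_purchases(rng, r=2.0, alpha=4.0, a=1.0, b=3.0, horizon=52.0)
    for _ in range(1000)
]

print("share with no repeat purchase:", sum(c == 0 for c in counts) / len(counts))
print("mean repeat purchases:", sum(counts) / len(counts))
```

Fitting the BG/NBD model amounts to going the other way: given observed repeat-purchase histories, estimate the four population-level parameters that make such histories most likely.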

    Importing Modules & Dataset

    pip install lifetimes
    import datetime as dt
    import pandas as pd
    import matplotlib.pyplot as plt
    from lifetimes import BetaGeoFitter
    from lifetimes import GammaGammaFitter
    from lifetimes.plotting import plot_period_transactions
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', 500)
    pd.set_option('display.float_format', lambda x: '%.4f' % x)
    from sklearn.preprocessing import MinMaxScaler
    data2010 = pd.read_excel("online_retail_II.xlsx", sheet_name="Year 2009-2010")
    data2011 = pd.read_excel("online_retail_II.xlsx", sheet_name="Year 2010-2011")
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the two sheets and rebuilds the index
    data = pd.concat([data2010, data2011], ignore_index=True)
    df = data.copy()

    Exploratory Data Analysis

    Check Data

    def check(df, head = 5):
        print("**********************************HEAD**********************************")
        print(df.head(head))
        print("**********************************TAIL**********************************")
        print(df.tail(head))
        print("**********************************TYPES**********************************")
        print(df.dtypes)
        print("**********************************SHAPE**********************************")
        print(df.shape)
        print("**********************************NA**********************************")
        print(df.isnull().sum())
        print("**********************************QUANTILES**********************************")
        print(df.describe([0, 0.05, 0.5, 0.95]))
    check(df)

    Grab Columns

    def grab_col_names(dataframe, cat_th=10, car_th=20):
        """
    
        Return the names of the categorical, numerical, and categorical-but-cardinal variables in the dataset.
        Note: numeric-looking categorical variables are also counted among the categorical variables.
    
        Parameters
        ------
            dataframe: dataframe
                    The dataframe whose variable names are to be retrieved
            cat_th: int, optional
                    Class-count threshold for variables that are numeric but categorical
            car_th: int, optional
                    Class-count threshold for variables that are categorical but cardinal
    
        Returns
        ------
            cat_cols: list
                    List of categorical variables
            num_cols: list
                    List of numerical variables
            cat_but_car: list
                    List of categorical-looking cardinal variables
    
        Examples
        ------
            import seaborn as sns
            df = sns.load_dataset("iris")
            print(grab_col_names(df))
    
        Notes
        ------
            cat_cols + num_cols + cat_but_car = total number of variables
            num_but_cat is included in cat_cols.
    
        """
    
        # cat_cols, cat_but_car
        cat_cols = [col for col in dataframe.columns if dataframe[col].dtypes == "O"]
        num_but_cat = [col for col in dataframe.columns if dataframe[col].nunique() < cat_th and
                       dataframe[col].dtypes != "O"]
        cat_but_car = [col for col in dataframe.columns if dataframe[col].nunique() > car_th and
                       dataframe[col].dtypes == "O"]
        cat_cols = cat_cols + num_but_cat
        cat_cols = [col for col in cat_cols if col not in cat_but_car]
    
        # num_cols
        num_cols = [col for col in dataframe.columns if dataframe[col].dtypes != "O"]
        num_cols = [col for col in num_cols if col not in num_but_cat]
    
        print(f"Observations: {dataframe.shape[0]}")
        print(f"Variables: {dataframe.shape[1]}")
        print(f'cat_cols: {len(cat_cols)}')
        print(f'num_cols: {len(num_cols)}')
        print(f'cat_but_car: {len(cat_but_car)}')
        print(f'num_but_cat: {len(num_but_cat)}')
        return cat_cols, num_cols, cat_but_car
    
    cats, nums, cards = grab_col_names(df)

    def num_summary(df, col, plot=False):
        quantiles = [0.01, 0.05, 0.1, 0.25, 0.50, 0.75, 0.95, 0.99]
        print(df[col].describe(quantiles).T)
        print("*****************************************************")
        if plot:
            df[col].hist()
            plt.xlabel(col)
            plt.title(col)
            plt.show(block=True)