Discover factors on hotel booking cancellation and provide recommendations to increase revenue.
They have provided you with their bookings data in a file called
hotel_bookings.csv, which contains the following:
|Unique identifier of the booking.|
|The number of adults.|
|The number of children.|
|Number of weekend nights (Saturday or Sunday).|
|Number of week nights (Monday to Friday).|
|Type of meal plan included in the booking.|
|Whether a car parking space is required.|
|The type of room reserved.|
|Number of days before the arrival date the booking was made.|
|Year of arrival.|
|Month of arrival.|
|Date of the month for arrival.|
|How the booking was made.|
|Whether the guest has previously stayed at the hotel.|
|Number of previous cancellations.|
|Number of previous bookings that were canceled.|
|Average price per day of the booking.|
|Count of special requests made as part of the booking.|
|Whether the booking was cancelled or not.|
Source (data has been modified): https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset
Visualization of data is done by tableau after cleaning data with python. Click here to view interactive dashboards. Start from identy target to cancel weekday.
The hotel has a 32.74% cancellation rate on its entire customer population (32780), which causes 1.18M financial loss.
For customers who canceled hotel bookings, most of them are non-repeated customers. Customers with children have a minor impact since it only dominates 7.53% in this group disregards the number of children. Customer with size 2 without children has a significant impact on hotel booking cancellation, which rules 66.88% and makes 63.87% revenue loss.
As the number of children and adults increases, expecting the average room price to be more expensive. However, there is an outlier in the cancel booking group. The customer with size 2 has 9 children, the average room price is only $76.5.
The given data is from 2017 and 2018. For data in 2017, it is only accessible begins from September up to December. Evaluating average room price impacts on customer size in cancellation on each month from September to December in both years, the average room price reaches its peak in September. The average room price starts to decline in the following months. However, when the average room price is at its maximum point, the customer size reaches its peak in October. This is an interesting fact that price drops and demand decreases. This is likely due to the coming holiday season, as customers wait for better prices.
For customers with two people without children, they prefer room type 1 the most, followed by room type 4 and room type 2 respectively. The most favorite meal plans are meal plan 1, then meal plan 2. Interestingly, customers in this group cancel their bookings most likely happens on Thursday and Wednesday based on the data in 2017 and 2018 respectively.
Conclusion and Reconmendation
To conclude, the hotel should focus on the customer with size two without kids since it is the main portion in canceling hotel bookings. Customers are sensitive to the average room price due to the coming holiday season. Therefore, the adjusted monthly average room price should consider the impact of the holiday season. Changing average room price with a small margin monthly such that the change of price margin matches the discount rate during the holiday season. Additionally, based on the pattern of canceling hotel bookings, it is better to provide promotions on Thursdays and Wednesdays to encourage customers to stay with the hotel. For example, special days in a week with a bundle(room type 4 and meal plan 1) to attract more customers or low down customer churn rate.
import pandas as pd import numpy as np import seaborn as sns import matplotlib.pylab as plt
hotels = pd.read_csv("data/hotel_bookings.csv") hotels.head(10)
# get the portion of missing values in each columns missing_por = np.round(hotels.isnull().mean()*100,2) missing_por
The main challenging for missing value in this dataset is to fill missing values in the timeframe(no_of_weekend_nights, no_of_week_nights ,lead_time, arrival_year, arrival_month and arrival_date) and price(avg_price_per_room). These factors on the timeframe has significantly impact on analysis to understand the behavior of consumers booking hotels.
Since the portion of missing values in no_of_weekend_nights(1.01%) ,no_of_week_nights(2.22%) ,lead_time(1.30%), arrival_year(1.04%), arrival_month(1.394%),arrival_date(2.70%) and avg_price_per_room(1.27%) are less than 3% percent on the entire given dataset, we will drop the missing values in these columns
# dropping 'lead_time','arrival_year','arrival_month', 'arrival_date', 'avg_price_per_room' hotel_update = hotels.dropna(subset=['no_of_weekend_nights','no_of_week_nights','lead_time','arrival_year','arrival_month', 'arrival_date', 'avg_price_per_room']) hotel_update.shape/hotels.shape
#no_of_adults fill no_of_adults with most frequent number hotel_update['no_of_adults'] = hotel_update['no_of_adults'].fillna(hotel_update['no_of_adults'].mode().loc) # check if there is invalide value in no_of_adults which is less than 1 print("Number of adult less than 1?",(hotel_update['no_of_adults'] < 1).any())
# update the value in most frequent number mask_adult = hotel_update['no_of_adults'] < 1 hotel_update.loc[mask_adult,'no_of_adults'] = hotel_update['no_of_adults'].mode().loc
# fill missing value with the most frequent number hotel_update['no_of_children'] = hotel_update['no_of_children'].fillna(hotel_update['no_of_children'].mode().loc)
# fill required_car_parking_space with most frequent number hotel_update['required_car_parking_space'] = hotel_update['required_car_parking_space'].fillna(hotel_update['required_car_parking_space'].mode().loc)
# repeated_guest depends has two types for 0, it does not have no_of_previous_cancellations neither no_of_previous_bookings_not_canceled cancel_vars = ['no_of_previous_cancellations','no_of_previous_bookings_not_canceled'] mask_rg = hotel_update['repeated_guest'] == 0 #has missing values? print("Missing values: ", hotel_update[mask_rg][cancel_vars].isnull().any(), sep='\n') #set 0 on no_of_previous_cancellations & no_of_previous_cancellations when repeated_guest is 0 hotel_update.loc[mask_rg,cancel_vars] = 0 # fill repeated_guest is null as 0 when no_of_previous_cancellations & no_of_previous_cancellations are 0 mask_0 = (hotel_update['no_of_previous_cancellations'] == 0) & (hotel_update['no_of_previous_bookings_not_canceled'] ==0) &(hotel_update['repeated_guest'].isnull()) hotel_update.loc[mask_0,'repeated_guest'] = 0 print("Missing values: ", hotel_update[mask_rg][cancel_vars].isnull().any(), sep='\n')
#following the same steps for repeat_guest = 1 msk_rg = hotel_update['repeated_guest'] == 1 msk_1 = (hotel_update['no_of_previous_cancellations'].isnull()) | (hotel_update['no_of_previous_bookings_not_canceled'].isnull()) #has missing values? print("Missing values: ", hotel_update[msk_rg][cancel_vars].isnull().any(), sep='\n')
# update missing values hotel_update.loc[(msk_rg&msk_1), 'no_of_previous_cancellations'] = 0 hotel_update.loc[(msk_rg&msk_1), 'no_of_previous_bookings_not_canceled'] = 1 print("Missing values: ", hotel_update[msk_rg][cancel_vars].isnull().any(), sep='\n')
# dropping repeat_guest OR no_of_previous_cancellations is null WHEN repeat_guest is null mask_null = hotel_update['repeated_guest'].isnull() mask_nulls = (mask_null & hotel_update['no_of_previous_cancellations'].isnull()) | (mask_null & hotel_update['no_of_previous_bookings_not_canceled'].isnull()) hotel_update.drop(hotel_update[mask_nulls].index, inplace=True) #fill the missing value in repeat_guess when repeat_guest OR no_of_previous_cancellations is not zero. hotel_update.loc[mask_null,'repeated_guest'] = 1
# set missing values to be zero mask_rqst = hotel_update['no_of_special_requests'].isnull() hotel_update.loc[mask_rqst, 'no_of_special_requests']= 0
# fill the missing value of type_of_meal_plan, room_type_reserved, market_segment_type with most frequent value hotel_update['type_of_meal_plan'] = hotel_update['type_of_meal_plan'].fillna(hotel_update['type_of_meal_plan'].mode()) hotel_update['room_type_reserved'] = hotel_update['room_type_reserved'].fillna(hotel_update['room_type_reserved'].mode()) hotel_update['market_segment_type'] = hotel_update['market_segment_type'].fillna(hotel_update['market_segment_type'].mode())