Competition - prevent hotel cancellation

    Prevent Hotels Cancellations and No-Shows

    The report is divided into the following sections:

    1. Introduction
    2. Executive Summary
    3. Data Understanding
      • Checking the shape of the dataframe
      • Generating an overview of the dataframe
      • Checking the data for null values
      • Checking if the rows are duplicated in the dataframe
    4. Data Cleaning
    5. EDA
      • What factors affect whether customers cancel their booking?
      • Are cancellations more likely during weekends?
    6. Missing Value Imputation
    7. Model Building
    8. Comparing the models
    9. Recommendations

    Introduction

    In the hospitality industry, one of the most common challenges hotels face is a high rate of cancellations by their guests. Cancellations can be a significant source of lost revenue and can also affect a hotel's reputation among customers. To address this issue, hotels are turning to data science to identify the factors that contribute to cancellations and to find ways to reduce them. In this scenario, we have been tasked with supporting a hotel in a project aimed at increasing revenue from its room bookings by reducing cancellations. The goal of this project is to use data science to identify which factors determine whether a booking will be fulfilled or cancelled, and to provide actionable insights that the hotel can use to reduce the chance that a guest cancels a booking. This requires a comprehensive approach: analyzing the available data, identifying patterns and trends, and applying appropriate statistical and machine learning techniques to uncover the underlying factors that contribute to cancellations.

    Executive Summary

    We are supporting a hotel with a project aimed at increasing revenue from its room bookings. The hotel believes that data science can help it reduce the number of cancellations. This is where we come in! We have been asked to use any appropriate methodology to identify what determines whether a booking will be fulfilled or cancelled. The hotel intends to use the results of our work to reduce the chance that someone cancels their booking.

    To achieve these objectives, we have prepared a report covering the following:

    1. What factors affect whether customers cancel their booking?
    2. Are cancellations more likely during weekends?
    3. What general recommendations can we make for the hotel?

    The results:

    The report begins with a brief overview and cleaning of the data, followed by an explanation of the methodologies employed to extract the most valuable insights. The exploratory data analysis has indicated that:

    1. The factors affecting whether customers cancel their bookings are:
      • Reservations with zero children have the lowest cancellation rate, while reservations with two children have the highest.
      • The cancellation percentage is positively correlated with the number of weekend nights.
      • The cancellation rate is lowest when the number of weeknights booked is one.
      • The cancellation rate is lowest in January and highest in July.
      • The median lead time of non-cancelled bookings is much lower than the median lead time of cancelled bookings.
    2. Cancellations are more likely during weekends.
      • Weekday bookings have a 32% cancellation rate, while weekend bookings have a 35% cancellation rate.
      • A chi-square test of independence was used to determine whether booking status depends on weekend reservations. It shows a statistically significant relationship between booking status and weekend reservations.
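
The group rates and the chi-square test in the results above can be computed along the following lines. This is a minimal sketch under stated assumptions, not the code behind the reported figures: the column names `booking_status` (with values `"Canceled"` / `"Not_Canceled"`), `no_of_weekend_nights` and `lead_time` are assumed, the rule "a weekend booking includes at least one weekend night" is a hypothetical definition, and the eight rows are made up. The statistic is Pearson's chi-square without a continuity correction; `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default, so its value would differ slightly.

```python
import math

import numpy as np
import pandas as pd


def cancellation_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of cancelled bookings for each value of `column`."""
    # Assumption: booking_status holds the strings "Canceled" / "Not_Canceled".
    canceled = df["booking_status"].eq("Canceled")
    return canceled.groupby(df[column]).mean()


def chi_square_2x2(table: np.ndarray) -> tuple:
    """Pearson chi-square test of independence for a 2x2 contingency table.

    No continuity correction is applied. With 1 degree of freedom the
    statistic is the square of a standard normal variable, so
    p = P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    """
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    # Expected counts under independence: row total * column total / grand total
    expected = row_totals @ col_totals / table.sum()
    stat = float(((table - expected) ** 2 / expected).sum())
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy data for illustration only (not the hotel's bookings).
bookings = pd.DataFrame({
    "no_of_weekend_nights": [0, 0, 0, 0, 1, 2, 1, 2],
    "lead_time": [5, 10, 20, 100, 80, 60, 15, 120],
    "booking_status": ["Not_Canceled", "Not_Canceled", "Not_Canceled", "Canceled",
                       "Canceled", "Canceled", "Not_Canceled", "Canceled"],
})
# Hypothetical definition: a weekend booking has at least one weekend night.
bookings["weekend_booking"] = bookings["no_of_weekend_nights"] > 0

print(cancellation_rate_by(bookings, "weekend_booking"))

# Median lead time for cancelled vs. non-cancelled bookings
medians = bookings.groupby("booking_status")["lead_time"].median()
print(medians)

# 2x2 table: rows = weekend_booking (False, True), columns = booking_status
table = pd.crosstab(bookings["weekend_booking"], bookings["booking_status"]).to_numpy()
stat, p = chi_square_2x2(table)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

On the real data, calls such as `cancellation_rate_by(hotels, "no_of_children")` or `cancellation_rate_by(hotels, "arrival_month")` would give the per-group rates behind the other bullet points, again assuming those column names.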

    Data Understanding

    1. Checking the shape of the dataframe
    2. Generating an overview of the dataframe
    3. Checking the data for null values
    4. Checking if the rows are duplicated in the dataframe
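    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import numpy as np
    import missingno as msno

    plt.style.use('ggplot')

    # Load the booking data and preview the first rows
    hotels = pd.read_csv("data/hotel_bookings.csv")
    hotels.head()

    # 1. Shape of the dataframe: (rows, columns)
    hotels.shape

    # 2. Overview: column dtypes and non-null counts
    hotels.info()

    # 3. Number of null values in each column
    hotels.isnull().sum()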
    Checking for rows that are almost entirely empty: the filter below flags rows in which 17 or more of the 19 columns are null (roughly 89% of their values).
    # Flag rows with 17 or more missing values, inspect them, then drop them
    mostly_empty = hotels.isnull().sum(axis=1) >= 17
    hotels[mostly_empty]
    hotels = hotels.drop(hotels[mostly_empty].index)
    hotels.shape

    We dropped the first row of the dataframe because 17 of its 19 columns had missing values.
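
The filter above counts nulls per row explicitly. pandas offers an equivalent one-liner, `DataFrame.dropna(thresh=k)`, which keeps only rows with at least `k` non-null values. Dropping rows with 17 or more nulls out of 19 columns is the same as keeping rows with at least 19 - 17 + 1 = 3 non-null values. The sketch below uses a hypothetical helper, `drop_mostly_empty_rows`, and checks it against the count-based filter on a made-up frame.

```python
import numpy as np
import pandas as pd


def drop_mostly_empty_rows(df: pd.DataFrame, max_missing: int) -> pd.DataFrame:
    """Drop rows with `max_missing` or more null values.

    dropna(thresh=k) keeps rows with at least k non-null values, so
    "fewer than max_missing nulls" translates to k = n_columns - max_missing + 1.
    """
    keep_at_least = df.shape[1] - max_missing + 1
    return df.dropna(thresh=keep_at_least)


# Made-up 4-column frame; the report's data has 19 columns and uses max_missing=17.
toy = pd.DataFrame({
    "a": [1, np.nan, np.nan, 4],
    "b": [1, np.nan, 2, np.nan],
    "c": [1, np.nan, np.nan, 4],
    "d": [1, 5, np.nan, 4],
})

# Count-based filter (same pattern as the report) vs. dropna(thresh=...)
by_count = toy.drop(toy[toy.isnull().sum(axis=1) >= 3].index)
by_thresh = drop_mostly_empty_rows(toy, max_missing=3)
print(by_thresh.index.tolist())  # rows 1 and 2 have 3 nulls each and are dropped
```

On the report's data this would read `hotels = drop_mostly_empty_rows(hotels, max_missing=17)`.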

    msno.matrix(hotels)
    plt.show()
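
`msno.matrix` gives a visual picture of where values are missing; a numeric summary of the same information can be handy alongside it. A minimal sketch using only pandas, with a hypothetical helper `missing_share` and a made-up three-column frame (the column names are illustrative):

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, highest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)


sample = pd.DataFrame({
    "lead_time": [10, np.nan, 30, np.nan],
    "no_of_children": [0, 1, np.nan, 0],
    "booking_status": ["Canceled", "Not_Canceled", "Canceled", "Not_Canceled"],
})
print(missing_share(sample))
```

Applied to the cleaned data, `missing_share(hotels)` would list the columns that still need imputation.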