Workspace
Tanya Blackburn/

Competition - predict hotel cancellation

0
Beta
Spinner

Predicting Hotel Cancellations

🏨 Background

You are supporting a hotel with a project aimed to increase revenue from their room bookings. They believe that they can use data science to help them reduce the number of cancellations. This is where you come in!

They have asked you to use any appropriate methodology to identify what contributes to whether a booking will be fulfilled or cancelled. They intend to use the results of your work to reduce the chance someone cancels their booking.

The Data

They have provided you with their bookings data in a file called hotel_bookings.csv, which contains the following:

ColumnDescription
Booking_IDUnique identifier of the booking.
no_of_adultsThe number of adults.
no_of_childrenThe number of children.
no_of_weekend_nightsNumber of weekend nights (Saturday or Sunday).
no_of_week_nightsNumber of week nights (Monday to Friday).
type_of_meal_planType of meal plan included in the booking.
required_car_parking_spaceWhether a car parking space is required.
room_type_reservedThe type of room reserved.
lead_timeNumber of days before the arrival date the booking was made.
arrival_yearYear of arrival.
arrival_monthMonth of arrival.
arrival_dateDate of the month for arrival.
market_segment_typeHow the booking was made.
repeated_guestWhether the guest has previously stayed at the hotel.
no_of_previous_cancellationsNumber of previous cancellations.
no_of_previous_bookings_not_canceledNumber of previous bookings that were canceled.
avg_price_per_roomAverage price per day of the booking.
no_of_special_requestsCount of special requests made as part of the booking.
booking_statusWhether the booking was cancelled or not.

Source (data has been modified): https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset

import pandas as pd
hotels = pd.read_csv("data/hotel_bookings.csv")
hotels

The Challenge

  • Use your skills to produce recommendations for the hotel on what factors affect whether customers cancel their booking.
Unknown integration
DataFrameavailable as
df
variable
SELECT booking_status, COUNT(*) AS total, total/36275*100 AS percent_of_all_bookings
FROM hotels
GROUP BY booking_status;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df3
variable
SELECT no_of_children, count(*) AS guests, guests/36275*100 AS percent_of_total
FROM hotels
GROUP BY no_of_children
ORDER BY percent_of_total DESC;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

The majority of guests who made bookings at this hotel (91.73%) reported having zero children on their room reservation.

Unknown integration
DataFrameavailable as
df1
variable
SELECT booking_status, no_of_children, COUNT(*) AS total
FROM hotels
WHERE booking_status = 'Canceled'
GROUP BY booking_status, no_of_children
ORDER BY no_of_children DESC;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df2
variable
SELECT type_of_meal_plan, count(*) AS total, total/36275*100 AS percent_of_all_bookings
FROM hotels
WHERE booking_status='Canceled'
GROUP BY type_of_meal_plan, booking_status
ORDER BY percent_of_all_bookings DESC;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Guests who selected Meal Plan 1 were the most likely to cancel their reservation. 23.55% of guests who booked a reservation and selected Meal Plan 1 canceled their reservation.

  • AI Chat
  • Code