# Live Training - Feature Engineering for Predicting Hotel Bookings with tidymodels

## Required packages

```
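library(tidyverse)
library(tidymodels)
library(lubridate)
library(devtools)
# Install naniar only if it is not already available
if (!requireNamespace("naniar", quietly = TRUE)) install.packages("naniar")
library(naniar)
# Suppress warnings for the rest of the session
options(warn = -1)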
```

## A silly but informative exercise

The `height` data set contains observations of the height of an object at several points in time. Let's build a simple linear model to predict height.

- Build a linear model using the base R `lm` function.
- Bind a prediction column to the height data frame.
- Graph the data and predictions in one chart.

```
height <- read_csv("height.csv")
# Build a linear model using the base R lm function
# Bind a prediction column to the height data frame
# Graph the data and the predictions in one chart
```
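One possible way to approach these steps is sketched below. Because the contents of `height.csv` are not shown here, the sketch simulates a hypothetical stand-in data set; the column names `time` and `height`, the object name `height_sim`, and the simulated trajectory are all assumptions, not the workshop data.

```
library(tidyverse)

# Hypothetical stand-in for height.csv (assumed columns: time, height)
set.seed(42)
height_sim <- tibble(
  time = seq(0, 4, by = 0.1),
  height = 20 * time - 4.9 * time^2 + rnorm(length(time), sd = 0.5)
)

# Build a linear model using the base R lm function
lm_fit <- lm(height ~ time, data = height_sim)

# Bind a prediction column to the data frame
height_pred <- height_sim %>%
  bind_cols(tibble(lm_pred = unname(predict(lm_fit, newdata = height_sim))))

# Graph the data (points) and the predictions (line) in one chart
ggplot(height_pred, aes(x = time)) +
  geom_point(aes(y = height)) +
  geom_line(aes(y = lm_pred), color = "red")
```

With the real data, `height_sim` would be replaced by the `height` data frame, assuming its columns carry the same names.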

The linear fit is clearly poor. From the shape of the data, however, we can infer quadratic behavior, so let's test this idea by adding a squared time term to the model.

- Build a model using the `lm()` function to predict height in terms of `time` and `time^2`.
- Bind a prediction column to the height data frame.
- Graph the data and predictions in one chart.

```
# Build a model using the lm() function to predict height in terms of time and time^2
# Bind a prediction column to the height data frame
# Graph the data and predictions in one chart
```
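A minimal sketch of the quadratic version follows, again on simulated data rather than the workshop file. The column names `time` and `height` and the object `height_sim` are assumptions. Note the `I()` wrapper: inside a formula, `^` has a special meaning, so `I(time^2)` is what tells `lm()` to square the variable arithmetically.

```
library(tidyverse)

# Hypothetical stand-in for height.csv (assumed columns: time, height)
set.seed(42)
height_sim <- tibble(
  time = seq(0, 4, by = 0.1),
  height = 20 * time - 4.9 * time^2 + rnorm(length(time), sd = 0.5)
)

# Predict height in terms of time and time^2
quad_fit <- lm(height ~ time + I(time^2), data = height_sim)

# Bind a prediction column to the data frame
height_quad <- height_sim %>%
  bind_cols(tibble(quad_pred = unname(predict(quad_fit, newdata = height_sim))))

# Graph the data (points) and the predictions (line) in one chart
ggplot(height_quad, aes(x = time)) +
  geom_point(aes(y = height)) +
  geom_line(aes(y = quad_pred), color = "blue")
```

The only change from a plain linear fit is the extra engineered feature, which is the point of the exercise: the model class stays the same while the inputs become more informative.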

## The tidymodels framework

```
cancelations <- read_csv("cancelations_live.csv")
cancelations
```

Let's take a look at our features:

`names(cancelations)`

`StaysInWeekNights`, for example, is an informative feature because it separates weeknight stays from weekend stays, while `arrival_date` is less so: to the model it is little more than a series of values. We can make it informative by creating new features from it, such as the day of the week and the month.
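As a rough illustration of deriving features from a date, the sketch below uses `lubridate` on a few made-up rows. The example dates and the new column names `arrival_wday` and `arrival_month` are hypothetical, and it assumes `arrival_date` is already stored as a date.

```
library(tidyverse)
library(lubridate)

# Hypothetical rows standing in for the arrival_date column
arrivals <- tibble(
  arrival_date = ymd(c("2016-12-23", "2017-07-04", "2017-07-08"))
)

# Derive day-of-week and month as labeled (ordered factor) features
arrivals_features <- arrivals %>%
  mutate(
    arrival_wday = wday(arrival_date, label = TRUE, abbr = FALSE),
    arrival_month = month(arrival_date, label = TRUE, abbr = FALSE)
  )

arrivals_features
```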

This way, the model can distinguish a day like "Friday" or a month like "December," which might be meaningful to our modeling problem.

For today's exploration, we will stick to the `tidymodels` framework, a collection of modeling and machine learning packages that follow tidyverse principles and emphasize feature engineering.

### Setting up our data for analysis

- Transform strings into factors.
- Split into train and test sets stratifying by our target variable.
- Verify that both sets have similar proportions of the target variable.

```
# Transform strings into factors
# Split into train and test sets stratifying by our target variable
set.seed(123)
# Verify that both sets have similar proportions of the target
```
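The three setup steps could look roughly like the sketch below. It runs on a simulated data frame, since the actual columns of `cancelations_live.csv` are not listed here; the target name `IsCanceled`, the other column names, and the 80/20 split proportion are assumptions chosen for illustration.

```
library(tidyverse)
library(tidymodels)

# Hypothetical stand-in for the cancelations data (assumed column names)
set.seed(123)
bookings <- tibble(
  IsCanceled = sample(c("yes", "no"), 1000, replace = TRUE, prob = c(0.3, 0.7)),
  MarketSegment = sample(c("Direct", "Online", "Groups"), 1000, replace = TRUE),
  StaysInWeekNights = rpois(1000, 2)
)

# Transform strings into factors
bookings <- bookings %>%
  mutate(across(where(is.character), as.factor))

# Split into train and test sets stratifying by the target variable
set.seed(123)
booking_split <- initial_split(bookings, prop = 0.8, strata = IsCanceled)
train <- training(booking_split)
test <- testing(booking_split)

# Verify that both sets have similar proportions of the target
train_prop <- train %>% count(IsCanceled) %>% mutate(prop = n / sum(n))
test_prop <- test %>% count(IsCanceled) %>% mutate(prop = n / sum(n))
train_prop
test_prop
```

Stratifying matters most when the target classes are imbalanced: without it, a random split can leave the test set with a noticeably different cancelation rate than the training set.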