Exploratory Data Analysis with R - Bank Marketing case
Bank Marketing
This dataset covers direct marketing campaigns that a Portuguese banking institution conducted by phone. The campaigns aimed to sell subscriptions to a bank term deposit (see variable y).
Source of dataset.
Management has asked for an analysis of the following:
- 🗺️ Task 1 - Explore: What are the jobs of the people most likely to subscribe to a term deposit?
- 📊 Task 2 - Visualize: Create a plot to visualize the number of people subscribing to a term deposit by month.
- 🔎 Task 3 - Analyze: What impact does the number of contacts performed during the last campaign have on the likelihood that a customer subscribes to a term deposit?
Lastly, any additional insight is welcome, including the use of machine learning algorithms. The bank wants to use more algorithms, and management wants to know the best way to apply them. Ideally, compare several machine learning techniques.
Background:
You work for a financial services firm. The past few campaigns have not gone as well as the firm would have hoped, and they are looking for ways to optimize their marketing efforts.
They have supplied you with data from a previous campaign and some additional metrics such as the consumer price index and consumer confidence index. They want to know whether you can predict the likelihood of subscribing to a term deposit. The manager would also like to know what factors are most likely to increase a customer's probability of subscribing.
You will need to prepare a report that includes the raw data, the code, and the techniques used. The results will later be cleaned up into a more polished report for a broad audience.
Step 1. Import data
# Load the tidyverse (provides read_delim, the pipe, and ggplot2)
library(tidyverse)
# Import the semicolon-delimited dataset
bank <- read_delim('data/bank-marketing.csv', delim=";", show_col_types = FALSE)
Data Dictionary
Column | Description | Class |
---|---|---|
age | age of customer | numeric |
job | type of job | categorical: "admin.","blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed","services","student","technician","unemployed","unknown" |
marital | marital status | categorical: "divorced","married","single","unknown"; note: "divorced" means divorced or widowed |
education | highest degree of customer | categorical: "basic.4y","basic.6y","basic.9y","high.school","illiterate","professional.course","university.degree","unknown" |
default | has credit in default? | categorical: "no","yes","unknown" |
housing | has housing loan? | categorical: "no","yes","unknown" |
loan | has personal loan? | categorical: "no","yes","unknown" |
contact | contact communication type | categorical: "cellular","telephone" |
month | last contact month of year | categorical: "jan", "feb", "mar", ..., "nov", "dec" |
day_of_week | last contact day of the week | categorical: "mon","tue","wed","thu","fri" |
campaign | number of contacts performed during this campaign and for this client | numeric, includes last contact |
pdays | number of days that passed by after the client was last contacted from a previous campaign | numeric; 999 means client was not previously contacted |
previous | number of contacts performed before this campaign and for this client | numeric |
poutcome | outcome of the previous marketing campaign | categorical: "failure","nonexistent","success" |
emp.var.rate | employment variation rate - quarterly indicator | numeric |
cons.price.idx | consumer price index - monthly indicator | numeric |
cons.conf.idx | consumer confidence index - monthly indicator | numeric |
euribor3m | euribor 3 month rate - daily indicator | numeric |
nr.employed | number of employees - quarterly indicator | numeric |
y | has the client subscribed to a term deposit? | binary: "yes","no" |
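The dictionary above describes two encodings that are easy to trip over: pdays uses 999 as a sentinel for "not previously contacted", and several categorical columns use the string "unknown" in place of a missing value. The following is a minimal sketch of how these might be handled before analysis. The helper name `clean_bank` and the toy rows are hypothetical and made up for illustration; only the column names and codes come from the dictionary.

```r
library(dplyr)

# Hypothetical helper: recode the sentinel values described in the data dictionary.
clean_bank <- function(df) {
  df %>%
    mutate(
      # 999 means "not previously contacted", so treat it as missing
      pdays = na_if(pdays, 999),
      # Flag whether the client was contacted in an earlier campaign
      previously_contacted = !is.na(pdays),
      # Encode the target as a factor with "no" as the reference level
      y = factor(y, levels = c("no", "yes"))
    )
}

# Toy rows (made up for illustration, not taken from the dataset)
toy <- data.frame(
  pdays = c(999, 3, 999, 6),
  job   = c("admin.", "unknown", "student", "retired"),
  y     = c("no", "yes", "no", "yes")
)

toy_clean <- clean_bank(toy)

# Count "unknown" entries in every character column
unknown_counts <- toy_clean %>%
  summarise(across(where(is.character), ~ sum(.x == "unknown")))
```

On the real data, the same helper could be applied as `clean_bank(bank)`. Whether "unknown" should then be kept as its own category or converted to NA is a modeling choice that this sketch leaves open.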
Step 2. Explore and investigate data
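# Preview the first rows
head(bank)
# Inspect column types and structure
str(bank)
# Summary statistics for every column
summary(bank)
# Row count, then full dimensions (rows, columns)
nrow(bank)
dim(bank)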
Exploring task 1 - What are the jobs of the people most likely to subscribe to a term deposit?
# Filter the data where the client has subscribed a term deposit
subscribed_data <- bank %>% filter(y == "yes")
# Check the count of each job category
table(subscribed_data$job)
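The table above counts subscribers per job, which tends to favor the largest job categories. To address who is most likely to subscribe, the share of clients in each job who said "yes" is arguably the more relevant figure. Below is a minimal sketch of that calculation. The helper name `subscription_rate_by_job` and the toy rows are hypothetical; the sketch assumes y holds "yes"/"no" as listed in the data dictionary.

```r
library(dplyr)

# Hypothetical helper: subscription rate per job category.
# Assumes `y` holds "yes"/"no" as described in the data dictionary.
subscription_rate_by_job <- function(df) {
  df %>%
    group_by(job) %>%
    summarise(
      n_clients    = n(),               # clients contacted in this job category
      n_subscribed = sum(y == "yes"),   # how many of them subscribed
      rate         = mean(y == "yes"),  # share that subscribed
      .groups = "drop"
    ) %>%
    arrange(desc(rate))
}

# Toy rows (made up for illustration, not taken from the dataset)
toy <- data.frame(
  job = c("student", "student", "admin.", "admin.", "admin.", "admin."),
  y   = c("yes", "no", "yes", "no", "no", "no")
)

job_rates <- subscription_rate_by_job(toy)
```

In the toy rows both jobs have one subscriber, yet the rates differ (1 of 2 versus 1 of 4), which is exactly the distinction a raw count hides. On the real data the call would be `subscription_rate_by_job(bank)`; keeping `n_clients` in the output helps judge how reliable the rate is for small categories.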
Visualizing task 2 - Create a plot to visualize the number of people subscribing to a term deposit by month
# Create the bar plot
ggplot(data = subscribed_data, aes(x = month)) +
  geom_bar(fill = "steelblue") +
  labs(x = "Month", y = "Number of Subscriptions", title = "Subscriptions by Month")
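Because month is read as text, ggplot2 orders the bars alphabetically ("apr", "aug", "dec", ...) rather than by calendar. One possible fix, sketched below, is to convert month to a factor with explicit levels before plotting. The helper name `plot_subscriptions_by_month` and the toy rows are hypothetical, and the sketch assumes every month uses the lowercase three-letter codes shown in the data dictionary.

```r
library(dplyr)
library(ggplot2)

# Calendar order, assuming lowercase three-letter month codes
month_levels <- c("jan", "feb", "mar", "apr", "may", "jun",
                  "jul", "aug", "sep", "oct", "nov", "dec")

# Hypothetical helper: bar chart of subscriptions with months in calendar order
plot_subscriptions_by_month <- function(df) {
  df %>%
    filter(y == "yes") %>%
    mutate(month = factor(month, levels = month_levels)) %>%
    ggplot(aes(x = month)) +
    geom_bar(fill = "steelblue") +
    # Keep months with zero subscriptions on the axis
    scale_x_discrete(drop = FALSE) +
    labs(x = "Month", y = "Number of Subscriptions", title = "Subscriptions by Month")
}

# Toy rows (made up for illustration, not taken from the dataset)
toy <- data.frame(
  month = c("may", "mar", "may", "dec", "aug"),
  y     = c("yes", "yes", "no", "yes", "yes")
)

p <- plot_subscriptions_by_month(toy)
```

On the real data, `plot_subscriptions_by_month(bank)` would draw the chart. Setting `drop = FALSE` is a design choice: it makes months without any subscriptions visible as gaps instead of silently removing them.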
Analyzing task 3 - What impact does the number of contacts performed during the last campaign have on the likelihood that a customer subscribes to a term deposit?
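One way to approach this question is to compute the subscription rate for each number of contacts and then fit a simple logistic regression of the outcome on the contact count. The sketch below assumes the campaign column from the data dictionary is the contact count the task refers to. The helper names `rate_by_contacts` and `fit_contacts_model` and the toy rows are hypothetical and made up for illustration.

```r
library(dplyr)

# Hypothetical helper: subscription rate for each number of contacts
rate_by_contacts <- function(df) {
  df %>%
    group_by(campaign) %>%
    summarise(
      n_clients = n(),
      rate      = mean(y == "yes"),
      .groups = "drop"
    ) %>%
    arrange(campaign)
}

# Hypothetical helper: logistic regression of subscription on contact count
fit_contacts_model <- function(df) {
  # Encode the outcome as 0/1 so the binomial family can use it directly
  df$subscribed <- as.integer(df$y == "yes")
  glm(subscribed ~ campaign, data = df, family = binomial())
}

# Toy rows (made up for illustration, not taken from the dataset)
toy <- data.frame(
  campaign = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  y        = c("yes", "yes", "no", "yes", "yes", "no", "no", "no", "no", "yes")
)

contact_rates <- rate_by_contacts(toy)
contact_fit   <- fit_contacts_model(toy)

# Odds ratio per additional contact (values below 1 mean lower odds)
odds_ratio <- exp(coef(contact_fit)[["campaign"]])
```

On the real data, the calls would be `rate_by_contacts(bank)` and `fit_contacts_model(bank)`. A single-predictor model only describes an association; contact counts are likely confounded with other factors (for example, clients who decline early may be called again), so the coefficient should not be read as a causal effect.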