EDA with R - Bank Marketing

    Exploratory Data Analysis with R - Bank Marketing case

    Bank Marketing

    This dataset consists of direct marketing campaigns by a Portuguese banking institution using phone calls. The campaigns aimed to sell subscriptions to a bank term deposit (see variable y).

    Source of dataset.

    Management has asked for an analysis of the following:

    • 🗺️ Task 1 - Explore: What are the jobs of the people most likely to subscribe to a term deposit?
    • 📊 Task 2 - Visualize: Create a plot to visualize the number of people subscribing to a term deposit by month.
    • 🔎 Task 3 - Analyze: What impact does the number of contacts performed during the last campaign have on the likelihood that a customer subscribes to a term deposit?

    Lastly, any additional information is welcome, as is the use of machine learning algorithms. The bank wants to use more algorithms, and management wants to know the best way to use them. Ideally, compare various machine learning techniques.

    Background:

    You work for a financial services firm. The past few campaigns have not gone as well as the firm would have hoped, and they are looking for ways to optimize their marketing efforts.

    They have supplied you with data from a previous campaign and some additional metrics such as the consumer price index and consumer confidence index. They want to know whether you can predict the likelihood of subscribing to a term deposit. The manager would also like to know what factors are most likely to increase a customer's probability of subscribing.

    You will need to prepare a report that includes the raw data, the code, and the techniques used. The results will later be cleaned up into a more polished report for a broad audience.

    Step 1. Import data

    #Import dataset
    
    library(tidyverse)
    
    bank <- read_delim('data/bank-marketing.csv', delim=";", show_col_types = FALSE)

    Data Dictionary

    Column | Variable | Class
    age | age of customer |
    job | type of job | categorical: "admin.", "blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed", "services", "student", "technician", "unemployed", "unknown"
    marital | marital status | categorical: "divorced", "married", "single", "unknown"; note: "divorced" means divorced or widowed
    education | highest degree of customer | categorical: "basic.4y", "basic.6y", "basic.9y", "high.school", "illiterate", "professional.course", "university.degree", "unknown"
    default | has credit in default? | categorical: "no", "yes", "unknown"
    housing | has housing loan? | categorical: "no", "yes", "unknown"
    loan | has personal loan? | categorical: "no", "yes", "unknown"
    contact | contact communication type | categorical: "cellular", "telephone"
    month | last contact month of year | categorical: "jan", "feb", "mar", ..., "nov", "dec"
    day_of_week | last contact day of the week | categorical: "mon", "tue", "wed", "thu", "fri"
    campaign | number of contacts performed during this campaign and for this client | numeric, includes last contact
    pdays | number of days that passed by after the client was last contacted from a previous campaign | numeric; 999 means client was not previously contacted
    previous | number of contacts performed before this campaign and for this client | numeric
    poutcome | outcome of the previous marketing campaign | categorical: "failure", "nonexistent", "success"
    emp.var.rate | employment variation rate - quarterly indicator | numeric
    cons.price.idx | consumer price index - monthly indicator | numeric
    cons.conf.idx | consumer confidence index - monthly indicator | numeric
    euribor3m | euribor 3 month rate - daily indicator | numeric
    nr.employed | number of employees - quarterly indicator | numeric
    y | has the client subscribed a term deposit? | binary: "yes", "no"
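
    The dictionary notes that pdays uses 999 as a sentinel for "not previously contacted". Left as a number, that value would distort any mean or model that uses pdays. The following is a minimal sketch of one way to handle it, using a few made-up rows in place of the real data; the names bank_demo and previously_contacted are hypothetical and not part of the original notebook.

```r
library(dplyr)

# Toy rows standing in for `bank` (illustrative values, not taken from the dataset)
bank_demo <- data.frame(
  pdays    = c(999, 3, 999, 6),
  poutcome = c("nonexistent", "success", "failure", "success")
)

bank_demo <- bank_demo %>%
  mutate(
    # Flag computed first, while the 999 sentinel is still present
    previously_contacted = pdays != 999,
    # Then replace the sentinel with NA so it is not treated as a real day count
    pdays = na_if(pdays, 999)
  )

print(bank_demo)
```

    On the real data the same mutate() call could be applied to bank directly, assuming pdays was read in as a numeric column.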

    Step 2. Explore and investigate data

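    head(bank)     # first rows of the data
    str(bank)      # column types and sample values
    summary(bank)  # per-column summary statistics
    nrow(bank)     # number of rows
    dim(bank)      # number of rows and columns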

    Exploring task 1 - What are the jobs of the people most likely to subscribe to a term deposit?

    # Filter the data where the client has subscribed a term deposit
    subscribed_data <- bank %>% filter(y == "yes")
    
    # Count subscribers in each job category (raw counts, which favor large categories)
    table(subscribed_data$job)
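
    The table above gives the number of subscribers per job, which tends to favor the largest job categories. "Most likely to subscribe" is arguably better captured by the share of contacted customers in each job who subscribed. A minimal sketch of that calculation follows, using made-up rows rather than the real data; bank_demo and job_rates are hypothetical names.

```r
library(dplyr)

# Toy rows standing in for `bank` (illustrative values, not taken from the dataset)
bank_demo <- data.frame(
  job = c("student", "student", "retired", "retired",
          "admin.", "admin.", "admin.", "admin."),
  y   = c("yes", "no", "yes", "yes", "no", "no", "no", "yes")
)

job_rates <- bank_demo %>%
  group_by(job) %>%
  summarise(
    contacted  = n(),             # customers contacted in this job category
    subscribed = sum(y == "yes"), # how many of them subscribed
    rate       = mean(y == "yes"),# share who subscribed
    .groups    = "drop"
  ) %>%
  arrange(desc(rate))             # highest subscription rate first

print(job_rates)
```

    Replacing bank_demo with bank would give the rates for the full dataset. Categories with very few contacted customers can show extreme rates, so the contacted column is worth reading alongside rate.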

    Visualizing task 2 - Create a plot to visualize the number of people subscribing to a term deposit by month

    # Create the bar plot
    ggplot(data = subscribed_data, aes(x = month)) +
      geom_bar(fill = "steelblue") +
      labs(x = "Month", y = "Number of Subscriptions", title = "Subscriptions by Month")
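
    Because month is stored as text, the bars in the plot above are ordered alphabetically ("apr", "aug", "dec", ...) rather than by calendar. One possible fix is to convert month to a factor with explicit levels before counting. The sketch below assumes the three-letter lowercase month codes listed in the data dictionary and uses made-up rows; subscribed_demo, month_levels, and monthly_counts are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Calendar order for the month codes used in the data dictionary
month_levels <- c("jan", "feb", "mar", "apr", "may", "jun",
                  "jul", "aug", "sep", "oct", "nov", "dec")

# Toy rows standing in for `subscribed_data` (illustrative values only)
subscribed_demo <- data.frame(
  month = c("may", "mar", "may", "dec", "aug", "mar", "may")
)

monthly_counts <- subscribed_demo %>%
  mutate(month = factor(month, levels = month_levels)) %>% # impose calendar order
  count(month, name = "subscriptions")                      # one row per observed month

# geom_col() plots the precomputed counts; bars follow the factor order
p <- ggplot(monthly_counts, aes(x = month, y = subscriptions)) +
  geom_col(fill = "steelblue") +
  labs(x = "Month", y = "Number of Subscriptions", title = "Subscriptions by Month")

print(monthly_counts)
```

    Calling print(p) draws the chart. Months with no subscriptions do not appear in monthly_counts, so they are omitted from the axis rather than shown as empty bars.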

    Analyzing task 3 - What impact does the number of contacts performed during the last campaign have on the likelihood that a customer subscribes to a term deposit?
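
    One way to approach this question is to compare the subscription rate at each value of campaign and then fit a simple logistic regression of the outcome on campaign. The sketch below is an illustration under stated assumptions, not the notebook's original analysis: it uses made-up rows in place of the real data, and bank_demo, campaign_rates, subscribed, fit, and odds_ratio are hypothetical names.

```r
library(dplyr)

# Toy rows standing in for `bank` (illustrative values, not taken from the dataset)
bank_demo <- data.frame(
  campaign = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 5, 6, 8),
  y        = c("yes", "yes", "no", "no", "yes", "no", "no",
               "yes", "no", "no", "no", "no")
)

# Subscription rate at each number of contacts
campaign_rates <- bank_demo %>%
  group_by(campaign) %>%
  summarise(
    contacted = n(),
    rate      = mean(y == "yes"),
    .groups   = "drop"
  )

# Logistic regression: log-odds of subscribing as a linear function of campaign
model_data <- bank_demo %>% mutate(subscribed = as.integer(y == "yes"))
fit <- glm(subscribed ~ campaign, data = model_data, family = binomial)

# Multiplicative change in the odds of subscribing per additional contact
odds_ratio <- exp(coef(fit)["campaign"])

print(campaign_rates)
print(odds_ratio)
```

    An odds ratio below 1 would indicate that each additional contact is associated with lower odds of subscribing. This describes an association only: customers who are harder to convince may simply receive more calls, so the result should not be read as the causal effect of calling more often.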