Competition - Loan Data

    Loan Data

    Ready to put your coding skills to the test? Join us for our Workspace Competition!
    For more information, visit datacamp.com/workspacecompetition

    Context

    This dataset (source) consists of data from almost 10,000 borrowers who took out loans, some of which have been paid back while others are still in progress. It was extracted from lendingclub.com, an organization that connects borrowers with investors. We've included a few suggested questions at the end of this template to help you get started.

    Load packages

    library(skimr)
    library(tidyverse)

    Load your Data

    loans <- readr::read_csv('data/loans.csv.gz',show_col_types = FALSE)
    # skim(loans) %>% 
      # #select(-(numeric.p0:numeric.p100)) %>%
      # select(-(complete_rate))
    According to the data summary, there are no missing values (NA) in the dataset, so we can proceed.
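    A quick way to double-check this without the full skim() output is to count missing values per column. The sketch below runs on a small made-up data frame (toy_loans and its values are hypothetical stand-ins for loans) so that it is self-contained; in the notebook the same calls would be applied to loans.

```r
# Toy stand-in for `loans`; the values are made up for illustration only.
toy_loans <- data.frame(
  credit_policy  = c(1, 1, 0, 1),
  purpose        = c("debt_consolidation", "credit_card",
                     "all_other", "debt_consolidation"),
  log_annual_inc = c(11.35, 10.82, 10.37, 11.08)
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(toy_loans))
print(na_per_column)

# TRUE when the data frame contains no missing values at all.
no_missing <- !anyNA(toy_loans)
print(no_missing)
```

    If every count is zero, the analysis can proceed without any imputation or row filtering.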

    Let's analyze the data

    loans_grouped_credit_purpose <- loans %>% 
      # recode credit_policy before sorting so each label stays on its own row
      # 0 : default       : doesn't meet the criteria 
      # 1 : meet_criteria : meets the criteria to take a loan 
      mutate(credit_policy = factor(credit_policy,
                                    levels = c(0, 1),
                                    labels = c("default", "meet_criteria"))) %>% 
      group_by(credit_policy, purpose) %>% 
      arrange(desc(log_annual_inc))
    
    # check
    prop.table(table(loans_grouped_credit_purpose$credit_policy)) %>% round(2)

    The proportion of borrowers who meet the credit policy criteria is 0.8.

    Which purpose has the highest proportion among borrowers?

    df <- loans_grouped_credit_purpose %>% 
      summarize(count = n(),
                prop = round(count / nrow(loans_grouped_credit_purpose), 2)) %>% 
      arrange(desc(prop))
    
    df %>% 
      ggplot(aes(purpose, fill = credit_policy)) +
      geom_col(aes(y = prop), position = "dodge") +
      labs(title = "Credit purpose by credit policy") + 
      theme(axis.text.x = element_text(hjust = 0.10, angle = -45))

    Overall, the purpose with the highest proportion is debt consolidation.

    Among those who meet the credit policy criteria, what are the top 3 loan purposes?

    # numerical summary
    top_3_loans <- head(df,3)
    
    # loans_meet_criteria_top3
    loans_meet_criteria_top3 <- loans %>%  
      filter(credit_policy == 1,purpose %in% top_3_loans$purpose)  
      
    
    # plot
    loans_meet_criteria_top3 %>% 
      #filter(purpose %in% c(top_3_loans$purpose)) %>% 
      ggplot(aes(purpose)) +
      geom_bar() +
      labs(title = "Top 3 loan purposes among those who meet credit policy criteria") + 
      theme(axis.text.x = element_text(hjust = 0.10, angle = -45))

    Debt consolidation is the most common purpose among those who meet the credit policy criteria; that group makes up a proportion of `r round(top_3_loans$prop[1], 2)` of all borrowers.
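    Note that prop was computed by dividing by the total number of rows, so it is a share of all borrowers. To get the share within the group that meets the criteria, the denominator has to be that group's own total. A minimal sketch of that calculation, using a small made-up tibble (toy_loans and its values are hypothetical; only the column names credit_policy and purpose come from the notebook):

```r
library(dplyr)

# Toy stand-in for `loans`; the rows are made up for illustration only.
toy_loans <- tibble(
  credit_policy = c(1, 1, 1, 1, 1, 0, 0, 0),
  purpose = c("debt_consolidation", "debt_consolidation", "debt_consolidation",
              "credit_card", "all_other",
              "debt_consolidation", "all_other", "all_other")
)

share_within_policy <- toy_loans %>%
  # keep only borrowers who meet the credit policy criteria
  filter(credit_policy == 1) %>%
  # one row per purpose with its number of loans
  count(purpose, name = "count") %>%
  # divide by the total of the filtered group, not by all rows
  mutate(prop = round(count / sum(count), 2)) %>%
  arrange(desc(prop))

print(share_within_policy)
```

    Applied to loans, the first row of the result would give the within-group share of the most common purpose.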

    For those who meet the criteria, what is the average annual income for each purpose?

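    loans_grouped_credit_purpose %>% 
      filter(credit_policy == "meet_criteria") %>% 
      # exp(mean(log(x))) is the geometric mean of income, not the arithmetic mean
      summarize(avg_annual_inc = exp(mean(log_annual_inc))) %>% 
      arrange(desc(avg_annual_inc)) %>% 
      # keep the 3 purposes with the highest average income
      slice_max(avg_annual_inc, n = 3)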

    Borrowers with the highest average annual income tend to take loans for home improvement, small business, and credit cards.
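    One caveat about the average used here: exp(mean(log_annual_inc)) is the geometric mean of income, which is never larger than the arithmetic mean and is less sensitive to a few very high incomes. The sketch below illustrates the difference on three made-up incomes (the values are hypothetical), assuming log_annual_inc holds the natural log of annual income, as the use of exp() suggests.

```r
# Made-up annual incomes (hypothetical values, not taken from the loans data).
annual_inc <- c(20000, 50000, 200000)
log_annual_inc <- log(annual_inc)

# Back-transforming the mean of the logs gives the geometric mean.
geometric_mean <- exp(mean(log_annual_inc))

# Back-transforming each value first and then averaging gives the arithmetic mean.
arithmetic_mean <- mean(exp(log_annual_inc))

print(c(geometric = geometric_mean, arithmetic = arithmetic_mean))
```

    Either measure can be used to rank the purposes, but the two can produce different orderings when income is strongly skewed within a purpose.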

    Is there any association between interest rate and the top 3 loan purposes?
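    One possible way to approach this question is to compare the distribution of interest rates across the three purposes, for example with per-purpose medians and a Kruskal-Wallis rank test. This is only a sketch under assumptions: the column name int_rate does not appear in the code above and is assumed here, and toy_loans contains made-up values so that the example runs on its own.

```r
# Toy stand-in for the top-3 subset of `loans`; all values are made up.
# `int_rate` is an assumed column name for the interest rate.
toy_loans <- data.frame(
  purpose  = rep(c("debt_consolidation", "credit_card", "all_other"), each = 4),
  int_rate = c(0.12, 0.13, 0.14, 0.15,
               0.11, 0.12, 0.13, 0.12,
               0.10, 0.13, 0.11, 0.12)
)

# Median interest rate for each purpose.
median_rate <- tapply(toy_loans$int_rate, toy_loans$purpose, median)
print(median_rate)

# Kruskal-Wallis test: do the interest rate distributions differ by purpose?
kw <- kruskal.test(int_rate ~ purpose, data = toy_loans)
print(kw)
```

    A small p-value would suggest that interest rates differ between purposes; a boxplot of int_rate by purpose would show the direction and size of the difference.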