Competition - Loan Data

    Loan Data

    Ready to put your coding skills to the test? Join us for our Workspace Competition!
    For more information, visit


    This dataset (source) consists of data from almost 10,000 borrowers who took out loans, some of which have been paid back while others are still in progress. It was extracted from which is an organization that connects borrowers with investors. We've included a few suggested questions at the end of this template to help you get started.

    Load packages

    library(dplyr)    # data manipulation and the %>% pipe
    library(ggplot2)  # plotting


    Load your Data

    loans <- readr::read_csv("data/loans.csv.gz", show_col_types = FALSE)
    # data summary: number of missing values in each column
    # (skimr::skim(loans) gives a fuller summary, including n_missing)
    colSums(is.na(loans))
    As the data summary above shows, there are no missing values (NA) in the dataset, so we can proceed.

    Let's analyze the data.

    loans_grouped_credit_purpose <- loans %>%
      # recode credit_policy as a labelled factor:
      # 0 : not_meet_criteria : doesn't meet the credit policy criteria
      # 1 : meet_criteria     : meets the criteria to take a loan
      mutate(credit_policy = factor(credit_policy,
                                    levels = c(0, 1),
                                    labels = c("not_meet_criteria", "meet_criteria"))) %>%
      group_by(credit_policy, purpose)
    # check
    prop.table(table(loans_grouped_credit_purpose$credit_policy)) %>% round(2)

    The proportion of borrowers who meet the credit policy criteria is 0.8.
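As a small illustration of how a figure like 0.8 is produced, the sketch below applies the same `factor()` recoding and `prop.table(table(...))` call to a hypothetical vector of five borrowers (not the loans data). The level labels are illustrative.

```r
# Hypothetical credit_policy values for five borrowers (not the loans data):
# 1 = meets the credit policy criteria, 0 = does not
credit_policy_raw <- c(1, 1, 0, 1, 1)

# Recode as a labelled factor
credit_policy <- factor(credit_policy_raw, levels = c(0, 1),
                        labels = c("not_meet_criteria", "meet_criteria"))

# table() counts each level; prop.table() divides the counts by their total
props <- round(prop.table(table(credit_policy)), 2)
print(props)
```

Four of the five hypothetical borrowers meet the criteria, so the `meet_criteria` share is 4/5 = 0.8.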

    What is the loan purpose with the highest proportion among borrowers?

    df <- loans_grouped_credit_purpose %>%
      summarize(count = n(),
                # share of all loans falling in each (credit_policy, purpose) cell
                prop = round(count / nrow(loans_grouped_credit_purpose), 2),
                .groups = "drop") %>%
      arrange(desc(prop))
    df %>%
      ggplot(aes(purpose, fill = credit_policy)) +
      geom_col(aes(y = prop), position = "dodge") +
      labs(title = "Loan purpose by credit policy") +
      theme(axis.text.x = element_text(hjust = 0.10, angle = -45))

    Overall, the purpose with the highest proportion is debt consolidation.
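The count/prop summary above divides each (credit_policy, purpose) count by the total number of loans. As a quick cross-check, the same cell proportions can be computed in base R with `prop.table(table(...))`; the sketch below uses a small hypothetical data frame rather than the real file.

```r
# Hypothetical loans, not the real dataset
toy <- data.frame(
  credit_policy = c(1, 1, 1, 1, 0, 1, 0, 1),
  purpose = c("debt_consolidation", "debt_consolidation", "credit_card",
              "debt_consolidation", "all_other", "credit_card",
              "debt_consolidation", "all_other")
)

# Each cell = count of (credit_policy, purpose) / total number of loans,
# the same quantity as count / nrow(...) in the dplyr pipeline
cell_props <- prop.table(table(toy$credit_policy, toy$purpose))
print(round(cell_props, 2))

# Purpose with the highest overall proportion (summing over credit_policy)
purpose_props <- colSums(cell_props)
top_purpose <- names(which.max(purpose_props))
print(top_purpose)
```

Summing the columns collapses the credit_policy split, which is what "overall" means in the sentence above.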

    Among borrowers who meet the credit policy criteria, what are the top 3 loan purposes?

    # numerical summary: the 3 most common purposes among borrowers who meet the criteria
    top_3_loans <- df %>%
      filter(credit_policy == "meet_criteria") %>%
      head(3)
    top_3_loans
    # loans of borrowers who meet the criteria, restricted to the top 3 purposes
    loans_meet_criteria_top3 <- loans %>%
      filter(credit_policy == 1, purpose %in% top_3_loans$purpose)
    # plot
    loans_meet_criteria_top3 %>%
      ggplot(aes(purpose)) +
      geom_bar() +
      labs(title = "Top 3 loan purposes among borrowers who meet the credit policy criteria") +
      theme(axis.text.x = element_text(hjust = 0.10, angle = -45))

    Debt consolidation is the most common purpose among borrowers who meet the credit policy criteria, accounting for a proportion of `r round(top_3_loans$prop[1], 2)` of all loans.
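The same ranking can be sketched in base R by counting purposes among borrowers with `credit_policy == 1` and keeping the three largest counts with `sort()` and `head()`. The data below are hypothetical.

```r
# Hypothetical loans, not the real dataset
credit_policy <- c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0)
purpose <- c("debt_consolidation", "credit_card", "debt_consolidation",
             "all_other", "debt_consolidation", "credit_card",
             "debt_consolidation", "home_improvement", "all_other", "credit_card")

# Count purposes among borrowers who meet the criteria, largest first
meet_counts <- sort(table(purpose[credit_policy == 1]), decreasing = TRUE)

# Keep the 3 most common purposes
top_3 <- head(meet_counts, 3)
print(top_3)

# Share of each top purpose among borrowers who meet the criteria
print(round(top_3 / sum(credit_policy == 1), 2))
```

Dividing by `sum(credit_policy == 1)` gives shares within the borrowers who meet the criteria; dividing by the total number of loans instead reproduces the "share of all loans" used in the pipeline above.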

    For borrowers who meet the criteria, what is the average annual income for each loan purpose?

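    loans_grouped_credit_purpose %>%
      filter(credit_policy == "meet_criteria") %>%
      # exp(mean(log income)) is the geometric mean of annual income per purpose
      summarize(avg_annual_inc = exp(mean(log_annual_inc)), .groups = "drop") %>%
      # keep the 3 purposes with the highest average income, largest first
      slice_max(avg_annual_inc, n = 3)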

    Among borrowers who meet the criteria, the loan purposes with the highest average (geometric mean) annual income are home improvement, small business and credit card.
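Note that `exp(mean(log_annual_inc))` is the geometric mean of annual income, assuming `log_annual_inc` is the natural log (which the use of `exp()` implies). It is never larger than the arithmetic mean and is less pulled up by a few very high incomes. A minimal sketch on hypothetical incomes:

```r
# Hypothetical annual incomes (not from the loans data)
annual_inc <- c(30000, 45000, 60000, 250000)
log_annual_inc <- log(annual_inc)   # natural log

geo_mean   <- exp(mean(log_annual_inc))  # what the summarize() call computes
arith_mean <- mean(annual_inc)           # the ordinary average

print(c(geometric = geo_mean, arithmetic = arith_mean))
```

The single 250,000 income raises the arithmetic mean to 96,250, while the geometric mean stays much closer to the typical borrower.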

    Is there any association between interest rate and the top 3 loan purposes?