Loan Data
Context
This dataset consists of data from almost 10,000 borrowers who took out loans, some of which have been paid back while others are still in progress. It was extracted from lendingclub.com, an organization that connects borrowers with investors. We've included a few suggested questions at the end of this template to help you get started.
Load packages
library(skimr)
library(tidyverse)
Load your Data
loans <- readr::read_csv('data/loans.csv.gz',show_col_types = FALSE)
# skim(loans) %>%
# #select(-(numeric.p0:numeric.p100)) %>%
# select(-(complete_rate))
As the data summary shows, there are no missing values (NA) in the dataset, so we can proceed.
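The missing-value claim can also be checked directly, without relying on the skim() output. A minimal sketch, using a small made-up data frame (`toy_loans` and its values are hypothetical stand-ins for `loans`):

```r
# Hypothetical stand-in for `loans`; the values are made up for illustration.
toy_loans <- data.frame(
  credit_policy  = c(1, 1, 0, 1),
  purpose        = c("credit_card", "debt_consolidation", "all_other", "credit_card"),
  log_annual_inc = c(11.35, 11.08, 10.37, 11.30)
)

# Count missing values per column; all zeros means no NA in any column.
na_per_column <- colSums(is.na(toy_loans))
print(na_per_column)

# TRUE when the data frame contains no missing values at all.
no_missing <- sum(na_per_column) == 0
print(no_missing)
```

On the real data, the same calls would be applied to `loans` instead of `toy_loans`.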
Let's analyze the data
loans_grouped_credit_purpose <- loans %>%
  # Recode credit_policy before sorting so every label stays on its own row
  # (assigning factor(loans$credit_policy) after arrange() would misalign rows)
  # 0 : default : doesn't meet the criteria
  # 1 : meet_criteria : meets the criteria to take a loan
  mutate(credit_policy = factor(credit_policy,
                                levels = c(0, 1),
                                labels = c("default", "meet_criteria"))) %>%
  group_by(credit_policy, purpose) %>%
  arrange(desc(log_annual_inc))
# check
prop.table(table(loans_grouped_credit_purpose$credit_policy)) %>% round(2)
The proportion of candidates who met the credit policy criteria is 0.8.
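To make the recoding step concrete, here is a small sketch of how factor() maps the 0/1 codes to labels and how prop.table() turns counts into shares. The input vector `codes` is made up for illustration and is not taken from the real data.

```r
# Hypothetical 0/1 codes, for illustration only.
codes <- c(1, 1, 0, 1, 1)

# Map 0 -> "default" and 1 -> "meet_criteria", mirroring the labels used in the analysis.
policy <- factor(codes, levels = c(0, 1), labels = c("default", "meet_criteria"))

# Counts per label, then each count as a share of the total.
shares <- round(prop.table(table(policy)), 2)
print(shares)
```

With four of the five codes equal to 1, the share for `meet_criteria` comes out at 0.8 and the share for `default` at 0.2.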
What is the purpose with the highest proportion among borrowers?
df <- loans_grouped_credit_purpose %>%
  summarize(count = n(),
            prop = round(count / nrow(loans_grouped_credit_purpose), 2)) %>%
  arrange(desc(prop))
df %>%
  ggplot(aes(purpose, fill = credit_policy)) +
  geom_col(aes(y = prop), position = "dodge") +
  labs(title = "credit purpose by credit policy") +
  theme(axis.text.x = element_text(hjust = 0.10, angle = -45))
Overall, the purpose with the highest proportion is debt consolidation.
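The plot splits each purpose by credit policy, so the overall ranking has to be read by adding the two bars for each purpose together. A minimal sketch of computing the overall share per purpose directly, on a made-up data frame (`toy_loans` and its rows are hypothetical; on the real data `loans` would take its place):

```r
library(dplyr)

# Hypothetical stand-in for `loans`; the rows are made up for illustration.
toy_loans <- tibble(
  credit_policy = c(1, 1, 1, 0, 1, 0),
  purpose = c("debt_consolidation", "debt_consolidation", "credit_card",
              "debt_consolidation", "all_other", "credit_card")
)

purpose_share <- toy_loans %>%
  count(purpose, name = "count") %>%               # one row per purpose
  mutate(prop = round(count / sum(count), 2)) %>%  # share of all loans
  arrange(desc(prop))                              # largest share first

print(purpose_share)
```

Because the grouping here is by purpose only, the first row is the most common purpose regardless of credit policy.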
Among those who meet the credit policy criteria, what are the top 3 loan purposes?
# numerical summary: the 3 most common purposes among borrowers who meet the criteria
top_3_loans <- df %>%
  filter(credit_policy == "meet_criteria") %>%
  head(3)
# loans_meet_criteria_top3
loans_meet_criteria_top3 <- loans %>%
  filter(credit_policy == 1, purpose %in% top_3_loans$purpose)
# plot
loans_meet_criteria_top3 %>%
  ggplot(aes(purpose)) +
  geom_bar() +
  labs(title = "top 3 loan purposes among those who meet credit policy criteria") +
  theme(axis.text.x = element_text(hjust = 0.10, angle = -45))
Debt consolidation is the most common purpose among those who meet the credit policy criteria (that is, whose loans were underwritten). This group accounts for a proportion of `r round(top_3_loans$prop[1], 2)` of all borrowers.
For those who meet the criteria, what is the average annual income for each purpose?
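loans_grouped_credit_purpose %>%
  filter(credit_policy == "meet_criteria") %>%
  # exp(mean(log(x))) is the geometric mean of income, not the arithmetic mean
  summarize(avg_annual_inc = exp(mean(log_annual_inc))) %>%
  arrange(desc(avg_annual_inc)) %>%
  # slice_max() replaces the superseded top_n() and keeps the 3 highest averages
  slice_max(avg_annual_inc, n = 3)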
Among candidates who meet the criteria, the loan purposes with the highest average annual income are home improvement, small business and credit card.
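One caveat about the summary above: exp(mean(log_annual_inc)) is the geometric mean of income, which is never larger than the arithmetic mean and is less affected by a few very high incomes. The sketch below illustrates the gap with made-up incomes, assuming `log_annual_inc` is the natural log of annual income.

```r
# Hypothetical annual incomes, for illustration only.
annual_inc <- c(20000, 50000, 200000)

# Assumption: the dataset stores the natural log of income.
log_annual_inc <- log(annual_inc)

geometric_mean  <- exp(mean(log_annual_inc))  # what the summary above computes
arithmetic_mean <- mean(exp(log_annual_inc))  # the usual average income

print(c(geometric = geometric_mean, arithmetic = arithmetic_mean))
```

Either measure can be used to rank purposes, but the two can give different orderings when income is strongly skewed within a group.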
Is there any association between interest rate and the top 3 loan purposes?
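One possible starting point for this question is to compare interest rates across purposes with group summaries and a Kruskal-Wallis rank test. The sketch below runs on made-up data: the column name `int_rate` is an assumption about the dataset, `toy_loans` is hypothetical, and the numbers say nothing about the real loans.

```r
library(dplyr)

# Hypothetical data: `int_rate` is an assumed column name and all values are made up.
toy_loans <- tibble(
  purpose  = factor(rep(c("debt_consolidation", "credit_card", "all_other"), each = 4)),
  int_rate = c(0.13, 0.14, 0.12, 0.15,
               0.11, 0.12, 0.10, 0.12,
               0.12, 0.13, 0.11, 0.14)
)

# Median and mean interest rate per purpose.
rate_by_purpose <- toy_loans %>%
  group_by(purpose) %>%
  summarize(median_rate = median(int_rate),
            mean_rate   = mean(int_rate),
            n           = n())
print(rate_by_purpose)

# Kruskal-Wallis test: do the rate distributions differ across purposes?
kw <- kruskal.test(int_rate ~ purpose, data = toy_loans)
print(kw$p.value)
```

On the real data, the same steps would be applied to `loans_meet_criteria_top3`, and a boxplot of interest rate by purpose would complement the test.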