Skip to content
Nearest Neighbors and Feature Engineering
(Invalid URL)
Loan Data
Ready to put your coding skills to the test? Join us for our Workspace Competition!
For more information, visit datacamp.com/workspacecompetition
Context
This dataset (source) consists of data from almost 10,000 borrowers that took loans - with some paid back and others still in progress. It was extracted from lendingclub.com which is an organization that connects borrowers with investors. We've included a few suggested questions at the end of this template to help you get started.
Load packages
library(skimr)
library(tidyverse)
Load your Data
loans <- readr::read_csv('data/loans.csv.gz')
skim(loans) %>%
select(-(numeric.p0:numeric.p100)) %>%
select(-(complete_rate))
Understand your data
Variable | class | description |
---|---|---|
credit_policy | numeric | 1 if the customer meets the credit underwriting criteria; 0 otherwise. |
purpose | character | The purpose of the loan. |
int_rate | numeric | The interest rate of the loan (more risky borrowers are assigned higher interest rates). |
installment | numeric | The monthly installments owed by the borrower if the loan is funded. |
log_annual_inc | numeric | The natural log of the self-reported annual income of the borrower. |
dti | numeric | The debt-to-income ratio of the borrower (amount of debt divided by annual income). |
fico | numeric | The FICO credit score of the borrower. |
days_with_cr_line | numeric | The number of days the borrower has had a credit line. |
revol_bal | numeric | The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle). |
revol_util | numeric | The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available). |
inq_last_6mths | numeric | The borrower's number of inquiries by creditors in the last 6 months. |
delinq_2yrs | numeric | The number of times the borrower had been 30+ days past due on a payment in the past 2 years. |
pub_rec | numeric | The borrower's number of derogatory public records. |
not_fully_paid | numeric | 1 if the loan is not fully paid; 0 otherwise. |
Now you can start to explore this dataset with the chance to win incredible prices! Can't think of where to start? Try your hand at these suggestions:
- Extract useful insights and visualize them in the most interesting way possible.
- Find out how long it takes for users to pay back their loan.
- Build a model that can predict the probability a user will be able to pay back their loan within a certain period.
- Find out what kind of people take a loan for what purposes.
Judging Criteria
CATEGORY | WEIGHTAGE | DETAILS |
---|---|---|
Analysis | 30% |
|
Results | 30% |
|
Creativity | 40% |
|