Explore Loan Data in R - Loan Grade and Interest Rate

There is no single prescribed path for analyzing a data set. Typically, a data scientist spends a good deal of time exploring and observing the data to understand it well.

Let’s look at some of the attributes in our dataset and see how they relate to the default rate. For each loan, we have two basic attributes: the grade and the interest rate. In LendingClub, the loan grade, or credit grade, is a letter from A to G that is assigned to a borrower and corresponds to the interest rate charged for the loan. Other lenders use different scales, for example AA to HR.

These rates are set based on the originator’s underwriting assessment of the individual borrower. The higher the expected risk of default, the higher the interest rate is set to offset this risk. The credit grade assessment takes into account the FICO score, the loan term (a shorter term is considered better), proprietary models, and the loan amount requested by the borrower.

We also have the interest rate being charged for each loan.

Default Rate for Each Loan Grade

Let’s plot the default rate for each grade.

The following command will give us the number of defaults for each grade.

> g1 = loandata %>% filter(loan_status == "Default") %>% group_by(grade) %>% summarise(default_count = n())
> g1
# A tibble: 7 x 2
  grade default_count
  <chr>         <int>
1 A               975
2 B              3487
3 C              5534
4 D              3697
5 E              2704
6 F              1260
7 G               379
>

We can now calculate the default rate in each grade as follows:

> g2 = loandata %>% group_by(grade) %>% summarise(count = n())
> g3 <- g2 %>% left_join(g1) %>% mutate(default_rate = 100*default_count/count) %>% select(grade,count,default_count,default_rate)
Joining, by = "grade"
> g3
# A tibble: 7 x 4
  grade count default_count default_rate
  <chr> <int>         <int>        <dbl>
1 A      9956           975         9.79
2 B     16649          3487        20.9 
3 C     16815          5534        32.9 
4 D      8480          3697        43.6 
5 E      5261          2704        51.4 
6 F      2109          1260        59.7 
7 G       599           379        63.3 
>
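
The same kind of table can also be built in a single pass, without the intermediate join. The following is a minimal sketch, not the method used in this lesson: it assumes dplyr is installed and uses a small made-up data frame, toy_loans, as a hypothetical stand-in for loandata.

```r
library(dplyr)

# Hypothetical toy data standing in for loandata (values are made up)
toy_loans <- tibble(
  grade       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  loan_status = c("Default", "Fully Paid", "Fully Paid", "Fully Paid",
                  "Default", "Default", "Fully Paid", "Fully Paid")
)

# Count loans and defaults per grade in one summarise() call;
# sum() over a logical vector counts the TRUE values
toy_rates <- toy_loans %>%
  group_by(grade) %>%
  summarise(
    count         = n(),
    default_count = sum(loan_status == "Default"),
    default_rate  = 100 * default_count / count
  )

toy_rates
```

Because both counts come from the same grouped data, a grade with no defaults would get a default_count of 0 here, whereas the left join would leave it as NA.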

We can now plot this using ggplot2.

> ggplot(g3, aes(x=grade, y=default_rate, fill=grade)) + geom_bar(stat="identity")
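
For a chart that is easier to read on its own, axis labels and a title can be added. This is a sketch under a few assumptions: the data frame g3_rates is a hypothetical copy of the grade and default_rate columns from the g3 output shown earlier, and the label text is our own choice.

```r
library(ggplot2)

# Default rates copied from the g3 table printed earlier
g3_rates <- data.frame(
  grade        = c("A", "B", "C", "D", "E", "F", "G"),
  default_rate = c(9.79, 20.9, 32.9, 43.6, 51.4, 59.7, 63.3)
)

# geom_col() is shorthand for geom_bar(stat = "identity");
# the legend is dropped because the x axis already names each grade
p <- ggplot(g3_rates, aes(x = grade, y = default_rate, fill = grade)) +
  geom_col(show.legend = FALSE) +
  labs(x = "Loan grade", y = "Default rate (%)",
       title = "Default rate by loan grade")

p
```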

As we would expect, riskier loans have higher default rates: the rate rises steadily from about 9.8% for grade A to 63.3% for grade G.

Loan Grade Vs Interest Rate

We can also plot the relationship between Loan Grade and Interest Rate.

To do so, we will first convert the interest rate attribute to numeric.

> loandata$int_rate = (as.numeric(gsub(pattern = "%",replacement = "",x = loandata$int_rate)))
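
It can help to try the conversion on a few values first and confirm that nothing turns into NA. The sketch below uses made-up strings in raw_rates (a hypothetical name); the leading space in one of them is an assumption about how the raw column might look, not something confirmed for this dataset.

```r
# Hypothetical sample of raw int_rate strings (values are made up)
raw_rates <- c("10.65%", " 15.27%", "7.9%")

# Strip the percent sign and any surrounding whitespace, then convert
clean_rates <- as.numeric(trimws(gsub("%", "", raw_rates, fixed = TRUE)))

clean_rates

# Values that fail to parse become NA, so this count should be 0
sum(is.na(clean_rates))
```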

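We can now group the data by grade and compute the mean interest rate for each grade. Note that the command below keeps only the defaulted loans; dropping the filter() step gives the mean over all loans instead.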

> x1 = loandata %>% filter(loan_status == "Default") %>% group_by(grade) %>% summarise(int_rate=mean(int_rate))
> ggplot(x1, aes(x=grade, y=int_rate, fill=grade)) + geom_bar(stat="identity",position="dodge")
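
The command above averages the interest rate over defaulted loans only. To see how that compares with the average over all loans, both means can be computed side by side. This is a minimal sketch on a made-up data frame, toy_loans, which is a hypothetical stand-in for loandata; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for loandata (values are made up)
toy_loans <- tibble(
  grade       = c("A", "A", "B", "B", "C", "C"),
  loan_status = c("Default", "Fully Paid", "Default", "Fully Paid",
                  "Default", "Fully Paid"),
  int_rate    = c(8, 6, 12, 10, 16, 14)
)

# Mean rate per grade over all loans, and over defaulted loans only
toy_means <- toy_loans %>%
  group_by(grade) %>%
  summarise(
    mean_rate_all     = mean(int_rate),
    mean_rate_default = mean(int_rate[loan_status == "Default"])
  )

toy_means
```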

As we would expect, riskier loans (grades closer to G) carry higher interest rates.
