Building a Credit Risk Model
Loan data are typically imbalanced, with a higher proportion of good loans than bad ones. As a result, we can achieve high accuracy simply by labeling every loan as Fully Paid.
> library(dplyr)  # provides filter() and the %>% pipe
> 100 * nrow(data_test %>% filter(loan_status == "Fully.Paid")) / nrow(data_test)
[1] 70.16704
On our test data, this strategy alone gives about 70.2% accuracy. Recall that we have not yet included the outcome of 'Current' loans. In a real situation, the share of Fully Paid loans is usually even higher, so accuracy is not a useful metric here. Instead, we will focus on the trade-off between identifying defaulted loans and mislabelling some good loans as defaults. When we train our models, we will look at the ROC curve and pay particular attention to the AUC (area under the curve).
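To make the accuracy-versus-AUC point concrete, here is a minimal base R sketch on a hypothetical toy sample (ten loans with made-up scores, not the article's `data_test`). It assumes defaults are coded as 1, Fully Paid loans as 0, and that a model outputs a score where higher means "more likely to default". The helpers `roc_points()`, `auc_trapezoid()` and `auc_rank()` are hypothetical names used only for illustration. The sketch shows that the all-Fully-Paid baseline reaches 70% accuracy while catching no defaults, then computes the AUC both from ROC points (trapezoid rule) and from the equivalent rank formulation.

```r
# Hypothetical toy sample, NOT the article's data_test.
# Assumed coding: 1 = default, 0 = Fully Paid.
y <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
# Hypothetical model scores; higher = more likely to default.
score <- c(0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.60, 0.80, 0.90)

# Majority-class baseline: predict Fully Paid (0) for every loan.
pred_baseline <- rep(0, length(y))
baseline_accuracy <- mean(pred_baseline == y)                      # 0.7
baseline_recall <- sum(pred_baseline == 1 & y == 1) / sum(y == 1)  # 0: no defaults caught

# ROC points: TPR and FPR as the cutoff sweeps from above the highest score
# down to the lowest; a loan is flagged as a default when score >= cutoff.
roc_points <- function(y, score) {
  cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- sapply(cutoffs, function(k) sum(score >= k & y == 1) / sum(y == 1))
  fpr <- sapply(cutoffs, function(k) sum(score >= k & y == 0) / sum(y == 0))
  data.frame(cutoff = cutoffs, fpr = fpr, tpr = tpr)
}

# Area under the ROC curve by the trapezoid rule over consecutive points.
auc_trapezoid <- function(roc) {
  n <- nrow(roc)
  sum(diff(roc$fpr) * (roc$tpr[-1] + roc$tpr[-n]) / 2)
}

# AUC via ranks (Mann-Whitney): probability that a random default
# outscores a random good loan, counting ties as 1/2.
auc_rank <- function(y, score) {
  r <- rank(score)  # tied scores receive average ranks
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

roc <- roc_points(y, score)
cat("Baseline accuracy:", baseline_accuracy, "\n")
cat("Baseline recall on defaults:", baseline_recall, "\n")
cat("AUC (trapezoid):", auc_trapezoid(roc), "\n")
cat("AUC (rank):", auc_rank(y, score), "\n")
```

Both AUC calculations agree on this toy sample, which illustrates the interpretation: AUC measures how well the scores rank defaults above good loans, independent of any single cutoff, whereas the baseline's accuracy only reflects the class balance. In practice a package such as pROC (`roc()` and `auc()`) computes the same quantities.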