Loan data typically contain a much higher proportion of good loans than defaults, so we can achieve high accuracy simply by labeling every loan as Fully Paid.
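```r
library(dplyr)  # provides %>% and filter()

# Percentage of test loans that are Fully Paid,
# i.e. the accuracy of always predicting "Fully Paid"
100 * nrow(data_test %>% filter(loan_status == "Fully.Paid")) / nrow(data_test)
#> [1] 70.16704
```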
For our test data, this strategy alone gives about 70.2% accuracy. Recall that we have not yet included the outcome of 'Current' loans. In practice, the proportion of Fully Paid loans is usually much higher, so accuracy is not our main concern here. Instead, we will focus on the trade-off between identifying default loans and mislabelling some good loans as defaults. When we train our models, we will look at the ROC curve and pay particular attention to the AUC (area under the curve).
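To make the ROC/AUC idea concrete, here is a minimal base-R sketch on made-up toy data (ten loans, three defaults). It assumes a model outputs a score per loan where a higher score means a higher chance of default; the data and the helper names `roc_points()` and `auc_trapezoid()` are hypothetical, not part of the lesson. It also shows why accuracy misleads: the always-Fully-Paid baseline reaches 70% accuracy on this toy set, yet its AUC is only 0.5 because it cannot rank defaults above good loans.

```r
# Minimal sketch in base R; toy data and function names are hypothetical.
# A higher score means the model thinks the loan is more likely to default.

# TPR/FPR at every threshold: predict "default" when score >= threshold
roc_points <- function(scores, is_default) {
  thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
  n_pos <- sum(is_default)    # number of defaults
  n_neg <- sum(!is_default)   # number of good (Fully Paid) loans
  tpr <- vapply(thresholds, function(t) sum(scores >= t & is_default) / n_pos, numeric(1))
  fpr <- vapply(thresholds, function(t) sum(scores >= t & !is_default) / n_neg, numeric(1))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the ROC curve via the trapezoidal rule; points are already
# ordered by non-decreasing FPR and TPR because thresholds decrease
auc_trapezoid <- function(roc) {
  x <- roc$fpr
  y <- roc$tpr
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Made-up toy data: 10 loans, 3 of which defaulted
is_default  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
model_score <- c(0.90, 0.80, 0.35, 0.70, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05)

# Baseline: label every loan Fully Paid, i.e. the same (zero) default score for all
baseline_score <- rep(0, length(is_default))

mean(!is_default)                                       # baseline accuracy: 0.7
auc_trapezoid(roc_points(baseline_score, is_default))   # 0.5
roc_model <- roc_points(model_score, is_default)
auc_trapezoid(roc_model)                                # 0.9047619 (= 19/21)

# Trade-off: catching all 3 defaults needs threshold 0.35,
# which also flags 2 of the 7 good loans (FPR = 2/7)
roc_model[roc_model$threshold == 0.35, ]
```

With the lesson's data, one could build `is_default` as `data_test$loan_status != "Fully.Paid"` (assuming the test set holds only finished loans) and pass a model's predicted default probabilities as the scores. Packages such as pROC provide `roc()` and `auc()` for the same job.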