The loan data typically have a higher proportion of good loans. We can achieve high accuracy just by labeling all loans as Fully Paid.
> 100*nrow(data_test %>% filter(loan_status=="Fully.Paid"))/nrow(data_test)
[1] 70.16704
For our test data, we reach about 70.2% accuracy just by following the above strategy. Recall that we have yet to include the outcome of 'Current' loans. In a real situation, the ratio of Fully Paid loans is usually much higher, so accuracy is not our main concern here. We will instead focus on the trade-off between identifying default loans and the expense of mislabelling some good loans. We will look at the ROC curve and pay particular attention to AUC when we train our models.
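To make the trade-off concrete, here is a minimal base R sketch on a small made-up sample. The vectors `actual` and `score` and the helper `rates_at` are hypothetical and are not taken from the loan data or from any model in this lesson. It shows the all-good-loans baseline accuracy, the true and false positive rates at a chosen cutoff, and AUC computed with the rank (Mann-Whitney) formula instead of a plotting package.

```r
# Toy illustration only: these labels and scores are invented, not loan data.
# 1 = default (positive class), 0 = fully paid.
actual <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
# Hypothetical predicted default probabilities from some model.
score <- c(0.10, 0.20, 0.15, 0.40, 0.30, 0.05, 0.60, 0.70, 0.35, 0.80)

# Baseline: label every loan as fully paid (predict 0 everywhere).
baseline_accuracy <- mean(actual == 0)

# True positive rate and false positive rate at a given cutoff.
# Lowering the cutoff catches more defaults but flags more good loans.
rates_at <- function(cutoff) {
  predicted <- as.integer(score >= cutoff)
  c(tpr = sum(predicted == 1 & actual == 1) / sum(actual == 1),
    fpr = sum(predicted == 1 & actual == 0) / sum(actual == 0))
}

# AUC via ranks: the probability that a randomly chosen default
# receives a higher score than a randomly chosen good loan.
n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)
auc <- (sum(rank(score)[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(baseline_accuracy)
print(rates_at(0.5))
print(rates_at(0.3))
print(auc)
```

In this toy sample the baseline is 70% accurate yet catches no defaults at all, while moving the cutoff from 0.5 to 0.3 raises both the share of defaults caught and the share of good loans mislabelled. AUC summarises that trade-off across all cutoffs in a single number.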