Finance Train

Building Credit Risk Model

Data Science, Risk Management

This lesson is part 21 of 28 in the course Credit Risk Modelling in R

Loan data typically contain a much higher proportion of good loans than bad ones, so we can achieve high accuracy just by labelling all loans as Fully Paid.

> 100*nrow(data_test %>% filter(loan_status=="Fully.Paid"))/nrow(data_test)
[1] 70.16704


For our test data, we get about 70.2% accuracy just by following this strategy. Recall that we have yet to include the outcome of ‘Current’ loans. In a real situation, the share of Fully Paid loans is usually much higher, so accuracy is not our main concern here. Instead, we will focus on the trade-off between identifying default loans and mislabelling some good loans as defaults. We will look at the ROC curve and pay particular attention to AUC when we train our models.
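To make the contrast between accuracy and AUC concrete, here is a minimal sketch in base R. The labels and scores are made-up toy values, and `auc_rank` is a hypothetical helper written for this illustration (it is not part of the course code); it computes AUC through the rank-based (Mann-Whitney) formulation.

```r
# Toy labels and scores, made up for illustration only.
# 1 = default (positive class), 0 = fully paid.
actual <- c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0)
score  <- c(0.10, 0.35, 0.80, 0.20, 0.30, 0.05, 0.40, 0.90, 0.15, 0.25)

# Accuracy of the trivial rule "label every loan as fully paid".
baseline_accuracy <- mean(actual == 0)

# AUC via ranks: the probability that a randomly chosen default
# receives a higher score than a randomly chosen good loan.
auc_rank <- function(actual, score) {
  r     <- rank(score)            # average ranks are used for ties
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(actual, score)

print(baseline_accuracy)
print(auc)
```

The trivial rule scores well on accuracy here while catching no defaults at all, whereas AUC only rewards a model for ranking defaults above good loans.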

Our target variable (Loan Status) is imbalanced: there are many Fully Paid loans and comparatively few Default loans. To address this, we can downsample the majority class so that the target variable is split 50/50. In this case, we will downsample so that the number of Fully Paid loans equals the number of Default loans. This method tends to work well and usually runs faster than upsampling or cost-sensitive training. Downsampling helps because, as we saw above, it is trivial to reach 70% accuracy on the imbalanced data by always predicting Fully Paid; on a balanced sample that strategy only reaches 50%, so the model has to learn to separate the two classes. Downsampling also reduces the data size.
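A minimal sketch of downsampling in base R follows. The data frame `toy_train`, the `int_rate` column, the label value `"Default"`, and the helper `downsample` are assumptions made for this illustration; the real training data and the course's own downsampling code may look different.

```r
set.seed(42)  # arbitrary seed so the sample is reproducible

# Toy stand-in for the training data: 700 good loans and 300 defaults.
toy_train <- data.frame(
  loan_status = factor(c(rep("Fully.Paid", 700), rep("Default", 300))),
  int_rate    = c(rnorm(700, mean = 12, sd = 3), rnorm(300, mean = 16, sd = 3))
)

# Keep every row of the smallest class and an equally sized random
# sample from each larger class.
downsample <- function(df, target = "loan_status") {
  counts <- table(df[[target]])
  n_min  <- min(counts)
  keep <- unlist(lapply(names(counts), function(cls) {
    idx <- which(df[[target]] == cls)
    if (length(idx) > n_min) sample(idx, n_min) else idx
  }))
  df[keep, , drop = FALSE]
}

toy_down <- downsample(toy_train)
print(table(toy_down$loan_status))
```

After this step both classes have the same number of rows, at the cost of discarding 400 of the 700 good loans in the toy data.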

Note that, in the end, we aim to stack the results of several learning models: Logistic Regression, SVM, Random Forest, and Extreme Gradient Boosting (XGB). The downside of downsampling is that information from the majority class is discarded, so we will draw a new downsampled data set each time we train a model. We anticipate that stacking all four models will give a better result, since together they see more of the majority class.
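The idea of giving each model its own fresh downsample and then combining the predictions can be sketched as follows. This is only an illustration under stated assumptions: the toy data, the `is_default` and `int_rate` columns, and the helper `draw_balanced` are hypothetical, `glm()` stands in for all four model types, and a plain average is used as one simple way of combining the results.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Toy data (hypothetical): defaults tend to carry a higher interest rate.
n_good <- 700
n_bad  <- 300
toy <- data.frame(
  is_default = c(rep(0, n_good), rep(1, n_bad)),
  int_rate   = c(rnorm(n_good, mean = 12, sd = 3), rnorm(n_bad, mean = 16, sd = 3))
)

# Draw a fresh balanced sample: all defaults plus an equal number
# of randomly chosen good loans.
draw_balanced <- function(df) {
  bad  <- which(df$is_default == 1)
  good <- which(df$is_default == 0)
  df[c(bad, sample(good, length(bad))), ]
}

# Fit one model per fresh downsample. A logistic regression is used
# for every fit here only as a stand-in for the four model types.
n_models <- 4
probs <- sapply(seq_len(n_models), function(i) {
  fit <- glm(is_default ~ int_rate, data = draw_balanced(toy), family = binomial)
  predict(fit, newdata = toy, type = "response")
})

# Combine the models by averaging their predicted default probabilities.
avg_prob <- rowMeans(probs)
print(head(avg_prob))
```

Because each call to `draw_balanced` samples a different subset of good loans, the combined prediction draws on more of the majority class than any single downsampled model does.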

Copyright © 2019 Finance Train. All rights reserved.
