Loan data typically have a higher proportion of good loans, so we can achieve high accuracy simply by labelling every loan as Fully Paid. On our test data, this strategy alone yields 70.3% accuracy. Recall that we have yet to include the outcome of ‘Current’ loans. In a real situation, the ratio […]
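As a quick sketch of why the majority-class baseline scores so well, consider synthetic labels drawn with the same 70.3% / 29.7% split (the data here are made up purely for illustration):

```r
# Toy illustration of the majority-class baseline (synthetic data)
set.seed(1)
loan_status <- sample(c("Fully Paid", "Default"), 1000,
                      replace = TRUE, prob = c(0.703, 0.297))

# Labelling every loan "Fully Paid" scores the majority-class proportion
baseline_acc <- mean(loan_status == "Fully Paid")
baseline_acc  # close to 0.703 on these synthetic labels
```

Accuracy alone is therefore a weak yardstick on imbalanced data, which motivates the downsampling and the Kappa/AUC comparisons later in the post.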

# Credit Risk Modelling in R

## Credit Risk – Logistic Regression Model in R

To build our first model, we fit a logistic regression to our training dataset. First we set the seed (to any number; we have chosen 100) so that we can reproduce our results. Then we create a downsampled dataset called samp, which contains an equal number of Default and Fully Paid loans. We can use the table() function to check that the downsampling […]
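The steps above can be sketched as follows; `train` and its `loan_status` column are illustrative names, not necessarily those used in the original code:

```r
set.seed(100)  # for reproducibility

# Downsample the majority class so samp has equal numbers of
# Default and Fully Paid loans
default_idx <- which(train$loan_status == "Default")
paid_idx    <- sample(which(train$loan_status == "Fully Paid"),
                      length(default_idx))
samp <- train[c(default_idx, paid_idx), ]

table(samp$loan_status)  # both classes should now be equally represented

# Logistic regression on the balanced sample
logit_fit <- glm(loan_status ~ ., data = samp, family = binomial)
```

Because glm() with family = binomial models the probability of the second factor level, it is worth checking levels(samp$loan_status) before interpreting the predicted probabilities.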

## Support Vector Machine (SVM) Model in R

A support vector machine (SVM) is a supervised learning technique that analyzes data and isolates patterns, and it is applicable to both classification and regression. The classifier is useful for choosing between two or more possible outcomes that depend on continuous or categorical predictor variables. Based on training and sample classification data, the SVM algorithm assigns the target […]
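A minimal sketch of fitting an SVM classifier in R, assuming the e1071 package (a common choice, though the post does not name its package) and the balanced sample `samp` and a `test` set as illustrative names:

```r
library(e1071)  # assumes the e1071 package is installed

# Radial-kernel SVM on the balanced sample; probability = TRUE lets
# us extract class probabilities from predict() later
svm_fit <- svm(loan_status ~ ., data = samp,
               kernel = "radial", probability = TRUE)

# Predicted classes and attached probability matrix on the test set
svm_pred <- predict(svm_fit, newdata = test, probability = TRUE)
svm_prob <- attr(svm_pred, "probabilities")
```

The kernel choice and cost/gamma defaults here are assumptions; in practice they would be tuned, as the post does on a subsample.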

## Random Forest Model in R

Now we tune a Random Forest model. As with the SVM, we tune the parameters on a 5% sample of the downsampled data; the procedure is exactly the same as for the SVM model. Below we have reproduced the code for the Random Forest model. The best parameter is mtry (the number of predictors sampled at each split) = 2. As with the SVM, we fit 10% of the downsampled data with this […]
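A sketch of the final fit with the reported best value mtry = 2, assuming the randomForest package and the illustrative names `samp` and `test`:

```r
library(randomForest)  # assumes the randomForest package

# Fit with the tuned value mtry = 2 reported in the text;
# ntree = 500 is the package default, stated here for clarity
rf_fit <- randomForest(loan_status ~ ., data = samp,
                       mtry = 2, ntree = 500)

# Predicted probability of Default on the test set
rf_prob <- predict(rf_fit, newdata = test, type = "prob")[, "Default"]
```

A small mtry is plausible here: with few strong predictors, restricting each split to two candidates decorrelates the trees and usually improves the ensemble.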

## Extreme Gradient Boosting in R

Extreme Gradient Boosting has a very efficient implementation. Unlike SVM and Random Forest, we can tune its parameters on the whole downsampled set. We focus on varying the Ridge and Lasso regularization terms and the learning rate, and we use 10% of the data to validate the tuning parameters. The best tuning parameters are eta = 0.1, alpha = 0.5, and lambda = 1.0. We retrain […]
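A sketch of the final fit with the reported best parameters, assuming the xgboost package; the feature-matrix construction and nrounds value are illustrative, as the post does not show them:

```r
library(xgboost)  # assumes the xgboost package

# Numeric feature matrix and 0/1 label (Default = 1); names illustrative
X <- model.matrix(loan_status ~ . - 1, data = samp)
y <- as.numeric(samp$loan_status == "Default")

# Best tuning values reported in the text:
# eta = 0.1 (learning rate), alpha = 0.5 (L1), lambda = 1.0 (L2)
xgb_fit <- xgboost(data = X, label = y, nrounds = 100,
                   objective = "binary:logistic",
                   eta = 0.1, alpha = 0.5, lambda = 1.0,
                   verbose = 0)

xgb_prob <- predict(xgb_fit, model.matrix(loan_status ~ . - 1, data = test))
```

With eta = 0.1 the number of boosting rounds matters; in practice one would pick nrounds by early stopping on the 10% validation split rather than fixing it at 100.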

## Predictive Modelling: Averaging Results from Multiple Models

Our final model combines the results of the previous machine learning models into a single prediction by averaging the predicted probabilities from all of them. We get the following performance:
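The averaging step itself is simple; the `*_prob` vectors below stand for each model's predicted probability of Default on the test set (the names are illustrative):

```r
# Simple equal-weight ensemble of the four models' Default probabilities
avg_prob <- (logit_prob + svm_prob + rf_prob + xgb_prob) / 4

# Classify at the 50% cutoff used later in the comparison
avg_pred <- ifelse(avg_prob > 0.5, "Default", "Fully Paid")
```

Equal weights are the simplest choice; weighting each model by its validation AUC is a common refinement.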

## Predictive Modelling: Comparing Model Results

The AUC for each model, and their performance when we set the probability cutoff at 50%, is summarised below. The Kappa statistics from all models exceed 20% by only a small amount, which indicates that they perform moderately better than chance. XGB takes advantage of receiving the entire downsampled set and provides the highest AUC. Comparing performance across models may not […]
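These metrics can be computed as sketched below, assuming the pROC package for AUC and caret for Kappa (the post does not name its packages; `test$loan_status`, `avg_prob`, and `avg_pred` are illustrative names):

```r
library(pROC)   # assumes pROC for AUC
library(caret)  # assumes caret for confusionMatrix / Kappa

# AUC from predicted Default probabilities
auc_val <- auc(test$loan_status, avg_prob)

# Kappa at the 50% cutoff, via the confusion matrix
cm <- confusionMatrix(factor(avg_pred, levels = levels(test$loan_status)),
                      test$loan_status)
cm$overall["Kappa"]
```

Unlike accuracy, Kappa corrects for the agreement expected by chance, which is why it is the more honest summary on this imbalanced problem.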

## How Insurance Companies Calculate Risk

People who are good at calculating probability and risk are few and far between. That is why understanding statistics is so vital to the market. Like finance, the insurance industry is a collective of individuals who understand how risks change over time. The majority of Americans, on the other hand, find the sector overwhelming and […]