Random Forest Model in R
Next, we tune the Random Forest model. As with the SVM, the parameter is tuned on a 5% sample of the downsampled data, and the procedure is exactly the same as for the SVM model. The code for the Random Forest model is reproduced below.
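As a reminder of what downsampling does before the tuning code runs, here is a minimal base-R sketch. It is not the internals of caret's downSample; it only illustrates the idea of cutting every class down to the size of the smallest one. The data frame toy and its amount column are hypothetical.

```r
# Toy data (hypothetical): 8 "Fully.Paid" loans and 3 "Default" loans.
set.seed(300)
toy <- data.frame(
  amount = c(5, 12, 7, 20, 9, 15, 11, 6, 30, 25, 18),
  loan_status = factor(c(rep("Fully.Paid", 8), rep("Default", 3)))
)

# Size of the smallest class: every class is reduced to this many rows.
n_min <- min(table(toy$loan_status))

# For each class, draw n_min of its row indices at random, without replacement.
rows_by_class <- split(seq_len(nrow(toy)), toy$loan_status)
keep <- unlist(lapply(rows_by_class, function(idx) {
  idx[sample.int(length(idx), n_min)]
}))

toy_down <- toy[keep, ]
table(toy_down$loan_status)  # both classes now have n_min rows
```

After this step the two classes are balanced, which is why the tuning sample drawn from samp contains roughly equal numbers of defaulted and fully paid loans.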
set.seed(300)
#downsample again so that we get more information when stacking
samp = downSample(data_train[-getIndexsOfColumns(data_train, c( "loan_status") )],data_train$loan_status,yname="loan_status")
#choose small data for tuning
train_index_tuning = createDataPartition(samp$loan_status,p = 0.05,list=FALSE,times=1)
#choose small data for re-train
train_index_training = createDataPartition(samp$loan_status,p = 0.1,list=FALSE,times=1)
rfGrid = expand.grid(
.mtry = as.integer(seq(2,ncol(samp), (ncol(samp) - 2)/4))
)
#Load the randomForest package (caret's method = "rf" relies on it)
library(randomForest)
rfTuned = train(
samp[train_index_tuning,-getIndexsOfColumns(samp,"loan_status")],
y = samp[train_index_tuning,"loan_status"],
method = "rf",
tuneGrid = rfGrid,
metric = "ROC",
trControl = ctrl,
preProcess = NULL,
ntree = 100
)
plot(rfTuned)
> rfTuned
Random Forest
1268 samples
70 predictor
2 classes: 'Default', 'Fully.Paid'
No pre-processing
Resampling: Cross-Validated (3 fold)
Summary of sample sizes: 845, 845, 846
Resampling results across tuning parameters:
  mtry  ROC        Sens       Spec
   2    0.7028532  0.6909073  0.6199440
  19    0.6832394  0.6451832  0.6088706
  36    0.6706683  0.6231333  0.5820516
  53    0.6748038  0.6263003  0.6026111
  71    0.6751421  0.6609511  0.5962622
ROC was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
>
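The five mtry values in the table come from the seq() expression in rfGrid. The sketch below reproduces them in base R, assuming samp has 71 columns (70 predictors plus loan_status), as the printed summary implies. It also shows one possible way to cap the grid at the number of predictors, since the largest candidate (71) exceeds the 70 available; the names n_cols, n_predictors and mtry_capped are illustrative only.

```r
# Assumption: 71 columns in samp = 70 predictors + the loan_status column.
n_cols <- 71

# Same expression as in rfGrid: five evenly spaced values, truncated to integers.
mtry_grid <- as.integer(seq(2, n_cols, (n_cols - 2) / 4))
mtry_grid

# Optional variant: cap every candidate at the number of predictors.
n_predictors <- n_cols - 1
mtry_capped <- unique(pmin(mtry_grid, n_predictors))
mtry_capped
```

Using the capped grid would change the last row of the tuning table, so it is shown here only as an alternative, not as what produced the output above.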
The best value is mtry = 2, where mtry is the number of predictors randomly sampled as candidates at each split. As with the SVM, we then fit the model with this value on 10% of the downsampled data.
#fit the final forest on the 10% training sample with the tuned mtry
rf_model = randomForest(loan_status ~ . ,data = samp[train_index_training,],mtry = 2,ntree=400)
#predicted class probabilities on the test set; keep the "Fully.Paid" column
predict_loan_status_rf = predict(rf_model,data_test,type="prob")
predict_loan_status_rf = as.data.frame(predict_loan_status_rf)$Fully.Paid
#roc() and auc() come from the pROC package
rocCurve_rf = roc(response = data_test$loan_status,
predictor = predict_loan_status_rf)
auc_curve = auc(rocCurve_rf)
plot(rocCurve_rf,legacy.axes = TRUE,print.auc = TRUE,col="red",main="ROC(RandomForest)")
> rocCurve_rf
Call:
roc.default(response = data_test$loan_status, predictor = predict_loan_status_rf)
Data: predict_loan_status_rf in 5358 controls (data_test$loan_status Default) < 12602 cases (data_test$loan_status Fully.Paid).
Area under the curve: 0.705
>
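To make the "area under the curve" figure more concrete, the sketch below computes an AUC by hand in base R on six hypothetical loans. It uses the pairwise interpretation of AUC (the share of positive/negative pairs in which the positive case gets the higher score, with ties counted as one half) rather than pROC itself; truth and score are made-up values.

```r
# Hypothetical data: true status and predicted probability of "Fully.Paid".
truth <- factor(c("Default", "Default", "Fully.Paid",
                  "Default", "Fully.Paid", "Fully.Paid"))
score <- c(0.20, 0.55, 0.60, 0.40, 0.35, 0.90)

pos <- score[truth == "Fully.Paid"]  # scores of the positive class
neg <- score[truth == "Default"]     # scores of the negative class

# Compare every positive score with every negative score.
pairs <- outer(pos, neg, "-")

# Share of pairs ranked correctly; ties contribute one half.
auc_manual <- mean((pairs > 0) + 0.5 * (pairs == 0))
auc_manual
```

Here 7 of the 9 pairs are ranked correctly, so the toy AUC is 7/9. An AUC of 0.705 on the test set means that roughly 70% of such pairs are ordered correctly by the model.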
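#label a loan "Fully.Paid" when its predicted probability is at least 0.5
predict_loan_status_label = ifelse(predict_loan_status_rf<0.5,"Default","Fully.Paid")
#recent versions of caret require factors with the same levels as the reference
predict_loan_status_label = factor(predict_loan_status_label,levels=levels(data_test$loan_status))
#named cm rather than c to avoid masking base::c
cm = confusionMatrix(predict_loan_status_label,data_test$loan_status,positive="Fully.Paid")
table_perf[3,] = c("RandomForest",
round(auc_curve,3),
as.numeric(round(cm$overall["Accuracy"],3)),
as.numeric(round(cm$byClass["Sensitivity"],3)),
as.numeric(round(cm$byClass["Specificity"],3)),
as.numeric(round(cm$overall["Kappa"],3))
)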
The model's performance is as follows:
> tail(table_perf,1)
  model        auc   accuracy sensitivity specificity kappa
3 RandomForest 0.705 0.657    0.666       0.635       0.268
>
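The accuracy, sensitivity, specificity and kappa columns all derive from the 2x2 confusion matrix. The base-R sketch below recomputes them on ten hypothetical loans, with "Fully.Paid" as the positive class as in the lesson. It is an illustration of the formulas, not a replacement for caret's confusionMatrix; every variable name here is made up for the example.

```r
# Hypothetical labels for ten loans.
lv <- c("Default", "Fully.Paid")
actual <- factor(c(rep("Fully.Paid", 6), rep("Default", 4)), levels = lv)
predicted <- factor(c(rep("Fully.Paid", 4), rep("Default", 2),
                      rep("Default", 3), "Fully.Paid"), levels = lv)

# Rows are predictions, columns are the true labels.
tab <- table(predicted, actual)
tp <- tab["Fully.Paid", "Fully.Paid"]  # paid loans predicted as paid
fn <- tab["Default", "Fully.Paid"]     # paid loans predicted as default
tn <- tab["Default", "Default"]        # defaults predicted as default
fp <- tab["Fully.Paid", "Default"]     # defaults predicted as paid
n <- sum(tab)

accuracy <- (tp + tn) / n
sensitivity <- tp / (tp + fn)  # share of paid loans correctly identified
specificity <- tn / (tn + fp)  # share of defaults correctly identified

# Cohen's kappa: agreement beyond what the row and column totals alone give.
expected <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, kappa = kappa_stat), 3)
```

Kappa is the most conservative of the four because it discounts the agreement expected by chance, which is why the Random Forest row shows a kappa well below its accuracy.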