We will use the quantmod R package to download stock data. It can download stock data from multiple sources, with Yahoo as the default. To start using quantmod, install and load it in your R environment using the following commands in the R console or RStudio (preferred). Once the package […]
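As a minimal sketch of the install-and-load step described above: `install.packages()`, `library()`, and quantmod's `getSymbols()` are real functions, while the helper name `download_prices`, the ticker, and the start date are assumptions chosen for illustration.

```r
# One-time installation and loading (uncomment to run interactively):
# install.packages("quantmod")
# library(quantmod)

# Hypothetical helper: download daily prices for one ticker from Yahoo.
download_prices <- function(symbol, from = "2020-01-01") {
  # Fail early with a clear message if quantmod is not installed.
  if (!requireNamespace("quantmod", quietly = TRUE)) {
    stop("Please install quantmod first: install.packages('quantmod')")
  }
  # auto.assign = FALSE returns the data instead of creating a variable
  # named after the ticker in the calling environment.
  quantmod::getSymbols(symbol, src = "yahoo", from = from, auto.assign = FALSE)
}

# Example call (needs an internet connection):
# aapl <- download_prices("AAPL")
# head(aapl)
```

Passing `auto.assign = FALSE` is a design choice here: it keeps the download explicit, which tends to be easier to follow in scripts than relying on objects that appear in the workspace.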

## R Financial Packages for Portfolio Analysis

This tutorial will teach you how to use R for portfolio analysis. We will use several financial packages from R that help us perform portfolio analysis. Let’s look at these packages. Quantmod: Quantmod is a very powerful package designed for quant traders to explore and build quantitative trading models. For […]

## Predictive Modelling: Comparing Model Results

The AUC for each model, and its performance when we set the probability cutoff at 50%, is summarised below. The Kappa statistics of all models exceed 20% by only a small amount, which indicates that they perform moderately better than chance. XGB takes advantage of receiving all the downsampled data and provides the highest AUC. Comparing performance across models may not […]
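To make the Kappa statistic concrete, here is a small base-R sketch that applies a 50% cutoff and computes Cohen's kappa from the resulting confusion matrix. The ten loans and their predicted probabilities are made-up toy values, not results from the models above, and the variable names are assumptions.

```r
# Toy data (hypothetical): true outcomes and predicted probability of default.
lv <- c("Default", "Fully Paid")
actual <- factor(c(rep("Default", 3), rep("Fully Paid", 7)), levels = lv)
prob_default <- c(0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.3, 0.6, 0.2)

# Apply the 50% probability cutoff.
predicted <- factor(ifelse(prob_default >= 0.5, "Default", "Fully Paid"),
                    levels = lv)

# Confusion matrix: rows are predictions, columns are true outcomes.
cm <- table(Predicted = predicted, Actual = actual)
n <- sum(cm)

# Observed agreement (accuracy) and agreement expected by chance.
p_observed <- sum(diag(cm)) / n
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa: improvement over chance, scaled to a maximum of 1.
kappa_stat <- (p_observed - p_expected) / (1 - p_expected)

print(cm)
print(round(kappa_stat, 3))
```

A kappa of 0 means the predictions agree with the truth no more often than chance would, which is why it is a more informative yardstick than raw accuracy on imbalanced loan data.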

## Predictive Modelling: Averaging Results from Multiple Models

Our final model combines the results of the previous machine learning models and provides a single prediction by averaging the probabilities from all of them. We get the following performance:
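The averaging step itself can be sketched in a few lines of base R. The probabilities below are invented for three hypothetical loans, the column names are assumptions, and the 50% cutoff is assumed to carry over from the individual models.

```r
# Hypothetical predicted probabilities of default from each model,
# one row per loan (toy values, not the tutorial's results).
probs <- data.frame(
  logit = c(0.62, 0.20, 0.45),
  svm   = c(0.58, 0.30, 0.55),
  rf    = c(0.70, 0.10, 0.40),
  xgb   = c(0.66, 0.24, 0.52)
)

# Simple unweighted average across models for each loan.
avg_prob <- rowMeans(probs)

# Turn the averaged probability into a class label with a 50% cutoff.
avg_pred <- ifelse(avg_prob >= 0.5, "Default", "Fully Paid")

print(data.frame(avg_prob, avg_pred))
```

An unweighted mean treats every model equally; weighting by validation performance is a common variation, but nothing in the text indicates it was used here.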

## Extreme Gradient Boosting in R

Extreme Gradient Boosting has a very efficient implementation. Unlike SVM and RandomForest, we can tune the parameters using the whole downsampled set. We focus on varying the Ridge and Lasso regularization and the learning rate, and we use 10% of the data to validate the tuning parameters. The best tuning parameters are eta = 0.1 (learning rate), alpha = 0.5 (Lasso, L1), and lambda = 1.0 (Ridge, L2). We retrain […]

## Random Forest Model in R

Now we will tune the RandomForest model. As with SVM, we tune the parameter on 5% of the downsampled data. The procedure is exactly the same as for the SVM model. Below we have reproduced the code for the Random Forest model. The best parameter is mtry (the number of predictors sampled at each split) = 2. As with SVM, we fit 10% of the downsampled data with this […]

## Support Vector Machine (SVM) Model in R

A support vector machine (SVM) is a supervised learning technique that analyzes data and isolates patterns applicable to both classification and regression. The classifier is useful for choosing between two or more possible outcomes that depend on continuous or categorical predictor variables. Based on training and sample classification data, the SVM algorithm assigns the target […]

## Logistic Regression Model in R

To build our first model, we will fit a logistic regression to our training dataset. First we set the seed (any number will do; we have chosen 100) so that we can reproduce our results. Then we create a downsampled dataset called samp, which contains an equal number of Default and Fully Paid loans. We can use the table() function to check that the downsampling […]
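The seed, downsampling, and `table()` check described above might look roughly like this in base R. The data is synthetic and the columns `loan_status`, `int_rate`, and `dti` are assumed names; the tutorial's own code may use a different downsampling helper.

```r
set.seed(100)  # fixed seed so the sampling is reproducible

# Synthetic, imbalanced loan data (hypothetical columns and values).
n <- 1000
loans <- data.frame(
  loan_status = factor(sample(c("Default", "Fully Paid"), n,
                              replace = TRUE, prob = c(0.3, 0.7))),
  int_rate = runif(n, 5, 25),
  dti      = runif(n, 0, 40)
)

# Downsample: keep as many rows of each class as the smaller class has.
n_min <- min(table(loans$loan_status))
rows_by_class <- split(seq_len(nrow(loans)), loans$loan_status)
keep <- unlist(lapply(rows_by_class, function(i) sample(i, n_min)))
samp <- loans[keep, ]

# Check that both classes now have the same number of rows.
print(table(samp$loan_status))

# Logistic regression on the balanced sample. With a factor response,
# glm() models the probability of the second level ("Fully Paid").
fit <- glm(loan_status ~ int_rate + dti, data = samp, family = binomial)
print(coef(fit))
```

Because the predictors here are random noise, the fitted coefficients carry no meaning; the point is only the shape of the workflow.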

## Building Credit Risk Model

Loan data typically have a higher proportion of good loans, so we can achieve high accuracy just by labeling all loans as Fully Paid. For our test data, we gain 70.3% accuracy by following this strategy alone. Recall that we have yet to include the outcome of ‘Current’ loans. In a real situation, the ratio […]
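The "label everything as Fully Paid" baseline is easy to reproduce. The sketch below uses ten made-up loans, so its accuracy will not match the 70.3% reported for the real test data; the variable names are assumptions.

```r
# Toy test labels (hypothetical): 7 good loans and 3 defaults.
test_status <- c(rep("Fully Paid", 7), rep("Default", 3))

# Baseline strategy: predict the majority class for every loan.
baseline_pred <- rep("Fully Paid", length(test_status))

# Accuracy is simply the share of the majority class.
baseline_acc <- mean(baseline_pred == test_status)
print(baseline_acc)
```

Any model worth keeping has to beat this number, which is why accuracy alone is a weak measure on imbalanced data.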

## Create a Function and Prepare Test Data in R

When we build the model, we will need the same set of columns in the test data, and we will need to apply to the test data all the transformations that we applied to the training data. The steps are: Kept Columns, Create Function, Prepare Test Data. We will now take our test data and apply our data transformations to it. […]
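One way to keep the training and test preparation identical is to wrap the column selection and transformations in a single function. The sketch below is illustrative only: the column names, the percent-stripping transformation, and the function name `prepare_data` are all assumptions, not the tutorial's actual code.

```r
# Hypothetical list of columns kept for modelling.
kept_cols <- c("loan_amnt", "int_rate", "loan_status")

# Hypothetical preparation function: select the kept columns and apply
# the same transformations used on the training data.
prepare_data <- function(df, cols) {
  df <- df[, cols, drop = FALSE]
  # Example transformation: convert "10.5%" style strings to numbers.
  df$int_rate <- as.numeric(sub("%", "", df$int_rate, fixed = TRUE))
  df$loan_status <- factor(df$loan_status,
                           levels = c("Default", "Fully Paid"))
  df
}

# Toy raw test data with an extra column that should be dropped.
raw_test <- data.frame(
  id          = 1:2,
  loan_amnt   = c(5000, 12000),
  int_rate    = c("10.5%", "7.9%"),
  loan_status = c("Fully Paid", "Default"),
  stringsAsFactors = FALSE
)

test_data <- prepare_data(raw_test, kept_cols)
str(test_data)
```

Calling the same function on both datasets avoids the common mistake of transforming the training data one way and the test data another.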