This tutorial teaches credit risk modeling in R through two case studies, starting with a simpler one that introduces the methodology and the key concepts and techniques. Note: As a prerequisite, it will be helpful to go through this tutorial first – Foundations of […]

# Credit Risk Modelling in R

## Classification vs. Regression Models

When building a predictive model, it is important to first understand whether the problem is one of classification or regression. Let’s look at the difference between the two. 1. Classification: In a classification problem, we are trying to predict the class of a data point (one of a discrete set of values). The Y variable that we are trying […]

## Case Study – German Credit – Steps to Build a Predictive Model

We will perform several steps to build our predictive model. These steps are explained below. Step 1 – Data Selection: The first step is to get the dataset that we will use for building the model. For this case study, we are using the German Credit Scoring Data Set in the numeric format which contains […]

## Import Credit Data Set in R

We are using the German Credit Scoring Data Set in numeric format, which contains information about 21 attributes of 1000 loans. First, set up a working directory and place the data file in that directory. Then, import the data into your R session using the following command:

Attribute Details: 20 attributes are used in judging a […]
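As a rough illustration of the import step, the sketch below reads a whitespace-separated numeric file with `read.table()`. The file name `german_credit_numeric.txt` and the tiny three-row stand-in file are assumptions made so the example runs on its own; they are not the actual data set or the command used in the tutorial.

```r
# Hypothetical file name and location; point this at your own copy of the data.
data_file <- file.path(tempdir(), "german_credit_numeric.txt")

# Write a tiny stand-in file (3 loans, 3 numeric attributes plus a class label)
# so that this sketch runs without the real data set.
writeLines(c("1 6 4 1",
             "2 48 2 2",
             "4 12 4 1"), data_file)

# Read a whitespace-separated numeric file that has no header row.
credit <- read.table(data_file, header = FALSE)

# Inspect the dimensions and column types of the imported data.
dim(credit)
str(credit)
```

With the real file placed in your working directory, only the path passed to `read.table()` should need to change.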

## German Credit Data : Data Preprocessing and Feature Selection in R

The purpose of preprocessing is to make your raw data suitable for the data science algorithms. For example, we may want to remove outliers, or remove or impute missing values. The dataset that we have selected does not have any missing data. In practice, however, there is a possibility that the […]
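The checks mentioned above can be sketched in base R as follows. The data frame `loans`, its column names, and its values are made up for illustration, and median imputation and the 1.5 × IQR rule are just two common choices, not necessarily the ones used in this case study.

```r
# Simulated stand-in data with two missing values (hypothetical columns).
loans <- data.frame(
  duration = c(6, 48, 12, NA, 24, 36),
  amount   = c(1169, 5951, 2096, 7882, NA, 9055)
)

# Count the missing values in each column.
na_counts <- colSums(is.na(loans))

# Replace each missing value with the median of its column.
loans_imputed <- as.data.frame(lapply(loans, function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}))

# Flag values lying more than 1.5 * IQR outside the quartiles.
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}
outlier_flags <- sapply(loans_imputed, is_outlier)

na_counts
loans_imputed
outlier_flags
```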

## Credit Modelling: Training and Test Data Sets

For building the model, we will divide our data into two different data sets, namely training and testing datasets. The model will be built using the training set and then we will test it on the testing set to evaluate how our model is performing. There are many ways in which we can split the […]
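One simple way to make such a split is random sampling of row indices. The sketch below assumes a 70/30 split and uses simulated data in place of the credit data set; both the ratio and the column names are illustrative assumptions.

```r
set.seed(123)  # make the random split reproducible

# Simulated stand-in for the credit data (hypothetical columns).
n <- 1000
credit <- data.frame(
  x1      = rnorm(n),
  x2      = rnorm(n),
  default = rbinom(n, 1, 0.3)
)

# Randomly pick 70% of the row indices for training.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

train <- credit[train_idx, ]   # rows used to fit the model
test  <- credit[-train_idx, ]  # held-out rows used to evaluate it

nrow(train)
nrow(test)
```

Fixing the seed keeps the split stable between runs, which makes model comparisons easier to reproduce.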

## Build the Predictive Model

We have now gathered our data and cleansed and transformed it to suit our modeling needs. The next step is to actually build the model. The goal of predictive modeling is to build a model that predicts future outcomes using statistical techniques. We use well-known statistical methods (algorithms) to find the function (model) that best describes […]

## Logistic Regression Model in R

Logistic regression aims to model the probability of an event occurring depending on the values of independent variables. The logistic regression model seeks to estimate the probability that an event (default) will occur for a randomly selected observation versus the probability that the event does not occur. Suppose we have data for 1000 loans along with all […]
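A minimal sketch of fitting such a model uses base R's `glm()` with `family = binomial`. The data here is simulated, and the predictors `duration` and `amount` and the coefficients used to generate the outcomes are assumptions for illustration only, not values from the German credit data.

```r
set.seed(1)

# Simulate 1000 loans with two hypothetical predictors.
n <- 1000
duration <- runif(n, 6, 60)
amount   <- runif(n, 500, 10000)

# Assumed "true" log-odds of default, used only to generate example outcomes.
log_odds <- -3 + 0.04 * duration + 0.0002 * amount
default  <- rbinom(n, 1, plogis(log_odds))
loans    <- data.frame(duration, amount, default)

# Fit a logistic regression: the log-odds of default are modeled as a
# linear function of the predictors.
fit <- glm(default ~ duration + amount, data = loans, family = binomial)
summary(fit)

# Predicted probability of default for each loan (values between 0 and 1).
prob_default <- predict(fit, type = "response")
head(prob_default)
```

In practice the model would be fitted on the training set, with `predict(fit, newdata = test, type = "response")` used to score the held-out loans.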

## Measure Model Performance in R Using ROCR Package

R’s ROCR package can be used for evaluating and visualizing the performance of classifiers / fitted models. It is helpful for estimating performance measures and plotting these measures over a range of cutoffs. (Note: the terms classifier and fitted model are used interchangeably.) The package features over 25 performance measures. The three important functions ‘prediction’, ‘performance’ […]
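The following sketch shows how `prediction()` and `performance()` from ROCR are typically combined to draw a ROC curve and compute the AUC. It assumes the ROCR package is installed, and the scores and labels are simulated rather than taken from a fitted credit model.

```r
library(ROCR)

set.seed(7)

# Simulated true labels (1 = default) and model scores; defaulters tend
# to receive higher scores than non-defaulters.
labels <- rbinom(200, 1, 0.3)
scores <- plogis(rnorm(200, mean = ifelse(labels == 1, 1, -1)))

# Step 1: bundle the scores and true labels into a prediction object.
pred <- prediction(scores, labels)

# Step 2: compute true positive rate vs. false positive rate over all cutoffs.
roc <- performance(pred, measure = "tpr", x.measure = "fpr")

# Step 3: plot the ROC curve with a diagonal reference line for random guessing.
plot(roc)
abline(0, 1, lty = 2)

# Area under the ROC curve, extracted from the performance object's slot.
auc <- performance(pred, measure = "auc")@y.values[[1]]
auc
```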

## Create a Confusion Matrix in R

A confusion matrix is a tabular representation of Actual vs Predicted values. As you can see, the confusion matrix avoids “confusion” by presenting the actual and predicted values in a tabular format. In the table above, Positive class = 1 and Negative class = 0. The following metrics can be derived from a confusion matrix: […]
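A confusion matrix can be built in base R with `table()`. The sketch below uses ten made-up actual and predicted labels and derives a few commonly used metrics; the choice of metrics is illustrative and may not match the tutorial's full list.

```r
# Made-up actual and predicted classes (1 = positive, 0 = negative).
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 1)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1)

# Cross-tabulate actual vs. predicted; fixing the factor levels keeps the
# table 2 x 2 even if one class is never predicted.
cm <- table(
  Actual    = factor(actual, levels = c(0, 1)),
  Predicted = factor(predicted, levels = c(0, 1))
)
cm

# Pull the four cells out of the table (row = actual, column = predicted).
tp <- cm["1", "1"]  # true positives
tn <- cm["0", "0"]  # true negatives
fp <- cm["0", "1"]  # false positives
fn <- cm["1", "0"]  # false negatives

accuracy    <- (tp + tn) / sum(cm)  # share of all predictions that are correct
sensitivity <- tp / (tp + fn)       # true positive rate (recall)
specificity <- tn / (tn + fp)       # true negative rate
precision   <- tp / (tp + fp)       # share of predicted positives that are correct

c(accuracy = accuracy, sensitivity = sensitivity,
  specificity = specificity, precision = precision)
```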