Classification vs. Regression Models
While building any predictive model, it is important to first understand whether it is a classification or a regression problem. Let’s understand the difference between the two:
In a classification problem, we are trying to predict the class of a data point, that is, one of a discrete set of values. The Y variable that we are trying to predict is categorical and has a finite number of classes. For example, we can classify a loan as Default or No Default, or an image as a cat or a dog. The credit risk problem that we are trying to solve is a classification problem. We call it binary classification when there are only two classes to predict (Default or No Default, usually coded as 0 or 1). If there are more than two classes, we call it a multi-class classification problem. Models that solve such problems are commonly referred to as "classifiers".
The problem we are solving is considered a regression problem if we are predicting a continuous-valued output, for example, the price of a house or of a stock.
When we are solving a data science problem, we will first define our problem as a classification or a regression problem, depending on the output that we are trying to predict.
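As a rough illustration of this first step, the sketch below inspects the target variable in R and labels the problem accordingly. The data frame `loans`, its columns, and the helper `problem_type()` are hypothetical names made up for this example; they are not part of the case study data.

```r
# Hypothetical toy data; values and column names are illustrative only.
loans <- data.frame(
  default     = factor(c("No", "Yes", "No", "No"), levels = c("No", "Yes")),
  house_price = c(250000, 310500, 189900, 420000)
)

# Hypothetical helper: label the problem type from the target variable.
# Assumes categorical targets are stored as factor, character, or logical,
# and continuous targets as numeric.
problem_type <- function(y) {
  if (is.factor(y) || is.character(y) || is.logical(y)) {
    n_classes <- length(unique(y[!is.na(y)]))
    if (n_classes < 2) stop("Target needs at least two observed classes")
    if (n_classes == 2) "binary classification" else "multi-class classification"
  } else if (is.numeric(y)) {
    "regression"
  } else {
    stop("Unsupported target type")
  }
}

problem_type(loans$default)      # categorical target with two classes
problem_type(loans$house_price)  # continuous numeric target
```

Note that a target coded as numeric 0/1 would be labelled a regression problem by this simple check, so under these assumptions such a column should first be converted with `factor()`.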
In our case, we can conclude that predicting default is a classification problem. Let’s now start with our first case study and understand the steps involved in model building.