Classification vs. Regression Models
While building any predictive model, it is important to first understand whether it is a classification or a regression problem. Let’s understand the difference between the two:
1. Classification
In a classification problem, we are trying to predict the class of a data point, where the class takes one of a discrete set of values. The Y variable that we are trying to predict is generally categorical and has a finite number of classes. For example, we can classify a loan as Default or No Default, or an image as a cat or a dog. The credit risk problem that we are trying to solve is a classification problem. We call it binary classification when there are only two classes to predict (Default or No Default, encoded as 0 and 1). If we have more than two classes, we call it a multiclass classification problem. Such models are commonly referred to as "classifiers".
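As a rough illustration of the distinction above, the following Python sketch counts the distinct classes in a target variable and names the problem type accordingly. The function name `classification_type`, the sample labels, and the choice to encode Default as 1 are all assumptions made for this example; they are not taken from the course data.

```python
def classification_type(labels):
    """Name the kind of classification problem implied by a list of class labels."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("need at least two distinct classes to classify")
    return "binary" if n_classes == 2 else "multiclass"


# Made-up loan outcomes, for illustration only
loan_status = ["Default", "No Default", "No Default", "Default"]

# Encode the categorical target as 0/1 (assumed convention: Default -> 1)
encoded = [1 if status == "Default" else 0 for status in loan_status]

print(classification_type(loan_status))
print(encoded)
```

A target with three or more distinct labels, such as a hypothetical list of loan grades, would be reported as "multiclass" by the same function.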
2. Regression
The problem we are solving is considered a regression problem if we are predicting a continuous-valued output, such as the price of a house or of a stock.
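To make the contrast concrete, here is a minimal Python sketch of a regression task: fitting a straight line by ordinary least squares and using it to predict a continuous value. The house sizes and prices are made up for illustration, and `fit_line` is a hypothetical helper, not part of the course material.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: house size (square metres) and price (thousands)
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)

# The prediction is a continuous number, not a class label
predicted_price = slope * 90 + intercept
print(predicted_price)
```

Unlike the classifier case, the output here can take any value on a continuous scale, which is what marks the problem as regression.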
When we are solving a data science problem, we will first define our problem as a classification or a regression problem, depending on the output that we are trying to predict.
In our case, we can conclude that predicting default is a classification problem. Let’s now start with our first case study and understand the steps involved in model building.