Case Study - German Credit - Steps to Build a Predictive Model
We will perform the following steps to build our predictive model:
Step 1 – Data Selection
The first step is to get the dataset that we will use for building the model. For this case study, we are using the German Credit Scoring Data Set in numeric format, which contains information on 21 attributes for 1,000 loans.
Step 2 – Data Pre-Processing
The purpose of preprocessing is to make the raw data suitable for the data science algorithms. For example, we may want to remove outliers, and remove or impute missing values.
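As a rough illustration of the preprocessing ideas above, the following base R sketch imputes missing values with the column median and drops rows flagged by the 1.5 × IQR rule. The `loans` data frame, its column names, and its values are made up for this example and are not taken from the German Credit data. The helper functions `impute_median` and `is_outlier` are hypothetical names, and the median and IQR rules are only one of several reasonable choices.

```r
# Toy data standing in for raw loan records (values invented for illustration)
loans <- data.frame(
  amount   = c(1200, 2500, NA, 1800, 95000, 2100),
  duration = c(12, 24, 18, NA, 36, 24)
)

# Replace missing values in a numeric vector with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
loans[] <- lapply(loans, impute_median)

# Flag values lying more than 1.5 * IQR outside the first or third quartile
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  spread <- q[2] - q[1]
  x < q[1] - 1.5 * spread | x > q[2] + 1.5 * spread
}

# Keep only the rows whose loan amount is not flagged as an outlier
loans_clean <- loans[!is_outlier(loans$amount), ]
```

Whether outliers should be removed, capped, or kept depends on the data and the model, so treat the rule above as a starting point rather than a fixed recipe.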
Step 3 – Features Selection
The raw data may contain many features (independent variables), some of which contribute little to predicting the response variable. Such features should be removed from the dataset. We also need to check whether any information is represented redundantly by two attributes; if so, we can safely remove one of them. One way to detect this is to compute the correlation between attributes. The resulting dataset, with a reduced number of features, is ready for use by the classification algorithms.
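The correlation check described above can be sketched in base R as follows. The data is synthetic: `amount_scaled` is deliberately built as a near copy of `amount` so that one redundant pair exists. The column names and the 0.9 cutoff are assumptions chosen for illustration, not values from the case study.

```r
set.seed(42)  # make the synthetic data reproducible
n <- 200
amount <- rnorm(n, mean = 3000, sd = 1000)

features <- data.frame(
  amount        = amount,
  amount_scaled = amount / 1000 + rnorm(n, sd = 0.01),  # near duplicate of `amount`
  age           = rnorm(n, mean = 35, sd = 10)
)

# Absolute pairwise correlations; zero out the upper triangle and the diagonal
# so that each row only holds correlations with the columns that come before it
corr <- abs(cor(features))
corr[upper.tri(corr, diag = TRUE)] <- 0

# A column is treated as redundant if it is highly correlated with an earlier column
cutoff <- 0.9
redundant <- colnames(corr)[apply(corr, 1, max) > cutoff]

# Drop the redundant columns, keeping the first member of each correlated pair
reduced <- features[, !(colnames(features) %in% redundant), drop = FALSE]
```

This keeps the first column of each highly correlated pair. In practice, which of the two attributes to keep is a judgment call, often guided by interpretability or data quality.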
Step 4 – Building Classification Model
In this step, we build our classification model. We split the data into training and test sets, and then train the model on the training set. Once we have the fitted model, we can apply it to the test set to predict the values of the response variable.
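The split, fit, and predict workflow above might look like the following minimal sketch. It uses synthetic data and a logistic regression fitted with `glm`. The variable names, the 70/30 split, the 0.5 probability threshold, and the coefficients used to simulate the response are all assumptions for illustration; they are not the settings used with the German Credit data.

```r
set.seed(123)  # make the simulation and the split reproducible
n <- 500

# Synthetic predictors standing in for loan attributes
credit <- data.frame(
  duration = runif(n, 6, 60),
  amount   = runif(n, 500, 10000)
)

# Simulated binary response: default becomes more likely for longer, larger loans
log_odds <- -3 + 0.05 * credit$duration + 0.0002 * credit$amount
credit$default <- rbinom(n, size = 1, prob = plogis(log_odds))

# Randomly assign 70% of the rows to training and the rest to testing
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- credit[train_idx, ]
test  <- credit[-train_idx, ]

# Fit a logistic regression on the training set only
model <- glm(default ~ duration + amount, data = train, family = binomial)

# Predict default probabilities on the test set and convert them to class labels
test$prob <- predict(model, newdata = test, type = "response")
test$pred <- as.integer(test$prob > 0.5)

# Cross-tabulate actual against predicted classes
table(actual = test$default, predicted = test$pred)
```

Keeping the test rows out of the fitting step is what makes the final table an honest estimate of how the model behaves on unseen loans.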