Build the Predictive Model

We have now gathered our data and cleaned and transformed it to suit our modeling needs. The next step is to build the model. The goal of predictive modeling is to use statistical techniques to build a model that predicts future outcomes.

We use well-known statistical methods (algorithms) to find the function (model) that best describes how an outcome depends on a set of input variables (also known as features or predictors). The crux is to fit a model to the data so that the resulting function can predict the outcome from the given features. In our example, Account Balance, Loan Purpose, Telephone, etc. are all predictors/features. Creditability is the outcome/response, i.e., the value we are trying to predict. It is also called the target class, response variable, or dependent variable.
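To make the distinction between predictors and response concrete, here is a minimal Python sketch. The records, the column names, and the helper `split_features_target` are all hypothetical: the names merely echo the features mentioned above, and the values are invented for illustration rather than taken from the case-study dataset.

```python
# Hypothetical applicant records. Column names echo the features named in
# the text; the values are invented for illustration only.
applicants = [
    {"account_balance": 1, "loan_purpose": 2, "telephone": 1, "creditability": 1},
    {"account_balance": 3, "loan_purpose": 0, "telephone": 2, "creditability": 0},
    {"account_balance": 2, "loan_purpose": 1, "telephone": 1, "creditability": 1},
]

TARGET = "creditability"  # the response / dependent variable


def split_features_target(rows, target):
    """Return (feature_names, X, y): predictor names, predictor rows, responses."""
    feature_names = [name for name in rows[0] if name != target]
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return feature_names, X, y


feature_names, X, y = split_features_target(applicants, TARGET)
print("Predictors:", feature_names)
print("X:", X)
print("y:", y)
```

Each row of `X` holds one applicant's predictor values, and the matching entry of `y` holds the outcome the model should learn to predict.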

We create the model by using an algorithm to find the function that best describes the relationship between the predictors and the response variable. This process is called training the model. Once the model is trained, it can be used to predict creditability given the other features of the loan applicant/borrower.
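The train-then-predict cycle described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the case study's own implementation: it fits a logistic regression by plain batch gradient descent, the two features are assumed to be already scaled to the range 0 to 1, and the six training rows, the function names, and the hyperparameters are all invented for the example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            p = sigmoid(sum(w * v for w, v in zip(weights, row)) + bias)
            error = p - label  # gradient of the log-loss w.r.t. the score
            for j, v in enumerate(row):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(row, weights, bias, threshold=0.5):
    """Return 1 (Good Credit) or 0 (Bad Credit) for one applicant."""
    score = sum(w * v for w, v in zip(weights, row)) + bias
    return 1 if sigmoid(score) >= threshold else 0


# Invented training data: two scaled features per applicant, label 0 or 1.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
           [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y_train = [0, 0, 0, 1, 1, 1]

# Training: learn the function that links predictors to the response.
weights, bias = train_logistic_regression(X_train, y_train)

# Prediction: apply the trained model to applicants it has not seen.
new_applicants = [[0.85, 0.75], [0.2, 0.2]]
predictions = [predict(row, weights, bias) for row in new_applicants]
print(predictions)
```

In practice a statistical package would normally do the fitting; the explicit loop is shown only to make visible what training means.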

As established earlier, the problem we are looking at is a binary classification problem: Creditability is either Bad Credit (0) or Good Credit (1).

Below is a list of popular algorithms used for classification problems.

  1. Linear Classifiers: Logistic Regression, Naive Bayes Classifier
  2. Support Vector Machines
  3. Decision Trees
  4. Boosted Trees
  5. Random Forest
  6. Neural Networks
  7. Nearest Neighbor

Most often, a data scientist will build several models using different algorithms and then either pick the best-performing model or combine (for example, average) the predictions of all the models. In this case study, we will build the model using just one algorithm: Logistic Regression.
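The two strategies mentioned above, picking the best model or averaging several models, can be sketched as follows. Everything in this example is hypothetical: the three models are assumed to be already trained, and their predicted probabilities for five validation applicants are invented numbers, chosen only to show the mechanics.

```python
# Actual outcomes for five validation applicants (1 = Good, 0 = Bad Credit).
actual = [1, 0, 1, 1, 0]

# Hypothetical predicted probabilities of Good Credit from three models.
model_probabilities = {
    "logistic_regression": [0.80, 0.30, 0.60, 0.40, 0.20],
    "decision_tree":       [0.90, 0.60, 0.70, 0.70, 0.55],
    "nearest_neighbor":    [0.60, 0.20, 0.40, 0.80, 0.60],
}


def to_labels(probabilities, threshold=0.5):
    """Convert probabilities to 0/1 class labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Option 1: keep the single model with the highest validation accuracy.
scores = {name: accuracy(to_labels(probs), actual)
          for name, probs in model_probabilities.items()}
best_model = max(scores, key=scores.get)

# Option 2: average the probabilities across models, then threshold.
averaged = [sum(column) / len(column)
            for column in zip(*model_probabilities.values())]
ensemble_accuracy = accuracy(to_labels(averaged), actual)

print("Per-model accuracy:", scores)
print("Best single model:", best_model)
print("Averaged-model accuracy:", ensemble_accuracy)
```

With these invented numbers the averaged model happens to score higher than any single model, but that is a property of the toy data and not a general guarantee.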
