Finance Train


Credit Modelling: Training and Test Data Sets

Data Science, Risk Management

This lesson is part 6 of 28 in the course Credit Risk Modelling in R

To build the model, we divide the data into two sets: a training set and a test set. The model is fitted on the training set and then evaluated on the test set. Because the model never sees the test rows during fitting, the test results give a fairer estimate of how it will perform on new data.

There are several ways to split the data.

One simple approach is to use R's sample() function to randomly select row indexes, and then use those indexes to divide the dataset into a training set and a test set. The code below assigns 30% of the rows to the test set and the remaining 70% to the training set.

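# Set a seed so the random split is reproducible (any fixed value works)
set.seed(123)
# Sample 30% of the row indexes for the test set
indexes <- sample(1:nrow(creditdata_new), size = 0.3 * nrow(creditdata_new))
# Split data: sampled rows form the test set, the remaining rows the training set
credit_test <- creditdata_new[indexes, ]
credit_train <- creditdata_new[-indexes, ]
dim(credit_test)
# [1] 300  18
dim(credit_train)
# [1] 700  18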
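The same idea can be wrapped in a small reusable function. This is a sketch for illustration, not part of the original lesson: split_train_test is a hypothetical helper name, and a randomly generated 1000-row, 18-column data frame stands in for creditdata_new so the example runs on its own.

```r
# Hypothetical helper (not from the original lesson): randomly split a data
# frame into training and test sets. test_frac is the share of rows assigned
# to the test set; seed, if given, makes the split reproducible.
split_train_test <- function(df, test_frac = 0.3, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(df)
  # Sample test-set row indexes without replacement
  test_idx <- sample.int(n, size = round(test_frac * n))
  # Logical mask: TRUE for test rows, FALSE for training rows
  is_test <- seq_len(n) %in% test_idx
  list(train = df[!is_test, , drop = FALSE],
       test  = df[is_test, , drop = FALSE])
}

# Toy data standing in for creditdata_new: 1000 rows, 18 numeric columns
toy <- as.data.frame(matrix(rnorm(1000 * 18), nrow = 1000))
parts <- split_train_test(toy, test_frac = 0.3, seed = 42)
dim(parts$test)   # 300 18
dim(parts$train)  # 700 18
```

Building the split from a logical mask rather than from negative indexes sidesteps a known R pitfall: if the sampled index vector were ever empty, df[-indexes, ] would return zero rows instead of the whole data frame.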

Other Ways to Split Data

  1. The rpart function of the rpart package also splits data, although in a different sense: it grows a decision tree rather than producing a training/test split. RPART stands for Recursive Partitioning And Regression Trees. The algorithm splits the dataset recursively, which means that the subsets that arise from one split are split again until a predetermined stopping criterion is reached. Splitting rules can be constructed in several ways; for classification trees, for example, splits can be chosen using the Gini index or information gain.
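  2. We can also use the createDataPartition function of the caret package to split the data set. When the outcome variable is a factor, it samples within each class, so the class proportions are approximately preserved in both the training and test sets.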
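To show what a stratified split does, here is a base-R sketch of the idea. It is an illustration only, not caret's implementation: stratified_split is a hypothetical helper, and the 700/300 outcome vector is toy data.

```r
# Base-R sketch of a stratified split (illustrates the idea behind
# caret::createDataPartition; not caret's code). Returns sorted training
# row indexes, sampling the share p within each class of y.
stratified_split <- function(y, p = 0.7, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  y <- as.factor(y)
  # Row indexes grouped by class
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    # Index into idx via sample.int to avoid sample()'s special case
    # for length-one vectors
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

# Toy outcome: 700 "good" and 300 "bad" credits
y <- factor(rep(c("good", "bad"), times = c(700, 300)))
train_idx <- stratified_split(y, p = 0.7, seed = 1)
prop.table(table(y[train_idx]))   # bad 0.3, good 0.7
prop.table(table(y[-train_idx]))  # bad 0.3, good 0.7
```

With caret installed, createDataPartition(y, p = 0.7, list = FALSE), where y is the outcome column, returns a one-column matrix of training row indexes that can be used in the same way as indexes in the sample() example.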
Copyright © 2019 Finance Train. All rights reserved.
