Finance Train

Train-Test Datasets in Machine Learning

Data Science

This lesson is part 5 of 7 in the course Machine Learning in Finance Using Python

Once the data has been cleaned and the features for the model have been selected, the next step is to generate the training and testing datasets. We divide the data into two separate sets: the model is built on the training set and then evaluated on the testing set to see how well it performs. There are many ways to split the data; one common approach is to split it randomly.
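As a minimal sketch of a random split, the snippet below shuffles the rows and cuts them into two lists using only the Python standard library. The function name `random_split`, its `test_fraction` and `seed` parameters, and the ten-row dataset are assumptions made for illustration; they are not part of the course material. In practice, scikit-learn's `train_test_split` is commonly used for the same purpose.

```python
import random


def random_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: ten observations identified by their row number.
observations = list(range(10))
train, test = random_split(observations, test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

Fixing the seed keeps the split identical between runs, which makes model results easier to compare.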

Splitting the data into training and testing sets is required for supervised learning problems. Unsupervised learning models have no labelled target to predict, so they generally do not require a train and a test dataset in this sense.

In classification and regression problems we supervise, or “train”, a model with specific data so that it can predict the target variable y. Training a model consists of choosing a set of relevant features (independent variables) and pairing them with a response y (the labelled data), which is the observed value of the target variable.

In this phase, the algorithm learns from the data and determines the influence of each feature on the response y. We can then make predictions for out-of-sample (unseen) data based on what the model learned during training.

This process has two main stages: training and testing the model. In the training phase, described above, we fit the model to the training data; afterwards we use the test data to assess the model’s performance.
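The two stages can be sketched with a one-feature linear regression fitted by ordinary least squares. This is only an illustration, not the method the course prescribes: the helper names `fit_line` and `mean_squared_error` and all the numbers are made up. The fitted slope plays the role of the feature's influence on y, and the mean squared error on the test data is one possible measure of performance.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up data in which y is roughly 2 * x + 1.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
test_x = [7.0, 8.0]
test_y = [15.2, 16.8]

# Stage 1 (training): fit the model on the training data only.
intercept, slope = fit_line(train_x, train_y)

# Stage 2 (testing): predict for the unseen test features and measure the error.
predictions = [intercept + slope * x for x in test_x]
test_mse = mean_squared_error(test_y, predictions)
print(f"slope={slope:.3f} intercept={intercept:.3f} test MSE={test_mse:.4f}")
```

The test data is never passed to `fit_line`, so the error measured in stage 2 reflects how the model behaves on observations it has not seen.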

The training dataset includes both the features and the target variable. For the testing dataset, only the features are passed to the model to obtain predictions of the target variable; the observed target values are held back and compared with those predictions. The training dataset usually represents 70%-80% of the total data, and the test dataset is the remaining portion, which is reserved for testing the model’s accuracy.
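The sketch below illustrates how the two datasets are used on a tiny classification problem. It assumes ten made-up rows split 80/20 and uses a 1-nearest-neighbour rule as a stand-in model, chosen only because it fits in a few lines. The model sees only the test features when predicting; the observed test targets are then compared with its output to compute accuracy. All names and values here are hypothetical.

```python
def predict_nearest(train_features, train_labels, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    nearest = min(range(len(train_features)),
                  key=lambda i: abs(train_features[i] - x))
    return train_labels[nearest]


# Made-up rows of (feature, target): 8 of the 10 rows (80%) form the training set.
train_rows = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
              (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]
test_rows = [(1.2, 0), (6.8, 1)]

# Training uses both the features and the target.
train_features = [feature for feature, _ in train_rows]
train_labels = [target for _, target in train_rows]

# Testing passes only the features to the model; the targets are held back.
test_features = [feature for feature, _ in test_rows]
held_back_targets = [target for _, target in test_rows]

predictions = [predict_nearest(train_features, train_labels, x)
               for x in test_features]

# Accuracy: the share of test predictions that match the held-back targets.
correct = sum(p == t for p, t in zip(predictions, held_back_targets))
accuracy = correct / len(test_rows)
print(predictions, accuracy)
```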

Copyright © 2019 Finance Train. All rights reserved.

  • About Us
  • Privacy Policy
  • Contact Us