Decision Trees in Machine Learning


This lesson is part 13 of 22 in the course Machine Learning in Finance Using Python

A decision tree is a popular supervised learning algorithm that can handle both classification and regression problems. In both cases, the algorithm breaks a dataset down into smaller subsets by applying if-then-else decision rules to the features of the data.

The general idea of a decision tree is that the algorithm evaluates each feature and uses it to split the tree according to how well it explains the target variable. The features can be categorical or continuous variables.

While building the tree, the algorithm selects the most important features in a top-down fashion, creating the decision nodes and branches of the tree and making predictions at the points where the tree cannot be expanded any further.

The tree is composed of a root node, decision nodes and leaf nodes. The root node is the topmost decision in the tree, and is the first point at which the tree is split based on the best predictor in the dataset.

The decision nodes are the intermediate steps in the construction of the tree, used to split the tree based on different values of the independent variables or features of the model. The leaf nodes are the end points of the tree and hold the predictions of the model.

Figure: Decision Tree Structure
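
To make this structure concrete, here is a minimal sketch (not part of the original lesson) that fits a small scikit-learn decision tree on the toy iris dataset and prints its root, decision and leaf nodes as text. The dataset and the depth limit are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative data: the iris dataset that ships with scikit-learn
iris = load_iris()
X, y = iris.data, iris.target

# A shallow tree so the printed structure stays readable
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

# The first split is the root node, intermediate splits are decision
# nodes, and the "class: ..." lines are the leaf nodes holding predictions.
print(export_text(model, feature_names=list(iris.feature_names)))
```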

The methods used to split the tree differ depending on whether we are dealing with a classification or a regression problem. Below we give an overview of the measures used to split the tree in each type of problem.

Classification Problem

In this type of problem, the algorithm decides which feature is the best one to split the tree on, based on two measures: Entropy and Information Gain.

Entropy

Entropy is a measure of randomness in the data which, in binary classification, takes values between 0 and 1. If entropy equals zero, all the data belongs to the same class; if entropy equals 1, half of the data belongs to one class and half to the other. In this last situation, the data is perfectly random.

The mathematical formula of entropy is as follows:

E(S) = − Σ p_i × log2(p_i)

where the sum runs over the c classes and p_i is the proportion of observations in S that belong to class i.

The goal when building a decision tree is to reduce entropy as the tree grows. As stated above, entropy is a measure of the randomness of the data, and this randomness needs to be reduced. For this purpose, entropy is combined with Information Gain in order to select the feature with the highest value of Information Gain.
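
As an illustration (this snippet is not from the original lesson), the following sketch computes the entropy of a vector of class labels with NumPy:

```python
import numpy as np

def entropy(labels):
    """Entropy of a 1-D array of class labels: -sum(p_i * log2(p_i))."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

# A perfectly mixed binary sample has entropy 1, a pure sample has entropy 0
print(entropy(np.array([0, 0, 1, 1])))  # 1.0
print(entropy(np.array([1, 1, 1, 1])))  # 0.0 (pure node)
```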

Information Gain

Information Gain is based on the decrease in Entropy after a dataset is split on an attribute or feature. The attribute with the highest Information Gain is used to split the tree into nodes in a top-down fashion. The mathematical formula of Information Gain is the following:

IG(Y, X) = E(Y) − E(Y|X)

In the equation above, Information Gain subtracts the entropy of the target variable Y given X, E(Y|X), from the entropy of the target variable Y, E(Y), to quantify the reduction in the uncertainty of Y given the additional piece of information X.

The aim of the decision tree in this type of problem is to reduce the entropy of the target variable. To do this, the decision tree algorithm uses the Entropy and the Information Gain of each feature to decide which attribute provides the most information, that is, which attribute reduces the uncertainty of the target variable the most.
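
The following sketch (an illustration under the same assumptions as the entropy snippet above, not code from the lesson) computes the Information Gain of a categorical feature:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(y, x):
    """IG(Y, X) = E(Y) - E(Y|X) for a categorical feature x."""
    values, counts = np.unique(x, return_counts=True)
    weights = counts / counts.sum()
    # E(Y|X): entropy of y within each value of x, weighted by P(x = value)
    conditional_entropy = sum(
        w * entropy(y[x == v]) for v, w in zip(values, weights)
    )
    return entropy(y) - conditional_entropy

# Toy example: a feature that perfectly separates the classes has IG = 1
y = np.array([0, 0, 1, 1])
x = np.array(["a", "a", "b", "b"])
print(information_gain(y, x))  # 1.0
```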

Decision Tree Regression Problem

In the regression problem, the tree is composed (as in the classification problem) of nodes, branches and leaf nodes, where each node represents a feature or attribute, each branch represents a decision rule and the leaf nodes hold an estimate of the target variable.

Contrary to the classification problem, the estimate of the target variable in a regression problem is a continuous value rather than a categorical one. The ID3 algorithm (Iterative Dichotomiser 3) can be adapted to build a decision tree for regression by replacing the Information Gain metric with the Standard Deviation Reduction (SDR).

The Standard Deviation Reduction is based on the decrease in standard deviation after a dataset is split on an attribute. The tree is split on the attribute that gives the highest Standard Deviation Reduction of the target variable.

The standard deviation is used as a measure of homogeneity of the data. The tree is built top-down from the root node, and the data is partitioned into subsets containing homogeneous instances of the target variable. If a sample is completely homogeneous, its standard deviation is zero.

The mathematical formula of the Standard Deviation Reduction is the following:

SDR(Y, X) = S(Y) − S(Y, X)

where:

  • Y is the target variable
  • X is a specific attribute of the dataset
  • SDR(Y, X) is the reduction in the standard deviation of the target variable Y given the information in X
  • S(Y) is the standard deviation of the target variable
  • S(Y, X) is the standard deviation of the target variable given the information in X

The calculation of S(Y, X) can be broken down with the following formula, which is important for understanding the process:

S(Y, X) = Σ P(c) × S(c)

where the sum runs over the unique values c of the attribute X, and:

  • S(c) is the standard deviation of the target variable computed over the records where the independent variable takes the value c
  • P(c) is the proportion (probability) of records where the independent variable takes the value c, out of all records

So S(Y, X) is the sum, over the values of the independent variable, of the probability of each value multiplied by the standard deviation of the target variable within that value.
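
As a hedged illustration (the helper names and toy data below are made up for this example, not taken from the lesson), the weighted standard deviation S(Y, X) and the resulting SDR can be computed as follows:

```python
import numpy as np

def weighted_std(y, x):
    """S(Y, X): std of y within each value of x, weighted by P(c)."""
    values, counts = np.unique(x, return_counts=True)
    probs = counts / counts.sum()
    return sum(p * np.std(y[x == v]) for v, p in zip(values, probs))

def sdr(y, x):
    """Standard Deviation Reduction: SDR(Y, X) = S(Y) - S(Y, X)."""
    return np.std(y) - weighted_std(y, x)

# Toy regression target and one categorical feature
y = np.array([10.0, 12.0, 30.0, 32.0])
x = np.array(["low", "low", "high", "high"])
print(sdr(y, x))  # large reduction: the split produces homogeneous subsets
```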

The standard deviation S(Y, X) is calculated for each feature or independent variable of the dataset. The feature with the lowest value of S(Y, X), and therefore the greatest Standard Deviation Reduction, is the one with the most capability to explain the variations in the target variable, so it is the feature that the decision tree algorithm will use to split the tree.

In other words, the feature used to split the tree is the one with the greatest Standard Deviation Reduction. The overall process of splitting a decision tree under the Standard Deviation Reduction can be summarized in the following steps (a short code sketch follows the list):

  • Calculate the standard deviation of the target variable.
  • Calculate the Standard Deviation Reduction for all the independent variables.
  • The feature with the highest Standard Deviation Reduction is chosen for the decision node.
  • In each branch, the Coefficient of Variation (CV) is calculated. This coefficient is a measure of homogeneity of the target variable: CV = (standard deviation of the target variable / mean of the target variable) × 100.
  • If the CV is less than a certain threshold, such as 10%, that subset does not need further splitting and the node becomes a leaf node.
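
The sketch below strings these steps together for a single split. It repeats the weighted standard deviation helper from the previous snippet so it stays self-contained; the 10% CV threshold and the toy data are illustrative assumptions, not values from the lesson.

```python
import numpy as np

def weighted_std(y, x):
    values, counts = np.unique(x, return_counts=True)
    probs = counts / counts.sum()
    return sum(p * np.std(y[x == v]) for v, p in zip(values, probs))

def choose_split(y, features, cv_threshold=10.0):
    """Pick the feature with the highest SDR; stop if the node is homogeneous."""
    cv = np.std(y) / np.mean(y) * 100          # coefficient of variation in %
    if cv < cv_threshold:
        return None                             # homogeneous enough: make a leaf node
    sdrs = {name: np.std(y) - weighted_std(y, x) for name, x in features.items()}
    return max(sdrs, key=sdrs.get)              # feature with the greatest reduction

# Toy data: 'outlook' separates the target better than 'windy'
y = np.array([25.0, 30.0, 46.0, 44.0, 48.0])
features = {
    "outlook": np.array(["sunny", "sunny", "rain", "rain", "rain"]),
    "windy":   np.array(["yes", "no", "yes", "no", "yes"]),
}
print(choose_split(y, features))  # expected: 'outlook'
```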

Overfitting in Decision Trees

One of the most common problems with decision trees is that they tend to overfit heavily. In some situations the tree is built around specific features of the training data, and its predictive power for unseen data is reduced.

This happens because the branches of the tree follow strict and specific rules derived from the training dataset, so for samples that are not part of the training set the accuracy of the model is poor. There are two main ways to address overfitting in decision trees: one is using the Random Forest algorithm, and the other is pruning the tree.

Random Forest

Random Forest helps avoid overfitting because it is an ensemble of n decision trees (not just one), producing n results in the end. The final result of a Random Forest is the most frequent response among the n trees if the target variable is categorical, or the average of the n predicted values if the target variable is continuous.
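
As a quick, hedged illustration (the dataset and hyperparameters are arbitrary choices for this example, and the next lesson covers Random Forest in detail), scikit-learn's RandomForestClassifier aggregates the predictions of many trees:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees; the forest aggregates their predictions
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```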

Pruning

Pruning is done after the initial training is complete and involves removing nodes and branches, starting from the leaf nodes, in such a way that the overall accuracy is not affected. The pruning technique reduces the size of the tree by removing nodes that provide little power to classify or predict new instances.
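
One common way to prune in practice is cost-complexity pruning, which scikit-learn exposes through the ccp_alpha parameter of its tree estimators. The sketch below illustrates that approach (the dataset and the chosen alpha are assumptions for the example), rather than a specific pruning procedure prescribed by this lesson:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree usually fits the training data perfectly but generalizes worse
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Cost-complexity pruning removes the branches that add the least predictive power
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

print("unpruned:", full_tree.get_n_leaves(), "leaves, test accuracy", full_tree.score(X_test, y_test))
print("pruned:  ", pruned_tree.get_n_leaves(), "leaves, test accuracy", pruned_tree.score(X_test, y_test))
```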
