As we pointed out earlier, both classification and regression models belong to the field of Supervised Learning. These models are characterized by a set of features, or independent variables, and a target variable that the model aims to predict.
The target variable is also called the label, and labelled data is the defining property of Supervised Learning: the labels guide the construction of the model during the training phase and are then used to evaluate model performance.
In a classification problem, the target variable to predict (y) is categorical and can take one of a finite set of K possible classes. In a regression problem, by contrast, the target variable y is a real value rather than a category.
The goal of a classification problem is to assign a set of features to a particular group. A common classification problem in the financial sector is to determine the price direction for the next day based on N days of asset price history.
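As a rough sketch of this setup, the snippet below builds a next-day direction classifier from a synthetic price series: the features are the previous N daily returns and the label is whether the next return is positive. The data, the choice of N = 5, and the use of Logistic Regression are all illustrative assumptions, not a trading recommendation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic daily price series (illustrative only)
rng = np.random.default_rng(42)
prices = 100 * np.cumprod(1 + rng.normal(0.0005, 0.01, 300))
returns = np.diff(prices) / prices[:-1]

# Features: the previous N daily returns; label: 1 if the next
# day's return is positive, 0 otherwise (a categorical target)
N = 5
X = np.array([returns[i:i + N] for i in range(len(returns) - N)])
y = (returns[N:] > 0).astype(int)

# Train on the first 80% of days and test on the rest, without
# shuffling, to respect the time ordering of the series
split = int(0.8 * len(X))
clf = LogisticRegression().fit(X[:split], y[:split])
accuracy = clf.score(X[split:], y[split:])
print(f"Out-of-sample direction accuracy: {accuracy:.2f}")
```

Note that the chronological train/test split matters here: randomly shuffling time-series observations would leak future information into training.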
The regression problem involves estimating a real-valued response from a set of features, or predictors, as the independent variables. In the financial field, an example is to estimate tomorrow's asset price from its historical prices (or other features). The regression model estimates the actual value of the price, not just its direction.
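The regression counterpart of the previous example can be sketched the same way: the features are the last N closing prices and the target is the next day's price itself, a real number. Again, the synthetic series, N = 5, and the Linear Regression model are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic daily price series (illustrative only)
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0.05, 1.0, 300))

# Features: the last N closing prices; target: tomorrow's price,
# a real-valued quantity rather than a class
N = 5
X = np.array([prices[i:i + N] for i in range(len(prices) - N)])
y = prices[N:]

# Chronological split: train on the first 80% of days
split = int(0.8 * len(X))
reg = LinearRegression().fit(X[:split], y[:split])
pred = reg.predict(X[split:])
print(f"First out-of-sample predicted price: {pred[0]:.2f}")
```

The only structural change from the classification sketch is the target: a continuous price level instead of a binary up/down label, which is exactly the distinction the text draws.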
Common classification algorithms include Logistic Regression, Naïve Bayes classifiers, Support Vector Machines, Decision Trees, and Deep Convolutional Neural Networks. Common regression techniques include Linear Regression, Support Vector Regression, and Random Forests.
In the next lessons, we will explain relevant concepts of the most popular algorithms used for Supervised Learning. These algorithms are the following:
- Multiple Linear Regression
- Logistic Regression
- Decision Trees and Random Forests
- Support Vector Machine
- Linear Discriminant Analysis