Unsupervised learning models are built from features that are not associated with a response. This means that this type of machine learning algorithm does not use labelled data, as its interest lies in the attributes of the features themselves. In unsupervised learning models there is no concept of training or supervising a dataset as the […]

## K-Fold Cross Validation Example Using Python scikit-learn

In this post, we will provide an example of cross validation using the K-Fold method with the Python scikit-learn library. The K-Fold cross validation example sets the parameter k equal to 5. By using a ‘for’ loop, we will fit each model using 4 folds for training data and 1 fold for testing data, […]
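The loop described above can be sketched roughly as follows. This is a minimal illustration, not the post's own code: the synthetic data, the choice of `LinearRegression` as the model, and the variable names are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic regression data (an assumption; the post's dataset is not shown here).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# k = 5: each iteration trains on 4 folds and tests on the remaining fold.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mse = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_mse.append(mean_squared_error(y[test_idx], pred))

print("MSE per fold:", np.round(fold_mse, 4))
print("Mean cross-validated MSE:", np.mean(fold_mse))
```

Averaging the five fold errors gives a single estimate of the test error that uses every observation for testing exactly once.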

## Cross Validation to Avoid Overfitting in Machine Learning

Cross validation is a technique used to determine how well the results of a machine learning model generalize to new, unseen data. The training error associated with a model might underestimate the test error of the model, so the cross validation approach provides a mechanism to estimate the test MSE with the current dataset […]
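A small sketch can make the gap between training error and cross-validated error concrete. The data is synthetic and the unpruned `DecisionTreeRegressor` is chosen here only because it overfits easily; neither is taken from the post.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Noisy sine wave (synthetic data, an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# An unpruned tree memorizes the training set, so its training MSE is tiny.
model = DecisionTreeRegressor(random_state=0)
train_mse = mean_squared_error(y, model.fit(X, y).predict(X))

# scikit-learn reports negated MSE, so flip the sign to get the error back.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
cv_mse = -scores.mean()

print("Training MSE:", train_mse)
print("5-fold cross-validated MSE:", cv_mse)
```

The cross-validated figure is the more honest estimate of how the model would behave on unseen data.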

## Classifier Model in Machine Learning Using Python

In this post, we will learn how to create a classifier model in machine learning using Python. We will create a supervised classifier model that is trained on a dataset with a set of features and then uses test data to predict price direction at day k with information only known at day k-1. Price […]
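One way to set up such a problem is to use lagged returns as features, so that the prediction for day k only sees data up to day k-1. The sketch below assumes synthetic random returns, two lag features, and a `LogisticRegression` classifier; the post's actual features and model are not shown in this excerpt.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic daily returns (an assumption; real price data would be loaded here).
rng = np.random.default_rng(42)
returns = rng.normal(scale=0.01, size=500)

# Features for day k use only information known at day k-1.
lag1 = returns[1:-1]                 # return at day k-1
lag2 = returns[:-2]                  # return at day k-2
X = np.column_stack([lag1, lag2])
y = (returns[2:] > 0).astype(int)    # 1 if the price went up at day k, else 0

# Chronological split: train on the past, test on the most recent 20%.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)
print("Hit rate on test data:", (pred == y_test).mean())
```

Because the returns here are pure noise, the hit rate carries no meaning; the point is the alignment of features and target and the time-ordered split.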

## Multivariate Linear Regression in Python with scikit-learn Library

In this post, we will provide an example of a machine learning regression algorithm using multivariate linear regression from the scikit-learn library in Python. The example contains the following steps: Step 1: Import libraries and load the data into the environment. Step 2: Generate the features of the model that are related with some […]
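As a rough sketch of what fitting such a model looks like, the example below generates synthetic data with known coefficients (an assumption, standing in for the data loading and feature steps) and checks how well `LinearRegression` recovers them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with three features and known coefficients (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ true_coef + rng.normal(scale=0.1, size=200)

# Hold out 25% of the rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
print("Test MSE:", mean_squared_error(y_test, y_pred))
print("Test R^2:", r2_score(y_test, y_pred))
```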

## Support Vector Machine Algorithm Explained

The Support Vector Machine is a supervised machine learning algorithm that can be used for both classification and regression problems. However, it is most often used in classification problems. The goal of the algorithm is to classify new unseen objects into two separate groups based on their properties and a set of examples that are already […]
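A minimal sketch of two-group classification with scikit-learn's `SVC` follows. The two synthetic clusters, the linear kernel, and the new points are assumptions chosen to keep the example easy to follow.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic groups (illustrative data, not from the post).
X, y = make_blobs(
    n_samples=100, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0
)

# Fit a linear SVM on the already classified examples.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Classify new, unseen objects based on their two properties.
new_points = np.array([[-2.5, -3.5], [3.5, 2.0]])
pred = clf.predict(new_points)

print("Predicted groups:", pred)
print("Number of support vectors:", len(clf.support_vectors_))
```

Only the support vectors, the training points closest to the boundary, determine where the separating line is placed.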

## Random Forest Algorithm in Python

The Random Forest algorithm can be described in the following conceptual steps: Select k features randomly from the dataset and build a decision tree from those features, where k < m (the total number of features). Repeat this n times in order to have n decision trees from different random combinations of k features. Take each […]
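The conceptual steps above can be sketched by hand with plain decision trees. The final step is cut off in this excerpt, so combining the trees by majority vote is an assumption here (it is the usual approach for classification). Note that scikit-learn's own `RandomForestClassifier` works differently in detail: it also bootstraps rows and samples features at each split rather than once per tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data with m = 8 features (illustrative only).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rng = np.random.default_rng(0)
n_trees, k = 25, 3          # n trees, each built on k < m random features
m = X.shape[1]

forest = []
for _ in range(n_trees):
    cols = rng.choice(m, size=k, replace=False)   # pick k features at random
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[:, cols], y_train)
    forest.append((cols, tree))

# Each tree votes; the majority class wins (n_trees is odd, so no ties).
votes = np.array([tree.predict(X_test[:, cols]) for cols, tree in forest])
majority = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = (majority == y_test).mean()
print("Test accuracy of the hand-built forest:", accuracy)
```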

## Decision Trees in Machine Learning

A decision tree is a popular supervised learning algorithm that can handle classification and regression problems. For both problems, the algorithm breaks down a dataset into smaller subsets by using if-then-else decision rules on the features of the data. The general idea of a decision tree is that each of the features is evaluated by the […]
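To see the if-then-else rules a fitted tree actually learns, scikit-learn can print them as text. The sketch below uses the bundled iris dataset and a depth limit of 2, both chosen here for readability rather than taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Small built-in dataset; the shallow depth keeps the printed rules short.
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each indented line is one threshold test on a feature; leaves show the class.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Reading the output from top to bottom follows the same path a new observation would take through the tree.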

## Logistic Regression in Python using scikit-learn Package

Using the scikit-learn package for Python, we can fit and evaluate a logistic regression algorithm with a few lines of code. Also, for binary classification problems the library provides useful metrics to evaluate model performance, such as the confusion matrix, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC). Hyperparameter Tuning in Logistic […]
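A short sketch of fitting the model and computing the three metrics mentioned above is shown below. The synthetic dataset and the 70/30 split are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression().fit(X_train, y_train)

# Confusion matrix uses hard class predictions.
cm = confusion_matrix(y_test, model.predict(X_test))

# ROC and AUC use the predicted probability of the positive class.
proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)

print("Confusion matrix:\n", cm)
print("AUC:", auc)
```

The `fpr` and `tpr` arrays can be passed to a plotting library to draw the ROC curve itself.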

## Logistic Regression

In machine learning, the Logistic Regression algorithm is used for classification problems. It provides an output that we can interpret as the probability that a new observation belongs to a certain class. Generally, logistic regression is used for binary classification, but it also works with multiple and ordinal classes. Logistic regression estimates a continuous quantity […]
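The probability interpretation comes from passing a linear score through the logistic (sigmoid) function. The sketch below shows this for a single feature; the intercept and coefficient are hypothetical values picked for illustration, not fitted parameters.

```python
import math


def sigmoid(z):
    """Map any real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, intercept, coef):
    """Probability that an observation with feature value x is in class 1."""
    return sigmoid(intercept + coef * x)


# Hypothetical model parameters (assumed for the example, not estimated).
intercept, coef = -1.0, 2.0

for x in (-1.0, 0.5, 2.0):
    p = predict_proba(x, intercept, coef)
    label = int(p >= 0.5)   # classify as 1 when the probability reaches 0.5
    print(f"x={x:+.1f}  P(class 1)={p:.3f}  predicted class={label}")
```

With these parameters the score is zero at x = 0.5, which is exactly where the predicted probability crosses 0.5.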