Feature Selection in Machine Learning
Feature selection is one of the core concepts in machine learning and has a large impact on model performance. Irrelevant or only partially relevant features can degrade it.
In this process, we select the features that contribute most to predicting the target variable. To get an idea of which features could have more predictive power in a machine learning model, we will load Open, High, Low, Close, Volume (OHLCV) data for AMZ stock and create some new features using Python.
Afterwards, we will use data visualizations and other common approaches to decide which features to keep.
The example below has five main steps:
- Import the Python libraries that will be used.
- Calculate technical indicators with the get_technical_indicators() function.
- Plot scatterplots between the features and the target variable, AdjClose.
- Make a Heat Map to show the correlation between each of the features and the target variable.
- Fit a Random Forest model to extract the feature importance of the independent variables (we will explain this algorithm in later sections; for now we only use some of the tools that Random Forest provides).
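Before the walkthrough itself, the last two steps in the list can be previewed on synthetic data. This is only a minimal sketch: the feature names (signal, weak_signal, noise), the coefficients, and the forest settings are assumptions made for illustration, and none of it comes from the AMZ dataset. It computes the correlation matrix that a heat map would display and then reads the impurity-based importances from a fitted RandomForestRegressor.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for engineered features (assumed names, not market data).
features = pd.DataFrame({
    "signal": rng.normal(size=n),       # strongly drives the target
    "weak_signal": rng.normal(size=n),  # small contribution to the target
    "noise": rng.normal(size=n),        # unrelated to the target
})
target = (
    3.0 * features["signal"]
    + 0.5 * features["weak_signal"]
    + rng.normal(scale=0.1, size=n)
)

# Numeric part of the heat map step: pairwise correlations, then the
# target column ranked by absolute value.
# Passing `corr` to seaborn.heatmap(corr, annot=True) would draw the heat map.
corr = features.assign(target=target).corr()
target_corr = corr["target"].drop("target").sort_values(
    key=lambda s: s.abs(), ascending=False
)

# Random Forest step: impurity-based importances of each feature.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features, target)
importances = pd.Series(
    model.feature_importances_, index=features.columns
).sort_values(ascending=False)

print(target_corr)
print(importances)
```

Both rankings should put the strongly related feature first in this toy setup. On real price data the two views can disagree, because correlation only captures linear relationships while tree-based importances can also pick up non-linear ones.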
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
The data for AMZ stock is loaded into the environment using the pandas read_csv() function.
# To read the file, provide the path to the CSV file on your own computer.
amz = pd.read_csv("C:/Users/Nicolas/Documents/Machine_Learning_Course/AMZ.csv")
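Once the frame is loaded, the next step derives new features from the raw prices. The get_technical_indicators() function itself is not shown here, so the following is a hypothetical stand-in rather than that function: the helper name sketch_indicators, the chosen indicators, the window length, and the column names (Open, High, Low, Close, Volume) are all assumptions, and a small synthetic frame is used in place of the CSV.

```python
import numpy as np
import pandas as pd


def sketch_indicators(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Hypothetical helper: add a few common price-derived features."""
    out = df.copy()
    out["sma"] = out["Close"].rolling(window).mean()        # simple moving average
    out["volatility"] = out["Close"].rolling(window).std()  # rolling std of the close
    out["daily_return"] = out["Close"].pct_change()         # day-over-day return
    out["high_low_range"] = out["High"] - out["Low"]        # intraday range
    # Rolling windows leave NaN in the first rows; drop them before modelling.
    return out.dropna()


# Synthetic OHLCV frame (random walk), standing in for the loaded CSV.
rng = np.random.default_rng(42)
n = 30
close = 100 + np.cumsum(rng.normal(size=n))
open_ = close + rng.uniform(-0.5, 0.5, size=n)
prices = pd.DataFrame(
    {
        "Open": open_,
        "High": np.maximum(open_, close) + 0.5,
        "Low": np.minimum(open_, close) - 0.5,
        "Close": close,
        "Volume": rng.integers(1_000, 5_000, size=n),
    },
    index=pd.date_range("2020-01-01", periods=n, freq="B"),
)

with_features = sketch_indicators(prices, window=5)
print(with_features.head())
```

With the real data, the same call would be applied to the loaded frame, provided its column names match the ones assumed above.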