# Multiple Linear Regression

The **multiple linear regression** algorithm states that a response $y$ can be estimated with a set of input features $x$ and an error term $\varepsilon$. The model can be expressed with the following mathematical equation:

$$y = \beta^T X + \varepsilon = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$$

$\beta^T X$ is the matrix notation of the equation, where $\beta^T, X \in \mathbb{R}^{p+1}$ and $\varepsilon \sim N(0, \sigma^2)$.

$\beta^T$ (the transpose of $\beta$) and $X$ are both real-valued vectors of dimension $p+1$, and $\varepsilon$ is the error term, which represents the difference between the model's prediction and the true observed value of the variable **y**.

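The vector $\beta^T = (\beta_0, \beta_1, \dots, \beta_p)$ stores all the coefficients of the model. Each coefficient $\beta_j$ measures how much the target variable changes when the independent variable $x_j$ increases by one unit while the other variables are held fixed; $\beta_0$ is the intercept.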

The vector $X = (1, x_1, x_2, \dots, x_p)$ holds the values of the independent variables. Both vectors ($\beta$ and $X$) are $(p+1)$-dimensional because an intercept term is included: the leading 1 in $X$ multiplies $\beta_0$.
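As a minimal sketch of the model equation (leaving out the error term), the Python snippet below builds $X = (1, x_1, \dots, x_p)$ for one observation and computes the prediction $\beta^T X$. The coefficient values are made up for illustration, and `predict` is a hypothetical helper, not part of any library. It also checks the interpretation of a coefficient: raising $x_1$ by one unit while keeping $x_2$ fixed changes the prediction by exactly $\beta_1$.

```python
import numpy as np

# Hypothetical coefficients beta = (beta_0, beta_1, beta_2) for p = 2 features.
beta = np.array([1.5, 2.0, -0.5])

def predict(beta, x):
    """Return beta^T X, where X = (1, x_1, ..., x_p) gets a leading 1 for the intercept."""
    X = np.concatenate(([1.0], x))  # prepend the intercept entry, so X has p+1 entries
    return beta @ X

x = np.array([3.0, 4.0])       # one observation with p = 2 feature values
y_hat = predict(beta, x)       # 1.5 + 2.0*3.0 - 0.5*4.0 = 5.5

# Raise x_1 by one unit and keep x_2 fixed: the prediction moves by exactly beta_1.
x_shifted = x + np.array([1.0, 0.0])
delta = predict(beta, x_shifted) - y_hat   # 2.0, which equals beta[1]
```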

The goal of the **linear regression** model is to minimize the difference between its predictions and the real observations of the target variable. For this purpose, a method called **Ordinary Least Squares (OLS)** is used to derive the optimal set of coefficients for fitting the model.

**Ordinary Least Squares**

Formally, **OLS** minimizes the *Residual Sum of Squares (RSS)* between the observations of the target variable and the predictions of the model. The **RSS** is the *loss function* used to assess model performance in the *linear regression* model and has the following formulation:

$$RSS(\beta) = \sum_{i=1}^{N} \left(y_i - \beta^T x_i\right)^2$$

The **Residual Sum of Squares**, also known as the *Sum of Squared Errors* (SSE), adds up the squared differences between the predictions $\beta^T x_i$ and the observations $y_i$. Minimizing this function gives the optimal estimate of the parameter vector $\beta$.

In matrix notation, the **RSS** equation is the following:

$$RSS(\beta) = (y - X\beta)^T (y - X\beta)$$
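A small NumPy sketch can make the two RSS formulations concrete. The helpers `rss_sum` and `rss_matrix` are illustrative names and the three-observation dataset is invented; the point is only that the sum $\sum_i (y_i - \beta^T x_i)^2$ and the matrix product $(y - X\beta)^T (y - X\beta)$ give the same number, where each row of `X` is one observation $x_i = (1, x_{i1}, \dots, x_{ip})$.

```python
import numpy as np

def rss_sum(beta, X, y):
    """RSS as a sum over observations: sum_i (y_i - beta^T x_i)^2."""
    return sum((y_i - beta @ x_i) ** 2 for x_i, y_i in zip(X, y))

def rss_matrix(beta, X, y):
    """RSS in matrix form: (y - X beta)^T (y - X beta)."""
    r = y - X @ beta   # vector of N residuals
    return r @ r

# Tiny made-up example: N = 3 observations, intercept column + p = 1 feature.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 4.0])
beta = np.array([1.0, 2.0])    # predictions 1, 3, 5 -> residuals 0, 0, -1

print(rss_sum(beta, X, y))     # 1.0
print(rss_matrix(beta, X, y))  # 1.0
```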

To get the optimal values of $\beta$, it is necessary to differentiate **RSS** with respect to $\beta$:

$$\frac{\partial RSS}{\partial \beta} = -2\,X^T (y - X\beta)$$
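One way to sanity-check the gradient $\partial RSS / \partial \beta = -2X^T(y - X\beta)$ is to compare it with a finite-difference approximation. The sketch below assumes random synthetic data and uses central differences (the helper names are illustrative); because RSS is quadratic in $\beta$, the analytic and numerical gradients should agree up to floating-point rounding.

```python
import numpy as np

def rss(beta, X, y):
    """Matrix-form RSS: (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return r @ r

def rss_gradient(beta, X, y):
    """Analytic gradient of RSS with respect to beta: -2 X^T (y - X beta)."""
    return -2.0 * X.T @ (y - X @ beta)

def numerical_gradient(f, beta, h=1e-6):
    """Central finite differences, perturbing one coordinate of beta at a time."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        grad[j] = (f(beta + step) - f(beta - step)) / (2 * h)
    return grad

# Synthetic data: N observations, an intercept column plus p random features.
rng = np.random.default_rng(1)
N, p = 20, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = rng.normal(size=N)
beta = rng.normal(size=p + 1)

analytic = rss_gradient(beta, X, y)
numeric = numerical_gradient(lambda b: rss(b, X, y), beta)
print(np.allclose(analytic, numeric, rtol=1e-5, atol=1e-5))  # True
```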

Remember that, in this matrix notation, $X$ is a matrix with all the independent variables: it has $N$ observations and $p$ features, plus a first column of ones for the intercept. Therefore, the dimension of $X$ is **N** (rows) x **p+1** (columns).

One assumption of this model is that the matrix $X^T X$ should be **positive-definite**. This holds only when the columns of $X$ are linearly independent (full column rank), which requires at least as many observations as columns ($N \ge p+1$). In cases of high-dimensional data with more features than observations (e.g. text document classification), this assumption does not hold.
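The rank condition can be checked numerically. This sketch uses a hypothetical helper `gram_rank` and random synthetic features: it builds a design matrix with an intercept column and reports the rank of $X^T X$ next to its number of columns $p+1$. With more observations than columns, $X^T X$ has full rank and is positive-definite; with $N < p+1$, its rank is at most $N$, so it is singular and cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(2)

def gram_rank(N, p):
    """Build an N x (p+1) design matrix (intercept column + p random features)
    and return (rank of X^T X, number of columns p+1)."""
    X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
    XtX = X.T @ X
    return int(np.linalg.matrix_rank(XtX)), p + 1

print(gram_rank(N=100, p=5))  # (6, 6): full rank, X^T X is positive-definite
print(gram_rank(N=5, p=10))   # (5, 11): rank limited by N, X^T X is singular
```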

Under the assumption that $X^T X$ is positive-definite, the derivative is set to zero, which gives the normal equations $X^T X \beta = X^T y$, and the $\beta$ parameters are calculated as:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$
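A minimal sketch of the closed-form OLS estimate on synthetic data follows. It solves the linear system $X^T X \beta = X^T y$ with `np.linalg.solve` rather than forming $(X^T X)^{-1}$ explicitly; both give the same estimate when $X^T X$ is positive-definite, but solving the system is the usual numerically safer choice. The true coefficients and noise level are invented for the example, and `ols_fit` is a hypothetical helper name. The result is cross-checked against NumPy's least-squares solver `np.linalg.lstsq`.

```python
import numpy as np

def ols_fit(X, y):
    """Solve the normal equations X^T X beta = X^T y for beta.

    Assumes X^T X is positive-definite (full column rank, N >= p+1).
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(3)
N, p = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # N x (p+1) design matrix
true_beta = np.array([0.5, 1.0, -2.0, 3.0])                 # made-up coefficients
y = X @ true_beta + rng.normal(scale=0.1, size=N)           # add Gaussian noise

beta_hat = ols_fit(X, y)
print(np.round(beta_hat, 2))  # close to true_beta

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

At $\hat{\beta}$ the gradient $-2X^T(y - X\hat{\beta})$ is zero up to rounding, which is exactly the condition the normal equations express.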

Later we will show an example using a dataset of Open, High, Low, Close and Volume data for the S&P 500 to fit and evaluate a **multiple linear regression** algorithm with the scikit-learn library.
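Ahead of that example, here is a hedged sketch of the same workflow with scikit-learn's `LinearRegression`. The S&P 500 dataset is not reproduced here, so the sketch assumes synthetic data with three made-up features in place of the price and volume columns. It also shows that the fitted `intercept_` and `coef_` match the normal-equation estimate $(X^T X)^{-1} X^T y$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the S&P 500 data (the real dataset is not used here):
# three made-up features and a noisy linear target with intercept 2.0.
rng = np.random.default_rng(4)
N = 500
features = rng.normal(size=(N, 3))
y = 2.0 + features @ np.array([0.8, -1.2, 0.3]) + rng.normal(scale=0.5, size=N)

# LinearRegression fits the intercept itself (fit_intercept=True by default),
# so the features are passed without a column of ones.
model = LinearRegression().fit(features, y)
print(model.intercept_, model.coef_)

# The same estimate from the normal equations, with an explicit intercept column.
X = np.column_stack([np.ones(N), features])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_hat, np.r_[model.intercept_, model.coef_]))  # True
```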

