- Machine Learning with Python
- What is Machine Learning?
- Data Preprocessing in Data Science and Machine Learning
- Feature Selection in Machine Learning
- Train-Test Datasets in Machine Learning
- Evaluate Model Performance - Loss Function
- Model Selection in Machine Learning
- Bias Variance Trade Off
- Supervised Learning Models
- Multiple Linear Regression
- Logistic Regression
- Logistic Regression in Python using scikit-learn Package
- Decision Trees in Machine Learning
- Random Forest Algorithm in Python
- Support Vector Machine Algorithm Explained
- Multivariate Linear Regression in Python with scikit-learn Library
- Classifier Model in Machine Learning Using Python
- Cross Validation to Avoid Overfitting in Machine Learning
- K-Fold Cross Validation Example Using Python scikit-learn
- Unsupervised Learning Models
- K-Means Algorithm Python Example
- Neural Networks Overview
Multiple Linear Regression
The multiple linear regression model states that a response y can be estimated from a set of input features X and an error term ε. The model can be expressed with the following mathematical equation:

y = βᵀX + ε, where β, X ∈ ℝ^(p+1) and ε ~ N(μ, σ²)

βᵀ (the transpose of β) and X are both real-valued vectors of dimension p+1, and ε is the residual term, which represents the difference between the predictions of the model and the true observations of the variable y.
The vector β = (β₀, β₁, …, βₚ) stores all the beta coefficients of the model. These coefficients measure how a change in one of the independent variables impacts the dependent (target) variable. The vector X = (1, x₁, x₂, …, xₚ) holds the values of the independent variables. Both vectors (β and X) are p+1 dimensional because of the need to include an intercept term: the leading 1 in X multiplies the intercept β₀.
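As a minimal sketch of how a prediction follows from these two vectors (using NumPy, with made-up coefficient and feature values), the estimate of y is simply the dot product βᵀX:

```python
import numpy as np

# Hypothetical coefficients: intercept beta_0 plus two slopes (illustrative values)
beta = np.array([2.0, 0.5, -1.3])   # (beta_0, beta_1, beta_2)

# One observation of the independent variables, with a leading 1 for the intercept
x = np.array([1.0, 4.0, 2.5])       # (1, x_1, x_2)

# The model's estimate of y is the dot product beta^T X
y_hat = beta @ x
print(y_hat)                         # 2.0 + 0.5*4.0 - 1.3*2.5 = 0.75
```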
The goal of the linear regression model is to minimize the difference between the predictions and the real observations of the target variable. For this purpose, a method called Ordinary Least Squares (OLS) is used, which derives the optimal set of coefficients for fitting the model.
Ordinary Least Squares
Formally, the OLS method minimizes the Residual Sum of Squares (RSS) between the observations of the target variable and the predictions of the model. The RSS is the loss function used to assess model performance in linear regression and has the following formulation:

RSS = Σᵢ (yᵢ − ŷᵢ)², where ŷᵢ = β₀ + β₁xᵢ₁ + … + βₚxᵢₚ is the model's prediction for observation i.
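A short sketch of this idea, assuming NumPy and a small synthetic dataset (the data and variable names are illustrative): it fits the coefficients with the OLS closed-form solution β = (XᵀX)⁻¹Xᵀy and then evaluates the RSS of the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 observations, 2 features (illustrative only)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # prepend intercept column of 1s
true_beta = np.array([2.0, 0.5, -1.3])
y = X @ true_beta + rng.normal(scale=0.5, size=n)           # y = beta^T X + noise

# OLS closed-form solution: beta_hat = (X^T X)^-1 X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual Sum of Squares between observations and predictions
y_pred = X @ beta_hat
rss = np.sum((y - y_pred) ** 2)

print("estimated coefficients:", beta_hat)
print("RSS:", rss)
```

The same fit can be obtained with scikit-learn's LinearRegression, which is the route taken in the scikit-learn examples listed above; the closed-form version is shown here only to make the OLS objective explicit.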