The predictions returned by the model are compared with the observed values of the response variable to obtain a measure of model performance. To evaluate this performance, both classification and regression problems require a loss function, which computes the error of the model.
The loss function quantifies how far the model's predictions are from the truth: it measures the difference between the true value of the response, y, and the model's estimate, ŷ.
The choice of loss function depends on whether we are working on a classification problem or a regression problem. In the classification setting, common loss functions are the 0-1 loss and the cross entropy. The 0-1 loss is a very basic loss function that assigns 0 to correct predictions and 1 to incorrect ones. It does not care how the errors are made: every misclassification counts the same. The cross entropy measures the performance of a classification model whose output is a probability between 0 and 1, and it increases as the predicted probability diverges from the actual label. In regression problems, on the other hand, a common loss function is the Mean Squared Error (MSE). We will explain these metrics in the Model Selection section.
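To make the three loss functions concrete, here is a minimal sketch using NumPy. The function names, the binary (two-class) form of the cross entropy, and the small clipping constant eps are assumptions chosen for illustration; they are not taken from a specific library.

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """Average 0-1 loss: 0 for each correct label, 1 for each wrong one."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average cross entropy for labels in {0, 1} and predicted P(label = 1)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so that log() stays finite.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# One wrong label out of four gives a 0-1 loss of 0.25.
print(zero_one_loss([1, 0, 1, 1], [1, 0, 0, 1]))

# The cross entropy grows as the predicted probability moves away from the label.
for p in (0.9, 0.6, 0.1):
    print(p, binary_cross_entropy([1], [p]))

print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Note how the 0-1 loss only sees whether a label is right or wrong, while the cross entropy also penalizes a correct but unconfident probability.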
Loss Function Interpretation
The model is more accurate when the loss is lower, that is, when the difference between the true values and the estimated values is small. Many factors affect how far the loss can be minimized, such as the quality of the data, the number of features used to train the model, and the size of the training data set.
Researchers and machine learning engineers work on the model in order to minimize the loss function. However, if the loss on the training data is minimized too aggressively, the model can achieve good results on the training data but fail to predict new data well.
This issue arises when the model is “overfitted” to the data used in the training phase and has not learned how to generalize to new, unseen data. In the machine learning field, this situation is called overfitting.
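The gap between training and new-data performance can be illustrated with a small experiment. The sketch below is only an assumed toy setup (a sine curve with Gaussian noise, polynomial models of degree 1, 3 and 9, and the helper names shown), not a method taken from this text. It fits each polynomial to 10 training points and reports the MSE on the training data and on 200 fresh points.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_function(x):
    # Assumed underlying pattern for this toy example.
    return np.sin(2 * np.pi * x)

def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial by least squares and return (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((y_train - model(x_train)) ** 2))
    test_mse = float(np.mean((y_test - model(x_test)) ** 2))
    return train_mse, test_mse

rng = np.random.default_rng(0)
noise_sd = 0.3  # assumed noise level

# Small training set on an even grid; larger test set of new, unseen points.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_function(x_train) + rng.normal(0.0, noise_sd, size=x_train.size)
x_test = rng.uniform(0.0, 1.0, size=200)
y_test = true_function(x_test) + rng.normal(0.0, noise_sd, size=x_test.size)

for degree in (1, 3, 9):
    train_mse, test_mse = fit_and_score(degree, x_train, y_train, x_test, y_test)
    print(f"degree={degree}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

The degree-9 polynomial passes through all 10 training points, so its training loss is essentially zero, yet its loss on the new points is not. That combination is the signature of overfitting described above.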
Overfitting happens when a model captures the noise in the data along with the underlying pattern. Such models have low bias and high variance. Bias is the difference between the average prediction of the model and the correct value that we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem.
Variance is the variability of the model's prediction for a given data point across different training sets. A model with high variance pays a lot of attention to the training data and does not generalize well to data it hasn’t seen before. The result is that the model performs very well on training data but has high error rates on new data. Reducing one of these two sources of error tends to increase the other, a situation known in machine learning as the bias-variance trade-off.
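Bias and variance can be estimated empirically by training the same kind of model on many different training sets and looking at its predictions for one fixed point. The following sketch assumes a toy setup (a sine curve with Gaussian noise, polynomial models, and the function and parameter names shown); none of it comes from a particular library or from this text.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_function(x):
    # Assumed underlying pattern for this toy example.
    return np.sin(2 * np.pi * x)

def bias_variance_at_point(degree, x0, n_train=20, n_repeats=500,
                           noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial model's prediction at x0."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    predictions = np.empty(n_repeats)
    for i in range(n_repeats):
        # Each repetition draws a new noisy training set from the same pattern.
        y_train = true_function(x_train) + rng.normal(0.0, noise_sd, size=n_train)
        model = Polynomial.fit(x_train, y_train, degree)
        predictions[i] = model(x0)
    # Bias: average prediction minus the correct value.
    bias_squared = float((predictions.mean() - true_function(x0)) ** 2)
    # Variance: spread of the predictions across training sets.
    variance = float(predictions.var())
    return bias_squared, variance

for degree in (1, 9):
    bias_squared, variance = bias_variance_at_point(degree, x0=0.25)
    print(f"degree={degree}  bias^2={bias_squared:.4f}  variance={variance:.4f}")
```

In this setup the straight line (degree 1) is too simple for the sine curve, so it shows the larger bias, while the degree-9 polynomial follows each noisy training set closely and shows the larger variance.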