The purpose of preprocessing is to make raw data suitable for data science algorithms. For example, we may want to remove outliers, or remove or replace missing values.
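As an illustration of the outlier-removal step, here is a minimal sketch using the interquartile range (IQR) rule. The lesson does not say which outlier method it uses, so the IQR rule, the `factor=1.5` threshold, the function name `remove_outliers_iqr`, and the sample `readings` are all assumptions made for this example.

```python
import statistics

def remove_outliers_iqr(values, factor=1.5):
    """Keep only the values inside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical sample data: one reading is far away from the rest
readings = [10, 12, 11, 13, 12, 11, 95]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```

Whether an extreme value should be dropped, capped, or kept depends on the dataset, so a rule like this is a starting point rather than a fixed recipe.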
The dataset that we have selected does not have any missing data. In practice, however, a dataset may contain many missing values, which need to be replaced with valid values estimated from the complete data that is available. The k-nearest neighbours algorithm is used for this purpose: each missing value is imputed from the most similar complete records.
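To make the idea concrete, here is a minimal sketch of k-nearest-neighbours imputation in plain Python. It is not the lesson's own implementation: the function name `knn_impute`, the choice of Euclidean distance, the use of the neighbours' mean, and the sample `data` are assumptions made for illustration. Missing values are represented as `None`.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue

        # Columns that are present in this incomplete row
        known = [i for i, v in enumerate(row) if v is not None]

        # Euclidean distance measured only over the known columns
        def dist(other):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in known))

        neighbours = sorted(complete, key=dist)[:k]

        # Replace each missing entry with the neighbours' mean for that column
        filled = [
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed

# Hypothetical data: the last row is missing its second value
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [8.0, 9.0, 10.0],
    [1.05, None, 3.1],
]
result = knn_impute(data, k=2)
print(result[3])
```

Here the incomplete row is closest to the first two rows, so its missing value is filled with the average of their second column. Libraries such as scikit-learn offer a ready-made version of this technique (`sklearn.impute.KNNImputer`), which is usually preferable to a hand-written loop on real datasets.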