Data Cleaning in R - Part 1

Discarding Attributes

LendingClub also provides a data dictionary that describes every attribute in our dataset. We can use that dictionary to understand the columns we have and to remove columns that are unlikely to affect loan default.


Discard Attributes

Using the data dictionary, we identify and discard the attributes that we judge to be irrelevant to loan default or likely to have little impact on it.

> discard_column = c("collection_recovery_fee","emp_title",
> data_train = data_train[, !(names(data_train) %in% discard_column)]
> dim(data_train)
[1] 41909   122
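The column-dropping step above can be sketched on a small stand-in data frame. This is only an illustration under stated assumptions: toy_train and its loan_amnt column are made up for the example, and discard_column here holds just the two names visible above, not the full list used on the real data.

```r
# Toy stand-in for data_train (hypothetical data, not the LendingClub file).
toy_train <- data.frame(
  loan_amnt = c(5000, 12000, 8000),            # illustrative column, assumed
  emp_title = c("Teacher", "Engineer", "Nurse"),
  collection_recovery_fee = c(0, 1.5, 0),
  stringsAsFactors = FALSE
)

# Only the two names shown in the text; the real list is longer.
discard_column <- c("collection_recovery_fee", "emp_title")

# Keep every column whose name is NOT in discard_column.
# drop = FALSE keeps the result a data frame even if one column remains.
toy_train <- toy_train[, !(names(toy_train) %in% discard_column), drop = FALSE]

names(toy_train)
dim(toy_train)
```

Names in discard_column that do not exist in the data frame are silently ignored by %in%, so it can be worth running setdiff(discard_column, names(data_train)) before dropping to catch misspelled column names.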

Discard Grade Attribute

We will also drop the grade attribute, since the same information is already contained in sub_grade.
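A minimal sketch of this step, again on made-up data. It assumes that sub_grade values start with the grade letter (for example "B2" for grade "B"); that assumption is checked before the column is removed, and toy_train is a hypothetical stand-in for data_train.

```r
# Toy stand-in for data_train (hypothetical values).
toy_train <- data.frame(
  grade = c("A", "B", "C"),
  sub_grade = c("A4", "B2", "C5"),
  stringsAsFactors = FALSE
)

# Assumption: the first character of sub_grade is the grade letter.
# Stop early if that does not hold, since grade would then carry extra information.
stopifnot(all(substr(toy_train$sub_grade, 1, 1) == toy_train$grade))

# Assigning NULL removes the column from the data frame.
toy_train$grade <- NULL

names(toy_train)
```

Running the same check on the real data before dropping grade confirms that nothing is lost by keeping only sub_grade.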

