Data Cleaning in R - Part 5

Numeric Features

Let’s look at all numeric features we have left.

> str(data_train[getNumericColumns(data_train)])
'data.frame':    41909 obs. of  54 variables:
 $ funded_amnt               : int  10000 35000 14400 7250 10000 10000 25000 8400 6950 16000 ...
 $ annual_inc                : num  52000 85000 85000 72000 45000 ...
 $ dti                       : num  15 24.98 28.11 23.93 8.03 ...
 $ delinq_2yrs               : int  0 0 0 1 0 0 0 0 0 0 ...
 $ earliest_cr_line          : num  5630 2647 5873 11382 9436 ...
 $ inq_last_6mths            : int  1 1 0 0 0 1 1 1 0 1 ...
 $ mths_since_last_delinq    : num  44 31 72 20 31 31 31 60 55 25 ...
 $ pub_rec                   : int  2 0 0 1 1 0 0 0 4 0 ...
 $ revol_bal                 : int  1077 10167 37582 12220 471 10139 47954 11059 7096 9891 ...
 $ revol_util                : num  19.53 20.75 10.75 13.67 5.32 ...
 $ collections_12_mths_ex_med: int  0 0 0 0 0 0 0 1 0 0 ...
 $ acc_now_delinq            : int  0 0 0 0 0 0 0 0 0 0 ...
 $ tot_coll_amt              : int  622 0 0 0 0 0 0 0 0 2017 ...
 $ open_acc_6m               : num  2 0 0 1 0 1 2 1 0 0 ...
 $ open_act_il               : num  1 3 3 2 1 2 2 1 0 2 ...
 $ open_il_12m               : num  4 0 1 1 0 1 2 1 0 2 ...
 $ mths_since_rcnt_il        : num  2 14 12 3 23 6 6 10 91 9 ...
 $ total_bal_il              : num  14809 73863 22387 40343 11499 ...
 $ il_util                   : num  99 83 47 92 72 73 61 91 76 67 ...
 $ open_rv_12m               : num  0 0 0 1 1 1 1 2 0 3 ...
 $ max_bal_bc                : num  1007 5109 12211 3694 325 ...
 $ all_util                  : num  88 71 66 84 54 55 60 86 59 50 ...
 $ inq_fi                    : num  3 5 0 0 2 2 0 1 0 1 ...
 $ total_cu_tl               : num  0 1 0 0 2 0 1 1 0 1 ...
 $ inq_last_12m              : num  2 2 0 1 0 3 4 3 1 4 ...
 $ avg_cur_bal               : int  3972 17960 11885 30540 1710 4752 47914 22436 1419 2506 ...
 $ bc_open_to_buy            : num  1623 4833 3393 997 4329 ...
 $ chargeoff_within_12_mths  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ delinq_amnt               : int  0 0 0 0 0 0 0 0 0 0 ...
 $ mo_sin_old_il_acct        : num  101 87 145 132 135 65 258 129 153 113 ...
 $ mo_sin_rcnt_rev_tl_op     : int  25 22 26 8 10 10 4 2 17 7 ...
 $ mo_sin_rcnt_tl            : int  2 14 12 3 10 6 4 2 17 7 ...
 $ mort_acc                  : int  0 1 6 4 2 0 4 2 8 0 ...
 $ mths_since_recent_bc      : num  25 22 32 59 10 10 4 89 31 7 ...
 $ mths_since_recent_inq     : num  4 5 20 9 23 6 4 2 11 0 ...
 $ num_accts_ever_120_pd     : int  2 0 0 3 0 0 0 2 1 2 ...
 $ num_actv_bc_tl            : int  2 3 5 3 2 4 3 2 3 4 ...
 $ num_bc_tl                 : int  3 4 9 4 5 4 7 7 7 5 ...
 $ num_il_tl                 : int  7 9 7 8 4 4 8 2 3 7 ...
 $ num_tl_90g_dpd_24m        : int  0 0 0 1 0 0 0 0 0 0 ...
 $ pct_tl_nvr_dlq            : num  83.3 100 93.9 83.3 100 100 100 86.4 91.3 93.3 ...
 $ percent_bc_gt_75          : num  50 33.3 100 100 0 0 0 100 33.3 25 ...
 $ pub_rec_bankruptcies      : int  0 0 0 0 1 0 0 0 3 0 ...
 $ tax_liens                 : int  0 0 0 1 0 0 0 0 1 0 ...
 $ is_ny                     : num  0 1 0 0 0 0 0 0 0 0 ...
 $ is_pa                     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ is_nj                     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ is_oh                     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ is_fl                     : num  0 0 0 0 1 0 0 0 1 0 ...
 $ is_co                     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ is_ga                     : num  1 0 0 0 0 0 0 1 0 0 ...
 $ is_va                     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ is_az                     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ is_ca                     : num  0 0 0 0 0 0 0 0 0 0 ...
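
The listing above relies on getNumericColumns(), whose definition is not shown in this part. As a rough sketch only, and assuming the helper simply returns the names of the integer and double columns, it could look like the following. The implementation and the toy data frame are illustrative assumptions, not the series' actual code or the loan data.

```r
# Hypothetical stand-in for getNumericColumns(); the real definition
# is not shown in this section, so this is an assumption.
getNumericColumns <- function(df) {
  # is.numeric() is TRUE for integer and double columns,
  # and FALSE for character, factor, and logical columns.
  names(df)[vapply(df, is.numeric, logical(1))]
}

# Toy data frame with made-up values (not the loan data set).
toy <- data.frame(
  funded_amnt = c(10000L, 35000L),
  dti         = c(15, 24.98),
  grade       = c("A", "B"),
  stringsAsFactors = FALSE
)

numeric_cols <- getNumericColumns(toy)

# Same call pattern as in the listing above, on the toy data.
str(toy[numeric_cols])
```

Using vapply() with logical(1) rather than sapply() keeps the result a plain logical vector even when the data frame has no columns.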
