Data Cleaning in R - Part 5
Numeric Features
Let’s look at all numeric features we have left.
```r
> str(data_train[getNumericColumns(data_train)])
'data.frame': 41909 obs. of 54 variables:
 $ funded_amnt : int 10000 35000 14400 7250 10000 10000 25000 8400 6950 16000 ...
 $ annual_inc : num 52000 85000 85000 72000 45000 ...
 $ dti : num 15 24.98 28.11 23.93 8.03 ...
 $ delinq_2yrs : int 0 0 0 1 0 0 0 0 0 0 ...
 $ earliest_cr_line : num 5630 2647 5873 11382 9436 ...
 $ inq_last_6mths : int 1 1 0 0 0 1 1 1 0 1 ...
 $ mths_since_last_delinq : num 44 31 72 20 31 31 31 60 55 25 ...
 $ pub_rec : int 2 0 0 1 1 0 0 0 4 0 ...
 $ revol_bal : int 1077 10167 37582 12220 471 10139 47954 11059 7096 9891 ...
 $ revol_util : num 19.53 20.75 10.75 13.67 5.32 ...
 $ collections_12_mths_ex_med: int 0 0 0 0 0 0 0 1 0 0 ...
 $ acc_now_delinq : int 0 0 0 0 0 0 0 0 0 0 ...
 $ tot_coll_amt : int 622 0 0 0 0 0 0 0 0 2017 ...
 $ open_acc_6m : num 2 0 0 1 0 1 2 1 0 0 ...
 $ open_act_il : num 1 3 3 2 1 2 2 1 0 2 ...
 $ open_il_12m : num 4 0 1 1 0 1 2 1 0 2 ...
 $ mths_since_rcnt_il : num 2 14 12 3 23 6 6 10 91 9 ...
 $ total_bal_il : num 14809 73863 22387 40343 11499 ...
 $ il_util : num 99 83 47 92 72 73 61 91 76 67 ...
 $ open_rv_12m : num 0 0 0 1 1 1 1 2 0 3 ...
 $ max_bal_bc : num 1007 5109 12211 3694 325 ...
 $ all_util : num 88 71 66 84 54 55 60 86 59 50 ...
 $ inq_fi : num 3 5 0 0 2 2 0 1 0 1 ...
 $ total_cu_tl : num 0 1 0 0 2 0 1 1 0 1 ...
 $ inq_last_12m : num 2 2 0 1 0 3 4 3 1 4 ...
 $ avg_cur_bal : int 3972 17960 11885 30540 1710 4752 47914 22436 1419 2506 ...
 $ bc_open_to_buy : num 1623 4833 3393 997 4329 ...
 $ chargeoff_within_12_mths : int 0 0 0 0 0 0 0 0 0 0 ...
 $ delinq_amnt : int 0 0 0 0 0 0 0 0 0 0 ...
 $ mo_sin_old_il_acct : num 101 87 145 132 135 65 258 129 153 113 ...
 $ mo_sin_rcnt_rev_tl_op : int 25 22 26 8 10 10 4 2 17 7 ...
 $ mo_sin_rcnt_tl : int 2 14 12 3 10 6 4 2 17 7 ...
 $ mort_acc : int 0 1 6 4 2 0 4 2 8 0 ...
 $ mths_since_recent_bc : num 25 22 32 59 10 10 4 89 31 7 ...
 $ mths_since_recent_inq : num 4 5 20 9 23 6 4 2 11 0 ...
 $ num_accts_ever_120_pd : int 2 0 0 3 0 0 0 2 1 2 ...
 $ num_actv_bc_tl : int 2 3 5 3 2 4 3 2 3 4 ...
 $ num_bc_tl : int 3 4 9 4 5 4 7 7 7 5 ...
 $ num_il_tl : int 7 9 7 8 4 4 8 2 3 7 ...
 $ num_tl_90g_dpd_24m : int 0 0 0 1 0 0 0 0 0 0 ...
 $ pct_tl_nvr_dlq : num 83.3 100 93.9 83.3 100 100 100 86.4 91.3 93.3 ...
 $ percent_bc_gt_75 : num 50 33.3 100 100 0 0 0 100 33.3 25 ...
 $ pub_rec_bankruptcies : int 0 0 0 0 1 0 0 0 3 0 ...
 $ tax_liens : int 0 0 0 1 0 0 0 0 1 0 ...
 $ is_ny : num 0 1 0 0 0 0 0 0 0 0 ...
 $ is_pa : num 0 0 0 0 0 0 0 0 0 0 ...
 $ is_nj : num 0 0 0 0 0 0 0 0 0 0 ...
 $ is_oh : num 0 0 0 0 0 0 0 0 0 0 ...
 $ is_fl : num 0 0 0 0 1 0 0 0 1 0 ...
 $ is_co : num 0 0 0 0 0 0 0 0 0 0 ...
 $ is_ga : num 1 0 0 0 0 0 0 1 0 0 ...
 $ is_va : num 0 0 0 0 0 0 0 0 0 0 ...
 $ is_az : num 0 0 0 0 0 0 0 0 0 0 ...
 $ is_ca : num 0 0 0 0 0 0 0 0 0 0 ...
```
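The call above relies on `getNumericColumns()`, a helper that is not defined in this part. If you are following along without it, the sketch below is a hypothetical stand-in, assuming the helper simply returns the names of all integer and double columns so that `data_train[...]` keeps only those columns. It is not the author's definition, and the small `toy` data frame uses made-up values purely for illustration.

```r
# Hypothetical stand-in for getNumericColumns().
# Assumption: it returns the names of every integer or double column.
getNumericColumns <- function(df) {
  # is.numeric() is TRUE for integer and double vectors,
  # FALSE for character and factor columns
  names(df)[vapply(df, is.numeric, logical(1))]
}

# Made-up example data: two numeric columns and one character column
toy <- data.frame(
  funded_amnt = c(10000L, 35000L),
  annual_inc  = c(52000, 85000),
  grade       = c("A", "B"),
  stringsAsFactors = FALSE
)

getNumericColumns(toy)
#> [1] "funded_amnt" "annual_inc"

# Same pattern as the call on data_train: subset, then inspect with str()
str(toy[getNumericColumns(toy)])
```

Because the function returns column names rather than a logical mask, the result can be passed straight into `[` for subsetting, which matches how it is used with `data_train`.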