Data Cleaning in R - Part 3

Default Rate by State

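Next we look at the default rate for each state, i.e. the number of loans with status "Default" divided by the total number of loans in that state. States with too few loans (fewer than 1000) are filtered out, since their rates would be unreliable: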

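library(dplyr)
# Number of defaulted loans per state
tmp = data_train %>% filter(loan_status == "Default") %>% group_by(addr_state) %>% summarise(default_count = n())
# Total number of loans per state
tmp2 = data_train %>% group_by(addr_state) %>% summarise(count = n())
# Join on state, keep states with at least 1000 loans, treat "no defaults" (NA after the join) as 0, and compute the rate
tmp3 = tmp2 %>% left_join(tmp, by = "addr_state") %>% filter(count >= 1000) %>% mutate(default_count = coalesce(default_count, 0L), default_rate = default_count / count)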
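The same table can also be built in a single pass, without the intermediate tmp and tmp2 tables or the join: counting defaults as sum(loan_status == "Default") inside one summarise() gives both counts per state at once. The sketch below is an alternative, not the article's own code, and runs on a small hypothetical data frame (loans, with made-up states and counts) rather than data_train; the names loans, min_loans and state_default are illustrative only.

```r
library(dplyr)

# Hypothetical toy data: 2000 CA loans (100 defaults), 1500 NY loans (30 defaults),
# and 40 WY loans (no defaults), so WY falls below the threshold.
loans <- data.frame(
  addr_state = rep(c("CA", "NY", "WY"), times = c(2000, 1500, 40)),
  loan_status = c(rep(c("Default", "Current"), times = c(100, 1900)),
                  rep(c("Default", "Current"), times = c(30, 1470)),
                  rep("Current", 40)),
  stringsAsFactors = FALSE
)

min_loans <- 1000  # minimum number of loans for a state to be kept

state_default <- loans %>%
  group_by(addr_state) %>%
  summarise(count = n(),                                  # all loans in the state
            default_count = sum(loan_status == "Default")) %>%  # TRUE counts as 1
  filter(count >= min_loans) %>%                          # drop small states
  mutate(default_rate = default_count / count) %>%
  arrange(desc(default_rate))                             # highest default rate first

print(state_default)
```

Because no join is involved, a state with no defaulted loans gets a default_count of 0 directly instead of an NA that has to be patched afterwards. On data_train, replacing loans with data_train should give the same counts and rates as the join-based tmp3, with rows sorted by default rate.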
