Now, let’s work on the Loan Amount Category column, which also has a few missing values. We’re going to fill them using this logic: if the Loan Amount is 1000 or below, we fill ‘Small’. If it is above 1000 but no more than 2000, it’s ‘Medium’. If it is above 2000, it’s ‘Large’.
To fill the missing values in the Loan Amount Category column based on the value of Loan Amount, you can use the apply function along with a lambda function to check the conditions and assign the appropriate category. Here is the code to do that:
    # Define a function to categorize 'LoanAmount'
    def categorize_loan_amount(amount):
        if amount <= 1000:
            return 'Small'
        elif amount <= 2000:
            return 'Medium'
        else:  # This means the amount is above 2000
            return 'Large'

    # Apply the function to fill missing 'LoanAmountCategory'
    loan_data_cleaned['LoanAmountCategory'] = loan_data_cleaned.apply(
        lambda row: categorize_loan_amount(row['LoanAmount'])
        if pd.isnull(row['LoanAmountCategory'])
        else row['LoanAmountCategory'],
        axis=1
    )
    loan_data_cleaned.head()
If the code runs successfully, it will fill the missing categories as per our logic.
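If you would rather avoid a row-wise apply, the same logic can probably be expressed with pd.cut and fillna. The sketch below is only an illustration on made-up data: the column names follow the ones used above, but the values are invented, and the bin edges assume the thresholds of 1000 and 2000 are inclusive on the upper side.

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the article, values are invented for illustration.
loan_data_cleaned = pd.DataFrame({
    "LoanAmount": [500, 1000, 1500, 2000, 2500],
    "LoanAmountCategory": [None, "Small", None, "Medium", None],
})

# pd.cut uses right-closed bins by default:
# (-inf, 1000] -> Small, (1000, 2000] -> Medium, (2000, inf) -> Large
derived = pd.cut(
    loan_data_cleaned["LoanAmount"],
    bins=[-np.inf, 1000, 2000, np.inf],
    labels=["Small", "Medium", "Large"],
).astype(object)

# fillna aligns on the index, so only rows with a missing category are changed.
loan_data_cleaned["LoanAmountCategory"] = (
    loan_data_cleaned["LoanAmountCategory"].fillna(derived)
)
print(loan_data_cleaned)
```

One side effect of this approach is that a missing Loan Amount stays missing in the derived category instead of falling through to ‘Large’, which is what the if/elif/else version would do with a NaN input.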
We have two more columns to work on – Total Loans By Customer and Customer Loyalty.
Total Loans By Customer
This numeric field represents the number of loans taken by the customer. Missing values can be filled with the mean or the median. If the distribution is skewed, or if a significant number of customers have only one loan, the median (or even a default value like 1) is usually the safer choice, because a few customers with many loans can pull the mean upward. Let’s fill it with the median.
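    # Fill missing 'TotalLoansByCustomer' with the median
    median_total_loans = loan_data_cleaned['TotalLoansByCustomer'].median()
    # Assign the result back instead of calling fillna(inplace=True) on a
    # column selection, which recent pandas versions warn about
    loan_data_cleaned['TotalLoansByCustomer'] = loan_data_cleaned['TotalLoansByCustomer'].fillna(median_total_loans)
    loan_data_cleaned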
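A quick way to sanity-check the choice of the median over the mean is to compare the two alongside the skewness of the column. The snippet below is a minimal sketch on invented loan counts, not the article’s dataset. It only shows the kind of check you could run on loan_data_cleaned['TotalLoansByCustomer'].

```python
import pandas as pd

# Invented loan counts: most customers have one loan, one has many.
total_loans = pd.Series(
    [1, 1, 1, 1, 1, 2, 2, 3, 10, None, None],
    name="TotalLoansByCustomer",
)

mean_loans = total_loans.mean()      # pulled upward by the customer with 10 loans
median_loans = total_loans.median()  # unaffected by that single large value
skewness = total_loans.skew()        # positive for a right-skewed distribution

print(f"mean={mean_loans:.2f}, median={median_loans}, skew={skewness:.2f}")

# Fill the gaps with the median, as in the article.
filled = total_loans.fillna(median_loans)
```

When the mean sits well above the median and the skewness is clearly positive, as it does here, the median tends to be the more representative fill value.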
Customer Loyalty
There are only two categories: Returning or New. If a customer has only one loan, fill ‘New’. If they have more than one loan, fill ‘Returning’.
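Here is one way this rule could be written. Treat it as a sketch: the column name 'CustomerLoyalty' is an assumption based on the naming pattern of the other columns, and the data is a made-up stand-in for loan_data_cleaned. It also assumes Total Loans By Customer has already been filled, as in the previous step.

```python
import numpy as np
import pandas as pd

# Toy stand-in for loan_data_cleaned; 'CustomerLoyalty' is an assumed column name.
loan_data_cleaned = pd.DataFrame({
    "TotalLoansByCustomer": [1, 3, 1, 2],
    "CustomerLoyalty": ["New", None, None, "Returning"],
})

# Derive a loyalty label for every row from the loan count.
derived_loyalty = pd.Series(
    np.where(loan_data_cleaned["TotalLoansByCustomer"] > 1, "Returning", "New"),
    index=loan_data_cleaned.index,
)

# Use the derived label only where the existing value is missing.
loan_data_cleaned["CustomerLoyalty"] = (
    loan_data_cleaned["CustomerLoyalty"].fillna(derived_loyalty)
)
print(loan_data_cleaned)
```

Existing labels are left untouched, so any rows where the recorded loyalty disagrees with the loan count would still need a separate check.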
With this, we’ve filled all the missing values, and along the way also performed some other interesting transformations.
There’s one more small thing I would like to fix before proceeding. The customer names use inconsistent capitalization: some are lowercase, some are uppercase, and others are in title case. Let’s convert all of them to title case.
To convert all customer names in your DataFrame to title case, you can use the str.title() method available on pandas Series objects. Here's the code to do that:
    # Convert 'CustomerName' to title case
    loan_data_cleaned['CustomerName'] = loan_data_cleaned['CustomerName'].str.title()
    loan_data_cleaned.head()
This fixes it.
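To see what str.title() does with mixed capitalization, here is a small illustration on invented names. The extra str.strip() call is an addition that is not part of the article’s code. It is included on the assumption that names with stray leading or trailing spaces might also exist in the data.

```python
import pandas as pd

# Invented names covering lowercase, uppercase, title case, and stray spaces.
names = pd.Series(["john smith", "JANE DOE", "Alice Johnson", "  bob LEE "])

# Strip surrounding whitespace (an assumption), then normalize the casing.
cleaned = names.str.strip().str.title()
print(cleaned.tolist())
```

Keep in mind that title casing capitalizes the first letter of each word and lowercases the rest, so a name like "McDonald" would come out as "Mcdonald".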
We will now head to the next important part – Data Transformation and Feature Engineering.