- Identify all the non-numeric entries.
- Convert them to a numeric format.
- Replace the original non-numeric entries with their numeric equivalents.
Here's a step-by-step guide on how to write the code to do this:
- Identify Non-numeric Entries: Use regular expressions or pd.to_numeric with errors='coerce' to flag non-numeric entries.
- Map Non-numeric to Numeric: Create a mapping of words to numbers. For English number words, you can use the word2number package, which can convert number words like "Two Thousand" into numeric values.
- Replace Non-numeric Entries: Use apply to replace the non-numeric entries with their numeric equivalents.
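As a minimal sketch of the first step, the snippet below flags non-numeric entries with pd.to_numeric and errors='coerce'. The sample values and the variable names (loan_amounts, non_numeric_mask, flagged) are made up for illustration and are not taken from the loan dataset.

```python
import pandas as pd

# Hypothetical sample values; the real column would come from the loan data
loan_amounts = pd.Series(["1500", "Two Thousand", "2500.75", "unknown"])

# errors="coerce" turns anything that cannot be parsed as a number into NaN
as_numbers = pd.to_numeric(loan_amounts, errors="coerce")

# Entries that were present originally but became NaN are the non-numeric ones
non_numeric_mask = as_numbers.isna() & loan_amounts.notna()
flagged = loan_amounts[non_numeric_mask].tolist()
print(flagged)
```

Checking notna() on the original values keeps entries that were already missing from being reported as non-numeric text.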
The following Python snippet puts the three steps together. First, install the word2number package if it's not already installed:
conda install -c conda-forge word2number
or
pip install word2number

import pandas as pd
from word2number import w2n

# Function to convert non-numeric loan amounts to numeric
def convert_to_numeric(value):
    try:
        # Numeric strings become int or float; values that are already numeric pass through
        return pd.to_numeric(value)
    except (ValueError, TypeError):
        try:
            # Convert written-out numbers such as "Two Thousand" to numeric values
            return w2n.word_to_num(value)
        except ValueError:
            # Conversion failed; return None so the entry can be found and reviewed later
            return None

# Apply the function to the 'LoanAmount' column
loan_data_cleaned['LoanAmount'] = loan_data_cleaned['LoanAmount'].apply(convert_to_numeric)

# Check for missing values, which indicate failed conversions
failed_conversions = loan_data_cleaned[loan_data_cleaned['LoanAmount'].isnull()]
print("Failed conversions:\n", failed_conversions[['CustomerName', 'LoanAmount']])

Our LoanAmount column now contains only numeric values, apart from any entries listed under "Failed conversions", which remain as missing values.
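As a quick sanity check, one could confirm that the column has a numeric dtype and count the entries that failed to convert. The sketch below uses a small hypothetical DataFrame standing in for loan_data_cleaned after conversion; the customer names and amounts are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for loan_data_cleaned after conversion;
# None marks an entry that could not be converted.
loan_data_cleaned = pd.DataFrame({
    "CustomerName": ["A", "B", "C"],
    "LoanAmount": [2000, 1500.5, None],
})

# Force a numeric dtype so any leftover non-numeric value becomes NaN
loan_data_cleaned["LoanAmount"] = pd.to_numeric(
    loan_data_cleaned["LoanAmount"], errors="coerce"
)

# Confirm the dtype and count the entries that still need attention
is_numeric = pd.api.types.is_numeric_dtype(loan_data_cleaned["LoanAmount"])
n_missing = int(loan_data_cleaned["LoanAmount"].isna().sum())
print(is_numeric, n_missing)
```

A non-zero missing count is a cue to inspect those rows by hand before dropping or imputing them.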