Handling Missing Data - Example - Part 3 (Non-numeric Values)

Loan Amount (Revisit)

A cursory look at the Loan Amount column shows that most values are numeric, but some are not. For example, in one place 2000 is written as "Two Thousand".

To handle non-numeric values in the LoanAmount column, such as "Two Thousand", you would typically need to:

  • Identify all the non-numeric entries.
  • Convert them to a numeric format.
  • Replace the original non-numeric entries with their numeric equivalents.

Here's a step-by-step guide on how to write the code to do this:

  1. Identify Non-numeric Entries: Use regular expressions or pd.to_numeric with errors='coerce' to flag non-numeric entries.

  2. Map Non-numeric to Numeric: Create a mapping of words to numbers. For English number words, you can use the word2number package, which can convert number words like "Two Thousand" into numeric values.

  3. Replace Non-numeric Entries: Use apply to replace the non-numeric entries with their numeric equivalents.
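
As a minimal sketch of step 1, the snippet below flags the entries that pd.to_numeric cannot parse. The small DataFrame and its values are made up for illustration; only the column names LoanAmount and CustomerName come from this example.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    "CustomerName": ["A", "B", "C"],
    "LoanAmount": ["1500", "Two Thousand", 2500],
})

# errors='coerce' turns anything that cannot be parsed into NaN
as_numbers = pd.to_numeric(sample["LoanAmount"], errors="coerce")

# Non-numeric entries: present in the original but NaN after coercion
non_numeric_mask = as_numbers.isna() & sample["LoanAmount"].notna()
non_numeric_rows = sample[non_numeric_mask]
print(non_numeric_rows)
```

The extra notna() check keeps values that were already missing in the raw data from being reported as non-numeric text.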

The following Python code puts these steps together. First, install the word2number package if it's not already installed, using either conda or pip (one of the two commands is enough):

conda install -c conda-forge word2number
pip install word2number

import pandas as pd
from word2number import w2n

# Function to convert non-numeric loan amounts to numeric
def convert_to_numeric(value):
    try:
        # Numeric strings become numbers; values that are already numeric pass through
        return pd.to_numeric(value)
    except ValueError:
        try:
            # Written numbers such as "Two Thousand" become numeric values
            return w2n.word_to_num(value)
        except ValueError:
            # If both conversions fail, return None so the row can be reviewed
            return None

# Apply the function to the 'LoanAmount' column
loan_data_cleaned['LoanAmount'] = loan_data_cleaned['LoanAmount'].apply(convert_to_numeric)

# Check for any None values which indicate failed conversions
failed_conversions = loan_data_cleaned[loan_data_cleaned['LoanAmount'].isnull()]
print("Failed conversions:\n", failed_conversions[['CustomerName', 'LoanAmount']])
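
If a few entries still fail, or if adding a dependency is not an option, one alternative is a small hand-written mapping of words to numbers, as mentioned in step 2. The sketch below is only an illustration: the dictionary word_map, the helper map_known_words, and the sample values are hypothetical and not part of the loan dataset.

```python
import pandas as pd

# Hypothetical mapping, keyed by lower-case, whitespace-trimmed text
word_map = {"two thousand": 2000, "five hundred": 500}

# Made-up sample values, for illustration only
amounts = pd.Series(["1500", "Two Thousand", " five hundred ", "n/a"])

def map_known_words(value):
    # Normalise text so "Two Thousand" and " two thousand " match the same key
    if isinstance(value, str):
        key = value.strip().lower()
        return word_map.get(key, value)
    return value

mapped = amounts.apply(map_known_words)

# Anything still unparseable ("n/a" here) becomes NaN instead of raising
numeric_amounts = pd.to_numeric(mapped, errors="coerce")
print(numeric_amounts.tolist())
```

An explicit mapping only covers the spellings you list, so it suits a handful of known exceptions rather than arbitrary number words.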

If the failed-conversions output is empty, the LoanAmount column now contains only numeric values.
