Handling Missing Data - Example - Part 3 (Non-numeric Values)

Loan Amount (Revisit)

A cursory look at the Loan Amount column shows that most values are numeric, but some are not. For example, in one place 2000 is written as "Two Thousand".

To handle non-numeric values in the LoanAmount column, such as "Two Thousand", you would typically need to:

  • Identify all the non-numeric entries.
  • Convert them to a numeric format.
  • Replace the original non-numeric entries with their numeric equivalents.

Here's a step-by-step guide on how to write the code to do this:

  1. Identify Non-numeric Entries: Use regular expressions or pd.to_numeric with errors='coerce' to flag non-numeric entries.

  2. Map Non-numeric to Numeric: Create a mapping of words to numbers. For English number words, you can use the word2number package, which can convert number words like "Two Thousand" into numeric values.

  3. Replace Non-numeric Entries: Use apply to replace the non-numeric entries with their numeric equivalents.
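As a minimal sketch of step 1, the snippet below flags entries that pd.to_numeric cannot parse. The small sample DataFrame is made up for illustration; only the column names CustomerName and LoanAmount come from this lesson.

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the ones used in this lesson
sample = pd.DataFrame({
    "CustomerName": ["A", "B", "C", "D"],
    "LoanAmount": ["1500", 2500, "Two Thousand", None],
})

# errors='coerce' turns anything that cannot be parsed as a number into NaN
as_numbers = pd.to_numeric(sample["LoanAmount"], errors="coerce")

# Non-numeric entries are present in the original column but NaN after coercion.
# Values that were missing to begin with are excluded by the notna() check.
non_numeric_mask = sample["LoanAmount"].notna() & as_numbers.isna()
non_numeric_rows = sample.loc[non_numeric_mask, ["CustomerName", "LoanAmount"]]
print(non_numeric_rows)
```

Separating "missing" from "present but not numeric" keeps genuinely empty cells out of the list of values that need a word-to-number conversion.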

The Python code snippet below shows this. First, install the word2number package if it is not already installed:

conda install -c conda-forge word2number
or 
pip install word2number

import pandas as pd
from word2number import w2n

# Function to convert non-numeric loan amounts to numeric
def convert_to_numeric(value):
    try:
        # Numeric strings become numbers; values that are already numeric pass through unchanged
        return pd.to_numeric(value)
    except ValueError:
        try:
            # Convert written numbers such as "Two Thousand" to numeric values
            return w2n.word_to_num(value)
        except ValueError:
            # Neither conversion worked: return None so the row can be inspected later
            return None

# Apply the function to the 'LoanAmount' column
loan_data_cleaned['LoanAmount'] = loan_data_cleaned['LoanAmount'].apply(convert_to_numeric)

# Check for any missing values, which indicate failed conversions
failed_conversions = loan_data_cleaned[loan_data_cleaned['LoanAmount'].isnull()]
print("Failed conversions:\n", failed_conversions[['CustomerName', 'LoanAmount']])

Provided the check above reports no failed conversions, our Loan Amount data now has only numeric values.
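One way to confirm this is to check the column's dtype, since a column built from mixed Python objects can keep the object dtype even when every value is a number. The sketch below uses a made-up Series standing in for the converted LoanAmount column; it is an illustration under that assumption, not part of the original workflow.

```python
import pandas as pd

# Hypothetical column as it might look after the conversion step:
# Python ints plus a None left behind by a failed conversion
converted = pd.Series([1500, 2000, None, 3200], dtype="object", name="LoanAmount")

# Coerce to a true numeric dtype; None becomes NaN
numeric_amounts = pd.to_numeric(converted, errors="coerce")

# Compare the dtypes before and after, and count the values still missing
print(pd.api.types.is_numeric_dtype(converted))
print(pd.api.types.is_numeric_dtype(numeric_amounts))
print(numeric_amounts.isna().sum())
```

A numeric dtype matters for later steps such as computing means or filling missing values, which either fail or behave unexpectedly on object columns.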
