Handling Missing Data - Example - Part 3 (Non-numeric Values)
Loan Amount (Revisit)
A cursory look at the LoanAmount column shows that most values are numeric, but a few are not. For example, in one place 2000 is written as "Two Thousand".
To handle non-numeric values in the LoanAmount column, such as "Two Thousand", you would typically need to:
- Identify all the non-numeric entries.
- Convert them to a numeric format.
- Replace the original non-numeric entries with their numeric equivalents.
Here's a step-by-step guide on how to write the code to do this:
Identify Non-numeric Entries: Use regular expressions or pd.to_numeric with errors='coerce' to flag non-numeric entries.
Map Non-numeric to Numeric: Create a mapping of words to numbers. For English number words, you can use the word2number package, which can convert number words like "Two Thousand" into numeric values.
Replace Non-numeric Entries: Use apply to replace the non-numeric entries with their numeric equivalents.
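The identification step can be sketched on its own before any values are changed. The snippet below is a minimal illustration, not the tutorial's dataset: the `sample` DataFrame and its values are made up, and only the column names CustomerName and LoanAmount are taken from the text. It relies on pd.to_numeric with errors='coerce', which turns unparseable values into NaN.

```python
import pandas as pd

# Hypothetical sample standing in for the loan data (values are invented)
sample = pd.DataFrame({
    "CustomerName": ["A", "B", "C", "D"],
    "LoanAmount": ["1500", "Two Thousand", None, 3200],
})

# Anything that cannot be parsed as a number becomes NaN
coerced = pd.to_numeric(sample["LoanAmount"], errors="coerce")

# Non-numeric entries are present in the original but NaN after coercion.
# Values that were already missing are excluded by the notna() check.
non_numeric_mask = sample["LoanAmount"].notna() & coerced.isna()
non_numeric_rows = sample[non_numeric_mask]

print(non_numeric_rows)
```

Listing these rows first shows how many entries need conversion and whether they are all number words or include other kinds of text.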
The following Python snippet puts these steps together. First, install the word2number package if it is not already installed:
conda install -c conda-forge word2number
or
pip install word2number
import pandas as pd
from word2number import w2n

# Function to convert non-numeric loan amounts to numeric
def convert_to_numeric(value):
    try:
        # Parses numeric strings and passes through values that are already numeric
        return pd.to_numeric(value)
    except ValueError:
        try:
            # Converts written numbers such as "Two Thousand" to numeric values
            return w2n.word_to_num(value)
        except ValueError:
            # If conversion fails, return None so the row can be reviewed later
            return None

# Apply the function to the 'LoanAmount' column
loan_data_cleaned['LoanAmount'] = loan_data_cleaned['LoanAmount'].apply(convert_to_numeric)

# Check for missing values, which indicate failed conversions
# (or entries that were already missing before the conversion)
failed_conversions = loan_data_cleaned[loan_data_cleaned['LoanAmount'].isnull()]
print("Failed conversions:\n", failed_conversions[['CustomerName', 'LoanAmount']])
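If the check prints no rows, the LoanAmount column now contains only numeric values. Any rows it does list could not be converted, or were already missing, and need to be handled separately.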
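One way to confirm the result is to check the column's dtype, since a column built from mixed values can still be stored as generic objects. The sketch below uses a made-up Series (`cleaned`) in place of the real LoanAmount column and assumes any leftover failures are represented as None.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned column, with one failed conversion (None)
cleaned = pd.Series([1500, 2000, None, 3200.5], name="LoanAmount", dtype="object")

# Cast to a proper numeric dtype; None becomes NaN
numeric = pd.to_numeric(cleaned, errors="coerce")

# Confirm the dtype is numeric and count the values still missing
is_numeric = pd.api.types.is_numeric_dtype(numeric)
remaining_missing = int(numeric.isna().sum())

print(numeric.dtype, is_numeric, remaining_missing)
```

A numeric dtype with a missing count of zero indicates the column is ready for arithmetic and summary statistics.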