Filtering Rows in a DataFrame in Python

Filtering rows in a pandas DataFrame is a vital operation. It lets you select the subset of rows that meets a given condition, so you can focus on the segments of your dataset that matter for the question at hand. Whether you are doing exploratory analysis, preparing data for modeling, or generating specific insights, row filtering is a fundamental step in data manipulation.

In pandas, row filtering is most commonly done with boolean indexing. You write a condition that evaluates to True or False for each row. This produces a boolean Series, often called a mask. You then pass that Series inside square brackets, and pandas keeps only the rows where the condition is True.
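To make the two steps visible, the sketch below builds the condition on a tiny DataFrame and prints the mask before using it. The `loans` DataFrame and its values are hypothetical; only the column names come from the loan dataset.

```python
import pandas as pd

# Hypothetical sample: column names mirror the loan dataset, values are made up.
loans = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small", "Medium", "Large"],
    "LoanDurationDays": [90, 150, 250, 120],
})

# Step 1: the comparison returns a boolean Series with one value per row.
mask = loans["LoanAmountCategory"] == "Large"
print(mask.tolist())  # [True, False, False, True]

# Step 2: indexing with the mask keeps only the rows where it is True.
# The original index labels (0 and 3) are preserved.
large = loans[mask]
print(large)
```

Note that the filtered result keeps the original row labels. Call `reset_index(drop=True)` on it if you need a fresh 0..n-1 index.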

Here are examples of filtering rows based on conditions in your dataset:

Filtering by LoanAmountCategory

  • Filter rows where LoanAmountCategory is 'Large', 'Medium', or 'Small', creating one DataFrame per category.
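When you want rows from several categories at once, you could chain `==` comparisons with `|`, but `Series.isin` is shorter. The sketch also shows `groupby` as one way to get a DataFrame per category without writing a separate filter for each. The `loans` DataFrame is a hypothetical sample, and `by_category` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample data using the same column name as the article.
loans = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small", "Medium", "Large", "Small"],
})

# isin() builds a single mask that is True when the value matches any listed label.
medium_or_small = loans[loans["LoanAmountCategory"].isin(["Medium", "Small"])]
print(medium_or_small["LoanAmountCategory"].tolist())  # ['Small', 'Medium', 'Small']

# Iterating over groupby() yields (label, sub-DataFrame) pairs,
# which can be collected into a dict keyed by category.
by_category = {label: group for label, group in loans.groupby("LoanAmountCategory")}
print(sorted(by_category))  # ['Large', 'Medium', 'Small']
```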

Filtering by LoanDurationDays

  • Filter rows where LoanDurationDays is less than 100.

  • Filter rows where LoanDurationDays is between 100 and 200, inclusive.

  • Filter rows where LoanDurationDays is more than 200.
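A quick way to check the boundaries of these three filters is to run them on a handful of made-up durations. This sketch uses a hypothetical `loans` DataFrame. The `np.select` call and the `DurationBucket` column are additions for illustration, not part of the article's code. It also exposes a side effect: a row with a missing LoanDurationDays matches none of the three filters, because every comparison with NaN evaluates to False.

```python
import numpy as np
import pandas as pd

# Hypothetical durations: values on the 100/200 boundaries plus one missing value.
loans = pd.DataFrame({"LoanDurationDays": [45, 100, 180, 200, 365, np.nan]})
d = loans["LoanDurationDays"]

# between() includes both ends by default, i.e. (d >= 100) & (d <= 200).
# The column is float64 because of the NaN, hence the .0 values.
print(d[d.between(100, 200)].tolist())  # [100.0, 180.0, 200.0]

# The three filters from the list, expressed as masks.
conditions = [d < 100, d.between(100, 200), d > 200]
labels = ["Short", "Medium", "Long"]

# np.select returns the label of the first True condition in each row;
# the NaN row fails every comparison, so it falls through to the default.
loans["DurationBucket"] = np.select(conditions, labels, default="Unknown")
print(loans["DurationBucket"].tolist())
# ['Short', 'Medium', 'Medium', 'Medium', 'Long', 'Unknown']
```

If missing durations matter for your analysis, count them separately, for example with `d.isna().sum()`, so they are not silently dropped.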

Here's how we can implement these filters in code:

# Filtering by LoanAmountCategory
large_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Large']
medium_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Medium']
small_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Small']

# Filtering by LoanDurationDays
short_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'] < 100]
# between() includes both endpoints, matching the inclusive 100-200 range
medium_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'].between(100, 200)]
long_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'] > 200]

# Display the filtered data (example)
print("Large Loans:\n", large_loans.head())
print("Short Duration Loans (<100 days):\n", short_duration_loans.head())

Output: DataFrame for large loans.

These examples create six new DataFrames (large_loans, medium_loans, small_loans, short_duration_loans, medium_duration_loans, long_duration_loans), each containing only the rows that match its condition. The boolean expression inside the square brackets does the filtering. Because individual conditions can be combined with & (and), | (or) and ~ (not), the same approach scales to more complex criteria.
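As a sketch of combining conditions, the code below filters a small hypothetical `loans` DataFrame (made-up values, the article's column names) with `&` and `~`, and shows `DataFrame.query` as a string-based alternative. The `IsLongLarge` column is a hypothetical addition used only to demonstrate `.copy()`.

```python
import pandas as pd

# Hypothetical sample data with the article's column names.
loans = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small", "Medium", "Large"],
    "LoanDurationDays": [90, 150, 250, 300],
})

# Each comparison needs its own parentheses: & and | bind more tightly than == and >.
large_and_long = loans[(loans["LoanAmountCategory"] == "Large") & (loans["LoanDurationDays"] > 200)]

# ~ negates a mask: every loan that is not Small.
not_small = loans[~(loans["LoanAmountCategory"] == "Small")]

# query() expresses the same large-and-long filter as a string.
same_via_query = loans.query("LoanAmountCategory == 'Large' and LoanDurationDays > 200")

# Calling .copy() before adding a column makes the result an independent
# DataFrame: it avoids pandas' SettingWithCopyWarning and leaves `loans` unchanged.
large_and_long = large_and_long.copy()
large_and_long["IsLongLarge"] = True
```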

Exporting Our Clean Data

Once the data has been cleaned, we can export it to a CSV file with the following code:

# Save the cleaned data to a CSV file
output_path = '../data/loan_data_clean.csv'

# index=False keeps pandas from writing the row index as an extra column
loan_data_cleaned.to_csv(output_path, index=False)
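A minimal sketch of the export step that also reads the file back to confirm the round trip. It assumes a hypothetical stand-in for `loan_data_cleaned` and writes into a temporary folder instead of `../data/`, so it can run anywhere. `to_csv` does not create missing folders, so `os.makedirs` creates the target folder first.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for loan_data_cleaned.
loan_data_cleaned = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small"],
    "LoanDurationDays": [90, 250],
})

# A "data" subfolder that does not exist yet, inside a fresh temporary directory.
out_dir = os.path.join(tempfile.mkdtemp(), "data")
output_path = os.path.join(out_dir, "loan_data_clean.csv")

# Create the folder if needed, then write without the index column.
os.makedirs(out_dir, exist_ok=True)
loan_data_cleaned.to_csv(output_path, index=False)

# Reading the file back is a quick check that nothing was lost on export.
reloaded = pd.read_csv(output_path)
print(reloaded.equals(loan_data_cleaned))  # True
```

The equality check works here because the sample has only text and integer columns. Columns such as dates or categories come back from `read_csv` as plain strings unless you pass options like `parse_dates` or `dtype`.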

Related Downloads

Data Science in Finance: 9-Book Bundle

Data Science in Finance Book Bundle

Master R and Python for financial data science with our comprehensive bundle of 9 ebooks.

What's Included:

  • Getting Started with R
  • R Programming for Data Science
  • Data Visualization with R
  • Financial Time Series Analysis with R
  • Quantitative Trading Strategies with R
  • Derivatives with R
  • Credit Risk Modelling With R
  • Python for Data Science
  • Machine Learning in Finance using Python

Each book includes PDFs, explanations, instructions, data files, and R code for all examples.
