Filtering Rows in a DataFrame in Python

Filtering rows in a pandas DataFrame lets you select the subset of your data that satisfies one or more conditions. It is a fundamental step in data manipulation, whether you are doing exploratory data analysis, preparing data for modeling, or isolating a segment of the dataset for a specific insight.

In pandas, row filtering can be done using boolean indexing. This technique involves creating a boolean condition that evaluates to True or False for each row, and then using this condition to select only the rows where the condition is True.
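To make the two steps concrete, here is a minimal sketch on a small, made-up DataFrame. The `loans` frame and its `LoanID` values are hypothetical and exist only for illustration; the `LoanDurationDays` column name mirrors the dataset used in this section.

```python
import pandas as pd

# Hypothetical toy data, not the real loan dataset.
loans = pd.DataFrame({
    "LoanID": [1, 2, 3, 4],
    "LoanDurationDays": [30, 150, 250, 90],
})

# Step 1: the comparison produces a boolean Series, one value per row.
mask = loans["LoanDurationDays"] < 100
print(mask.tolist())  # [True, False, False, True]

# Step 2: indexing with the mask keeps only the rows where it is True.
short_loans = loans[mask]
print(short_loans["LoanID"].tolist())  # [1, 4]
```

Writing `loans[loans["LoanDurationDays"] < 100]` in one line does the same thing; naming the mask first is simply easier to inspect while you are learning.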

Here are examples of filtering rows based on conditions in your dataset:

Filtering by LoanAmountCategory

  • Filter rows where LoanAmountCategory is 'Large', 'Medium', or 'Small'.

Filtering by LoanDurationDays

  • Filter rows where LoanDurationDays is less than 100.

  • Filter rows where LoanDurationDays is between 100 and 200.

  • Filter rows where LoanDurationDays is more than 200.

Here's how we can implement these filters in code:

# Filtering by LoanAmountCategory
large_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Large']
medium_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Medium']
small_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Small']

# Filtering by LoanDurationDays
short_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'] < 100]
medium_duration_loans = loan_data_cleaned[(loan_data_cleaned['LoanDurationDays'] >= 100) & (loan_data_cleaned['LoanDurationDays'] <= 200)]
long_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'] > 200]

# Display the filtered data (example)
print("Large Loans:\n", large_loans.head())
print("Short Duration Loans (<100 days):\n", short_duration_loans.head())

DataFrame for large loans:

Loan Data - Filtered Rows

These examples create new DataFrames (large_loans, medium_loans, small_loans, short_duration_loans, medium_duration_loans, long_duration_loans) containing only the rows that match the specified conditions. The boolean expression inside the square brackets does the filtering. Conditions can be combined with & (and), | (or), and ~ (not); wrap each condition in parentheses, because these operators bind more tightly than comparisons such as >= and ==.
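As a rough illustration of more complex criteria, the sketch below combines conditions on a small, made-up DataFrame. The `loans` data is hypothetical; `Series.isin` and `Series.between` are standard pandas methods, shown here as optional alternatives to chained comparisons rather than as the approach this tutorial requires.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the loan dataset.
loans = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small", "Medium", "Large", "Small"],
    "LoanDurationDays": [250, 80, 150, 120, 300],
})

# AND: large loans that also run longer than 200 days.
large_and_long = loans[
    (loans["LoanAmountCategory"] == "Large") & (loans["LoanDurationDays"] > 200)
]

# isin: rows whose category matches any value in the list.
large_or_medium = loans[loans["LoanAmountCategory"].isin(["Large", "Medium"])]

# between: inclusive on both ends by default, same as >= 100 and <= 200.
mid_duration = loans[loans["LoanDurationDays"].between(100, 200)]

# NOT: every row except the small loans.
not_small = loans[~(loans["LoanAmountCategory"] == "Small")]

print(large_and_long.index.tolist())   # [0]
print(large_or_medium.index.tolist())  # [0, 2, 3]
print(mid_duration.index.tolist())     # [2, 3]
print(not_small.index.tolist())        # [0, 2, 3]
```

Note that the filtered DataFrames keep the original row labels; call `.reset_index(drop=True)` on a result if you want it renumbered from zero.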

Exporting Our Clean Data

After cleaning the data, we can export it to a CSV file using the following code:

# Save the cleaned data to a CSV file.
output_path = '../data/loan_data_clean.csv'

# index=False leaves the DataFrame's row index out of the file.
loan_data_cleaned.to_csv(output_path, index=False)
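A quick way to confirm an export worked is to read the file back and compare it with what you wrote. The sketch below does this with a small, made-up DataFrame and a temporary directory so it runs anywhere; the `cleaned` data and the temporary path are assumptions for illustration, not the project's real data or its `../data/` folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned loan data.
cleaned = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small"],
    "LoanDurationDays": [250, 80],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = os.path.join(tmp_dir, "loan_data_clean.csv")

    # Write without the row index, then load the file back.
    cleaned.to_csv(output_path, index=False)
    reloaded = pd.read_csv(output_path)

# The reloaded frame should have the same columns and row count.
columns_match = list(reloaded.columns) == list(cleaned.columns)
rows_match = len(reloaded) == len(cleaned)
print(columns_match, rows_match)  # True True
```

Because `index=False` was used, the reloaded file contains no extra unnamed index column.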