Filtering rows in a pandas DataFrame lets you select the subset of data that meets specific conditions. This is a fundamental step in data manipulation because it narrows your analysis to the segments of the dataset you care about. You will use it when exploring data, when preparing data for modeling, and when isolating the rows that answer a particular question.
In pandas, row filtering is usually done with boolean indexing. You first build a boolean condition (a Series, often called a mask) that holds True or False for each row. You then place that mask inside square brackets to keep only the rows where it is True.
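The two steps of boolean indexing can be seen on a tiny DataFrame. This is a minimal sketch: the DataFrame `df` and its values are made up for illustration and are not part of the loan dataset.

```python
import pandas as pd

# Small, made-up DataFrame for illustration only (hypothetical values)
df = pd.DataFrame({
    "LoanID": [1, 2, 3, 4],
    "LoanDurationDays": [45, 150, 320, 90],
})

# Step 1: the comparison produces a boolean Series aligned with df's index
mask = df["LoanDurationDays"] < 100
print(mask.tolist())  # [True, False, False, True]

# Step 2: indexing with the mask keeps only the rows where it is True
short = df[mask]
print(short["LoanID"].tolist())  # [1, 4]
```

The filtered result keeps the original index labels (0 and 3 here) rather than renumbering the rows, so call `reset_index(drop=True)` if you need a fresh 0..n-1 index.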
Here are examples of filtering rows based on conditions in your dataset:
Filtering by LoanAmountCategory
Filter rows where LoanAmountCategory is 'Large', 'Medium', or 'Small' (one filtered DataFrame per category).
Filtering by LoanDurationDays
Filter rows where LoanDurationDays is less than 100.
Filter rows where LoanDurationDays is between 100 and 200 (inclusive).
Filter rows where LoanDurationDays is greater than 200.
Here's how we can implement these filters in code:
```python
# Filtering by LoanAmountCategory
large_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Large']
medium_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Medium']
small_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Small']

# Filtering by LoanDurationDays
short_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'] < 100]
# Both bounds are inclusive; each comparison needs its own parentheses when combined with &
medium_duration_loans = loan_data_cleaned[
    (loan_data_cleaned['LoanDurationDays'] >= 100) & (loan_data_cleaned['LoanDurationDays'] <= 200)
]
long_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'] > 200]

# Display the first few rows of two of the filtered DataFrames (example)
print("Large Loans:\n", large_loans.head())
print("Short Duration Loans (<100 days):\n", short_duration_loans.head())
```
(Figure: DataFrame for large loans)
These examples create six new DataFrames (large_loans, medium_loans, small_loans, short_duration_loans, medium_duration_loans, long_duration_loans), each containing only the rows that match its condition. The key to filtering is the boolean expression inside the square brackets. Conditions can be combined with & (and), | (or), and ~ (not), as medium_duration_loans does, provided each individual comparison is wrapped in parentheses.
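The sketch below shows the combining operators side by side, along with two pandas shortcuts, `Series.isin` and `Series.between`, that express the same kinds of filters more compactly. It assumes a small hypothetical `loans` DataFrame that only mirrors the column names used above; the values are invented.

```python
import pandas as pd

# Hypothetical sample data mirroring the column names used above
loans = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small", "Medium", "Large", "Small"],
    "LoanDurationDays": [250, 80, 150, 100, 200],
})

# AND: parentheses are required because & binds tighter than ==, <, >
large_and_long = loans[(loans["LoanAmountCategory"] == "Large") & (loans["LoanDurationDays"] > 200)]

# OR: rows matching either condition
small_or_short = loans[(loans["LoanAmountCategory"] == "Small") | (loans["LoanDurationDays"] < 100)]

# NOT: ~ inverts a boolean mask
not_large = loans[~(loans["LoanAmountCategory"] == "Large")]

# isin() replaces a chain of (== ...) | (== ...) comparisons
small_or_medium = loans[loans["LoanAmountCategory"].isin(["Small", "Medium"])]

# between() includes both ends by default, matching the >= 100 & <= 200 filter
medium_duration = loans[loans["LoanDurationDays"].between(100, 200)]

print(medium_duration.index.tolist())  # [2, 3, 4]
```

Whether to use the explicit `&` form or `between()` is a matter of readability; both produce the same rows for an inclusive range.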
Exporting Our Clean Data
After cleaning the data, we can export it to a CSV file with the following code:
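```python
# Save the cleaned data to a CSV file.
# The ../data directory must already exist; to_csv does not create it.
loan_data_clean = '../data/loan_data_clean.csv'

# index=False keeps the row index from being written as an extra column
loan_data_cleaned.to_csv(loan_data_clean, index=False)
```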
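To see what `index=False` changes, the following sketch writes a small DataFrame both ways and reads each file back with `pd.read_csv`. It assumes a hypothetical stand-in `df` for loan_data_cleaned and writes to a temporary directory so it does not touch `../data`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for loan_data_cleaned
df = pd.DataFrame({"LoanID": [1, 2], "LoanDurationDays": [45, 150]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # index=False: only the data columns are written
    no_index_path = os.path.join(tmp_dir, "no_index.csv")
    df.to_csv(no_index_path, index=False)
    reloaded = pd.read_csv(no_index_path)

    # Default index=True: the index is written and comes back as an unnamed column
    with_index_path = os.path.join(tmp_dir, "with_index.csv")
    df.to_csv(with_index_path)
    reloaded_with_index = pd.read_csv(with_index_path)

print(reloaded.columns.tolist())             # ['LoanID', 'LoanDurationDays']
print(reloaded_with_index.columns.tolist())  # ['Unnamed: 0', 'LoanID', 'LoanDurationDays']
```

Reading the first file back reproduces the original columns exactly, which is usually what you want when the index is just a default row counter.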