Filtering Rows in a DataFrame in Python

Filtering rows in a pandas DataFrame lets you select the subset of your data that meets a given condition. It is a fundamental step in data manipulation, whether you are doing exploratory data analysis, preparing data for modeling, or isolating a specific segment of the dataset for closer inspection.

In pandas, row filtering can be done using boolean indexing. This technique involves creating a boolean condition that evaluates to True or False for each row, and then using this condition to select only the rows where the condition is True.
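To make the two steps concrete, here is a minimal sketch that builds the boolean condition first and then applies it. The `toy_loans` DataFrame and its values are made up for illustration; only the column names are borrowed from the dataset used in this section.

```python
import pandas as pd

# Hypothetical toy data (not the real loan dataset); column names
# mirror the ones used in this section.
toy_loans = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small", "Medium", "Large"],
    "LoanDurationDays": [250, 90, 150, 60],
})

# Step 1: the comparison produces a boolean Series, one value per row.
mask = toy_loans["LoanDurationDays"] < 100
print(mask.tolist())  # [False, True, False, True]

# Step 2: indexing with the mask keeps only the rows where it is True.
short_toy_loans = toy_loans[mask]
print(short_toy_loans)
```

Note that the filtered result keeps the original row labels (here 1 and 3), which makes it easy to trace rows back to the source DataFrame.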

Here are examples of filtering rows based on conditions in your dataset:

Filtering by LoanAmountCategory

  • Filter rows where LoanAmountCategory is 'Large', 'Medium', or 'Small' (one filter per category).

Filtering by LoanDurationDays

  • Filter rows where LoanDurationDays is less than 100.

  • Filter rows where LoanDurationDays is between 100 and 200, inclusive.

  • Filter rows where LoanDurationDays is more than 200.

Here's how we can implement these filters in code:

# Filtering by LoanAmountCategory
large_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Large']
medium_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Medium']
small_loans = loan_data_cleaned[loan_data_cleaned['LoanAmountCategory'] == 'Small']

# Filtering by LoanDurationDays
short_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'] < 100]
medium_duration_loans = loan_data_cleaned[(loan_data_cleaned['LoanDurationDays'] >= 100) & (loan_data_cleaned['LoanDurationDays'] <= 200)]
long_duration_loans = loan_data_cleaned[loan_data_cleaned['LoanDurationDays'] > 200]

# Display the filtered data (example)
print("Large Loans:\n", large_loans.head())
print("Short Duration Loans (<100 days):\n", short_duration_loans.head())

[Output: DataFrame for large loans]

These examples create new DataFrames (large_loans, medium_loans, small_loans, short_duration_loans, medium_duration_loans, long_duration_loans) containing only the rows that match the specified conditions. The boolean expression inside the square brackets does the filtering. Conditions can be combined with & (and), | (or), and ~ (not) to express more complex criteria, as long as each individual condition is wrapped in parentheses.
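As a sketch of how more complex criteria can be expressed, the example below combines conditions and uses two convenience methods, `Series.isin` and `Series.between`. The `toy_loans` data is hypothetical and stands in for the real dataset; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the loan dataset.
toy_loans = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small", "Medium", "Large", "Small"],
    "LoanDurationDays": [250, 90, 150, 60, 200],
})

# Combine conditions with & (and); each condition needs its own parentheses.
large_and_long = toy_loans[
    (toy_loans["LoanAmountCategory"] == "Large")
    & (toy_loans["LoanDurationDays"] > 200)
]

# isin() matches any value in a list, replacing a chain of | comparisons.
small_or_medium = toy_loans[
    toy_loans["LoanAmountCategory"].isin(["Small", "Medium"])
]

# between() is inclusive on both ends by default, so this is the same
# test as (>= 100) & (<= 200).
mid_duration = toy_loans[toy_loans["LoanDurationDays"].between(100, 200)]

# ~ negates a condition: everything that is not 'Large'.
not_large = toy_loans[~(toy_loans["LoanAmountCategory"] == "Large")]

print(large_and_long)
print(mid_duration)
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the `&`, `|`, and `~` operators are used here.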

Exporting Our Clean Data

After cleaning the data, we can export it to a CSV file using the following code:

# Save the cleaned data to a CSV file.
loan_data_clean = '../data/loan_data_clean.csv'

loan_data_cleaned.to_csv(loan_data_clean, index=False)
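A quick way to check an export is to read the file back and compare it with what was written. The sketch below does this with a hypothetical `toy_cleaned` DataFrame and a temporary directory, so it does not depend on the `../data` folder existing. Passing `index=False` matters here: without it, the row index is written as an extra unnamed column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned loan data.
toy_cleaned = pd.DataFrame({
    "LoanAmountCategory": ["Large", "Small", "Medium"],
    "LoanDurationDays": [250, 90, 150],
})

# Write to a temporary location, then read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "toy_loans_clean.csv")
    toy_cleaned.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

# With index=False, the reloaded columns match the original ones exactly.
print(reloaded.columns.tolist())
print(reloaded.shape)
```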
