Converting Data Types in Python pandas

Appropriate data types are crucial for efficient memory usage and compatibility with analysis tools. Let’s first check the data type of each column in our loans DataFrame. The .dtypes attribute of a pandas DataFrame returns a Series containing the data type of each column.

print(loan_data_cleaned.dtypes)

Once you know the current data types, the following general rules help you decide on the correct type for each column and how to convert it:

  • Numeric Data: Should typically be int for whole numbers or float for numbers with decimals. Use pd.to_numeric() to convert columns to a numeric data type.

  • Dates: Should be in datetime format, which makes date-related operations easier. Use pd.to_datetime() to convert date columns, specifying the date format if necessary.

  • Categorical Data: If a column has a limited set of values that repeat (like LoanAmountCategory or LoanStatus), you can convert it to category type using .astype('category') to save memory.

  • Boolean Data: Should be bool if it contains only two values representing True/False conditions.

  • String Data: Textual data is typically stored with the object type, which has traditionally been the default for strings in pandas (recent versions also provide a dedicated string type). Use .astype(str) to convert a column to strings. The CustomerName column should contain strings, so it should be of object type.
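The rules above can be sketched on a small example. This is a minimal illustration, not the tutorial's loan data: the DataFrame `df` and its column names and values are hypothetical, chosen only to show one conversion per rule.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only
df = pd.DataFrame({
    "LoanAmount": ["1000", "2500.50", "n/a"],   # numbers stored as text
    "LoanDate": ["2023-01-15", "2023-02-01", "2023-03-10"],
    "LoanStatus": ["Active", "Closed", "Active"],
    "IsDefaulted": [0, 1, 0],
    "CustomerID": [101, 102, 103],
})

# Numeric: errors="coerce" turns unparseable entries into NaN instead of raising
df["LoanAmount"] = pd.to_numeric(df["LoanAmount"], errors="coerce")

# Dates: an explicit format avoids ambiguous parsing
df["LoanDate"] = pd.to_datetime(df["LoanDate"], format="%Y-%m-%d")

# Categorical: a small set of repeating labels
df["LoanStatus"] = df["LoanStatus"].astype("category")

# Boolean: 0/1 flags become False/True
df["IsDefaulted"] = df["IsDefaulted"].astype(bool)

# String: identifiers that should not be treated as numbers
df["CustomerID"] = df["CustomerID"].astype(str)

print(df.dtypes)
```

Note that errors="coerce" silently replaces bad entries with NaN, so it is worth counting missing values afterwards before relying on the column.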

In our data, we will perform the following data type transformations:

  1. Convert LoanStatus to category type

  2. Convert LoanAmountCategory to category type

  3. Convert CustomerLoyalty to category type

  4. Convert LoanDurationDays from float to int

  5. Convert TotalLoansByCustomer from float to int

You can convert the data types of columns in a pandas DataFrame using the astype method. Before converting numeric columns from float to int, make sure they contain no missing values: NaN (not a number) is a float value and cannot be stored in a standard integer column, so astype(int) raises an error if any are present. Since we’ve already handled all NaNs in our data, we can convert the data types for the specified columns:

# Convert 'LoanStatus', 'LoanAmountCategory', and 'CustomerLoyalty' to category type
loan_data_cleaned['LoanStatus'] = loan_data_cleaned['LoanStatus'].astype('category')
loan_data_cleaned['LoanAmountCategory'] = loan_data_cleaned['LoanAmountCategory'].astype('category')
loan_data_cleaned['CustomerLoyalty'] = loan_data_cleaned['CustomerLoyalty'].astype('category')

# Convert 'LoanDurationDays' from float to int (assuming no NaN values are present)
loan_data_cleaned['LoanDurationDays'] = loan_data_cleaned['LoanDurationDays'].astype(int)

# Convert 'TotalLoansByCustomer' from float to int (assuming no NaN values are present)
loan_data_cleaned['TotalLoansByCustomer'] = loan_data_cleaned['TotalLoansByCustomer'].astype(int)

# Verify the changes
print(loan_data_cleaned.dtypes)
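If you are not sure a float column is free of missing or fractional values, you can check before converting. The sketch below uses a hypothetical Series named `durations` rather than the actual loan data, and shows pandas' nullable "Int64" dtype as one possible alternative when missing values must be kept.

```python
import numpy as np
import pandas as pd

# Hypothetical column for illustration: whole numbers stored as float, one missing
durations = pd.Series([30.0, 45.0, np.nan, 60.0], name="LoanDurationDays")

# Check the two conditions that make a plain astype(int) unsafe
has_missing = durations.isna().any()
has_fraction = ((durations.dropna() % 1) != 0).any()
print(has_missing, has_fraction)

# The nullable "Int64" dtype (capital I) keeps missing values as pd.NA
nullable = durations.astype("Int64")
print(nullable.dtype)

# Once missing values are removed, the plain conversion works
clean = durations.dropna().astype(int)
print(clean.dtype)
```

Keep in mind that astype(int) truncates any fractional part, so the fraction check matters even when no values are missing.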

We now have a clean dataset with an appropriate data type for each column.
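To see the memory effect of the category conversion, you can compare memory usage before and after. This is a rough sketch on a hypothetical Series of repeated status labels, not the loan data itself; the actual savings depend on the number of rows, the number of distinct values, and the pandas version.

```python
import pandas as pd

# Hypothetical data: three distinct status labels repeated many times
status = pd.Series(["Active", "Closed", "Defaulted"] * 10_000, name="LoanStatus")

as_category = status.astype("category")

# deep=True counts the memory used by the string values themselves
before = status.memory_usage(deep=True)
after = as_category.memory_usage(deep=True)

print(f"before: {before:,} bytes")
print(f"after:  {after:,} bytes")
```

The category type stores each distinct label once and represents rows with small integer codes, which is why it helps most when a column has few unique values relative to its length.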
