In data analysis, the first step is often to load data from various sources, such as CSV files, Excel workbooks, JSON documents, SQL databases, and APIs. Pandas offers a range of functions to facilitate this process, supporting all of these formats and more.
Reading CSV Files
CSV (Comma-Separated Values) files are one of the most common data formats in data analysis.
In our data folder, we have a file named 'stocks.csv'. It contains daily stock price data for 5 different stocks. We will load it into our Jupyter session using pandas' pd.read_csv function.
```python
import pandas as pd

# Load stock data from a CSV file
stocks_df = pd.read_csv('../data/stocks.csv', parse_dates=['Date'], index_col='Date')
```
In this code, stocks.csv is the name of the file we’re reading from. The parse_dates parameter specifies which columns should be parsed as dates, while index_col sets the date column as the index of the DataFrame, giving the data a time-series structure.
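One practical benefit of the time-series structure is label-based date slicing. The sketch below mimics the load step with a small inline CSV (the values are hypothetical, standing in for stocks.csv) and then selects a month of data directly from the DatetimeIndex:

```python
import io
import pandas as pd

# A small inline CSV standing in for stocks.csv (hypothetical values)
csv_text = """Date,Close
2023-01-03,100.5
2023-01-04,101.2
2023-02-01,99.8
"""
stocks_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

# With a DatetimeIndex, .loc supports partial string indexing,
# e.g. selecting every row from January 2023 in one step
january = stocks_df.loc['2023-01']
print(january)
```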
We can now display this dataframe. The head() method in pandas displays the first few rows of a DataFrame. By default, it shows the first five rows, but you can specify a different number if needed within the parentheses, like .head(10).
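A quick illustration of the default and the explicit row count (using a small throwaway DataFrame rather than the stock data):

```python
import pandas as pd

# A ten-row DataFrame just to demonstrate head()
df = pd.DataFrame({'A': range(10)})

print(df.head())    # first five rows by default
print(df.head(3))   # first three rows when a count is given
```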
Reading from Excel
Excel files are another common source of data, particularly in finance. Pandas can read data from Excel with the read_excel function. We have some sample data containing a company’s financials for the past 10 years. This data is in an Excel file. The code below shows how to load it as a DataFrame.
```python
# Load financial data from an Excel file
financials_df = pd.read_excel('../data/company_financials_10_years.xlsx', sheet_name='Financials', parse_dates=['Date'])
```
In this example, we are loading data from an Excel file named 'company_financials_10_years.xlsx'. The specific sheet we're accessing in this file is titled 'Financials'. By using the parse_dates parameter with pandas, we instruct it to treat the 'Date' column as a series of date objects, ensuring proper handling of dates within our data. We can now see the loaded dataframe.
There are several more advanced techniques for reading data from Excel sheets, such as loading data from multiple sheets or from sheets with complex layouts. We’ve provided some sample code below; we will practice these techniques in the practice exercises.
Dealing with Multiple Sheets
Excel files often contain multiple sheets with related data. Pandas can handle this by either specifying a particular sheet name or by loading all sheets at once.
```python
# Load a specific sheet
january_data = pd.read_excel('monthly_data.xlsx', sheet_name='January')

# Load all sheets into a dictionary of DataFrames
all_months_data = pd.read_excel('monthly_data.xlsx', sheet_name=None)
```
When sheet_name=None, pandas returns a dictionary where each key is a sheet name and each value is a DataFrame corresponding to a sheet.
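A common next step is combining that dictionary into a single DataFrame. The sketch below builds the dictionary in memory (with hypothetical values) to stand in for the result of pd.read_excel(..., sheet_name=None), then concatenates the sheets while tagging each row with its sheet name:

```python
import pandas as pd

# Simulate the dict that pd.read_excel('monthly_data.xlsx', sheet_name=None)
# would return: sheet name -> DataFrame (values here are hypothetical)
all_months_data = {
    'January': pd.DataFrame({'sales': [100, 200]}),
    'February': pd.DataFrame({'sales': [150, 250]}),
}

# pd.concat uses the dict keys as an outer index level; naming that level
# and resetting it turns the sheet name into a regular column
combined = pd.concat(all_months_data, names=['Month']).reset_index(level=0)
print(combined)
```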
Handling Complex Data Structures
Excel files can sometimes contain complex layouts, such as merged cells or headers spanning multiple rows. Pandas offers parameters like header, skiprows, and usecols to deal with such complexities. Some examples:
```python
# Load a sheet with a header that spans two rows
complex_header_df = pd.read_excel('complex_layout.xlsx', header=[0, 1])

# Load a sheet while skipping the first two rows and using specific columns
specified_data = pd.read_excel('data_with_notes.xlsx', skiprows=2, usecols='A:C')
```
With these capabilities, you can customize how Pandas interprets the incoming data, ensuring that the DataFrame structure reflects the data's true layout and meaning.
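To see what header=[0, 1] produces, here is a minimal sketch using inline CSV text (read_csv accepts the same header parameter as read_excel, and the data is hypothetical). The two header rows become a column MultiIndex, so individual columns are addressed by tuples:

```python
import io
import pandas as pd

# CSV text with a two-row header, mimicking a merged-header Excel sheet
csv_text = """Revenue,Revenue,Costs
Q1,Q2,Q1
100,120,80
"""
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Columns are now a MultiIndex; select one sub-column by tuple
print(df[('Revenue', 'Q1')])
```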
Note: The examples of multiple sheets and complex data structures are not presented in the Jupyter notebook.
With our stock data and financial data loaded into Pandas, we'll now dive into the first steps of data analysis: exploration and summarization.