Correlation Analysis using pandas

Correlation analysis is crucial, especially in financial data, to identify relationships between variables. To learn about Correlation Analysis using pandas, we will take a new dataset containing historical data for a single stock containing various indicators such as open, close, high, low, volume, adjusted, symbol, etc. We have this data for Qualcomm in a file named QCOM.csv.

1# Load QCOM stock data from a CSV file
2
3# Read the CSV file without parsing dates
4qcom_df = pd.read_csv('../data/QCOM.csv', index_col='Date')
5
6# Convert the 'Date' column to datetime
7qcom_df.index = pd.to_datetime(qcom_df.index, format='%d/%m/%y')
8
9

This code loads stock data for QCOM (Qualcomm Incorporated) from a CSV file into a pandas DataFrame. It initially reads the CSV without parsing the 'Date' column as dates, setting 'Date' as the index. Then, it converts the 'Date' index into datetime objects using a specific date format. Let’s preview the data using the qcom.head() method.

For calculating correlations, we can use the corr() function that computes pairwise correlation of columns, excluding NA/null values. High correlation between 'Open' and 'Close' prices may be expected, but other relationships could signal interesting market dynamics.

1# Correlation matrix of prices
2qcom_df[['Open', 'Close', 'High', 'Low']].corr()
3

The correlation matrix shows high correlation between all variables. It suggests that the stock experiences low intraday volatility, with its opening, high, low, and closing prices remaining closely related throughout the trading day.

Let’s load data for one more stock Microsoft for the same period.

1# Load MSFT stock data from a CSV file
2
3# Read the CSV file without parsing dates
4msft_df = pd.read_csv('../data/MSFT.csv', index_col='Date',parse_dates=['Date'])
5msft_df.head()
6
7

If you want to calculate the correlation of the 'Close' prices (for instance) between the two stocks, you would do something like this:

1# Assuming the indexes are already aligned and the date formats are consistent
2correlation_matrix = qcom_df['Close'].corr(msft_df['Close'])
3print(correlation_matrix)
4

The correlation is -0.33. A correlation coefficient of -0.33 between the close prices of MSFT and QCOM indicates a weak inverse or negative relationship between the two stocks' closing prices. In practical terms, this means that on some days when MSFT's stock price goes up, QCOM's stock price tends to go down, and vice versa. However, since the correlation is weak, this relationship is not strong or consistent. It suggests that other factors might be influencing the stock prices, and they do not move strongly in opposite directions relative to each other.

Note: In the Jupyter notebook (member only), we have also included an advanced example that demonstrates how to calculate the correlation matrix between the 'Close' and 'Volume' columns of both qcom_df and msft_df.

Learn

Resources

Correlation Analysis using pandas

Exploring Data using pandas

Handling Categorical Data and Unique Values using pandas

Data Manipulation Using Pandas - Part 1

Data Science for Finance Bundle

Topics