Correlation Analysis using pandas

Correlation analysis is crucial, especially in financial data, to identify relationships between variables. To learn about Correlation Analysis using pandas, we will take a new dataset containing historical data for a single stock containing various indicators such as open, close, high, low, volume, adjusted, symbol, etc. We have this data for Qualcomm in a file named QCOM.csv.

# Load QCOM stock data from a CSV file

# Read the CSV file without parsing dates
qcom_df = pd.read_csv('../data/QCOM.csv', index_col='Date')

# Convert the 'Date' column to datetime
qcom_df.index = pd.to_datetime(qcom_df.index, format='%d/%m/%y')

This code loads stock data for QCOM (Qualcomm Incorporated) from a CSV file into a pandas DataFrame. It initially reads the CSV without parsing the 'Date' column as dates, setting 'Date' as the index. Then, it converts the 'Date' index into datetime objects using a specific date format. Let’s preview the data using the qcom.head() method.

For calculating correlations, we can use the corr() function that computes pairwise correlation of columns, excluding NA/null values. High correlation between 'Open' and 'Close' prices may be expected, but other relationships could signal interesting market dynamics.

# Correlation matrix of prices
qcom_df[['Open', 'Close', 'High', 'Low']].corr()

The correlation matrix shows high correlation between all variables. It suggests that the stock experiences low intraday volatility, with its opening, high, low, and closing prices remaining closely related throughout the trading day.

Let’s load data for one more stock Microsoft for the same period.

# Load MSFT stock data from a CSV file

# Read the CSV file without parsing dates
msft_df = pd.read_csv('../data/MSFT.csv', index_col='Date',parse_dates=['Date'])
msft_df.head()

If you want to calculate the correlation of the 'Close' prices (for instance) between the two stocks, you would do something like this:

# Assuming the indexes are already aligned and the date formats are consistent
correlation_matrix = qcom_df['Close'].corr(msft_df['Close'])
print(correlation_matrix)

The correlation is -0.33. A correlation coefficient of -0.33 between the close prices of MSFT and QCOM indicates a weak inverse or negative relationship between the two stocks' closing prices. In practical terms, this means that on some days when MSFT's stock price goes up, QCOM's stock price tends to go down, and vice versa. However, since the correlation is weak, this relationship is not strong or consistent. It suggests that other factors might be influencing the stock prices, and they do not move strongly in opposite directions relative to each other.

Note: In the Jupyter notebook (member only), we have also included an advanced example that demonstrates how to calculate the correlation matrix between the 'Close' and 'Volume' columns of both qcom_df and msft_df.

Related Downloads

Membership
Learn the skills required to excel in data science and data analytics covering R, Python, machine learning, and AI.
I WANT TO JOIN
JOIN 30,000 DATA PROFESSIONALS

Free Guides - Getting Started with R and Python

Enter your name and email address below and we will email you the guides for R programming and Python.

Saylient AI Logo

Take the Next Step in Your Data Career

Join our membership for lifetime unlimited access to all our data analytics and data science learning content and resources.