Correlation Analysis using pandas
Correlation analysis helps identify relationships between variables, and it is especially useful for financial data. To learn about correlation analysis using pandas, we will use a new dataset of historical data for a single stock, with columns such as open, close, high, low, volume, adjusted, and symbol. The data is for Qualcomm and is stored in a file named QCOM.csv.
# Load QCOM stock data from a CSV file
import pandas as pd
# Read the CSV file without parsing dates, using 'Date' as the index
qcom_df = pd.read_csv('../data/QCOM.csv', index_col='Date')
# Convert the 'Date' index to datetime (day/month/two-digit year)
qcom_df.index = pd.to_datetime(qcom_df.index, format='%d/%m/%y')
This code loads stock data for QCOM (Qualcomm Incorporated) from a CSV file into a pandas DataFrame. It first reads the CSV without parsing the 'Date' column as dates and sets 'Date' as the index. It then converts the 'Date' index into datetime objects using an explicit date format. Let’s preview the data using the qcom_df.head() method.
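As a minimal sketch of the two-step date handling described above, the snippet below applies the same calls to a tiny in-memory CSV. The rows and the names csv_text and sample_df are made up for illustration and are not taken from QCOM.csv.

```python
import io
import pandas as pd

# Hypothetical sample rows, made up for illustration (not real QCOM data)
csv_text = """Date,Open,Close
03/01/22,10.0,10.5
04/01/22,10.5,10.2
05/01/22,10.2,10.8
"""

# Step 1: read the CSV with 'Date' as a plain string index
sample_df = pd.read_csv(io.StringIO(csv_text), index_col='Date')

# Step 2: convert the index to datetime.
# %d/%m/%y means day/month/two-digit year, so '03/01/22' is 3 January 2022.
sample_df.index = pd.to_datetime(sample_df.index, format='%d/%m/%y')

# The index is now a DatetimeIndex rather than plain strings
print(type(sample_df.index).__name__)
print(sample_df.head())
```

Passing an explicit format avoids ambiguity between day-first and month-first dates, which matters for values such as 03/01/22.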
To calculate correlations, we can use the corr() method, which computes the pairwise correlation of columns (Pearson by default), excluding NA/null values. A high correlation between 'Open' and 'Close' prices may be expected, but other relationships could signal interesting market dynamics.
# Correlation matrix of prices
qcom_df[['Open', 'Close', 'High', 'Low']].corr()
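The correlation matrix shows a high correlation between all four price columns. This is expected: on any given day the open, high, low, and close all lie within that day's trading range, which is typically small compared with how far the price moves over the whole period. A high correlation between price levels therefore says little about intraday volatility on its own.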
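Price columns that share the same overall trend tend to be highly correlated as levels, so it can help to also look at the correlation of daily percentage changes. The sketch below uses a small made-up DataFrame (the name prices and its values are hypothetical, not QCOM data) to show both calculations with corr() and pct_change().

```python
import pandas as pd

# Hypothetical prices, made up for illustration (not real QCOM data)
prices = pd.DataFrame({
    'Open':  [100.0, 102.0, 101.0, 105.0, 107.0],
    'Close': [101.0, 101.5, 104.0, 106.0, 108.0],
})

# Pairwise Pearson correlation of the price levels
level_corr = prices.corr()

# Pairwise correlation of day-over-day percentage changes.
# The first row of pct_change() is NaN and is excluded by corr().
change_corr = prices.pct_change().corr()

print(level_corr)
print(change_corr)
```

The two matrices answer different questions: the first compares where prices sit over the period, the second compares how they move from one day to the next.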
Let’s load data for one more stock, Microsoft, for the same period.
# Load MSFT stock data from a CSV file
# Read the CSV file, parsing the 'Date' column as dates and using it as the index
msft_df = pd.read_csv('../data/MSFT.csv', index_col='Date', parse_dates=['Date'])
msft_df.head()
If you want to calculate the correlation of the 'Close' prices (for instance) between the two stocks, you would do something like this:
# Assuming the indexes are already aligned and the date formats are consistent
# Series.corr() returns a single coefficient, not a matrix
close_correlation = qcom_df['Close'].corr(msft_df['Close'])
print(close_correlation)
The correlation is -0.33. A coefficient of -0.33 between the closing prices of MSFT and QCOM indicates a weak negative (inverse) relationship. In practical terms, over this period, dates on which MSFT closed above its average tended to be dates on which QCOM closed below its average, and vice versa. Because the correlation is weak, the relationship is neither strong nor consistent: other factors influence each stock's price, and the two do not move strongly in opposite directions.
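The calculation above assumes that both DataFrames cover the same dates. As a hedged sketch of how to make that explicit, the snippet below keeps only the dates present in both series before computing the correlation. The series close_a and close_b and their values are made up for illustration and do not come from the QCOM or MSFT files.

```python
import pandas as pd

# Hypothetical closing prices with partially overlapping dates
dates_a = pd.to_datetime(['2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06'])
dates_b = pd.to_datetime(['2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07'])
close_a = pd.Series([10.0, 11.0, 12.0, 13.0], index=dates_a)
close_b = pd.Series([20.0, 19.0, 18.0, 30.0], index=dates_b)

# Keep only the dates that appear in both series (inner join on the index)
aligned = pd.concat([close_a, close_b], axis=1, join='inner', keys=['a', 'b'])
print(len(aligned))  # number of shared dates

# Correlation computed on the shared dates only
shared_corr = aligned['a'].corr(aligned['b'])
print(shared_corr)

# Series.corr() also aligns the two series on their index internally,
# so calling it on the unaligned series uses the same shared dates
direct_corr = close_a.corr(close_b)
print(direct_corr)
```

Building the aligned frame first makes it easy to check how many dates actually overlap before trusting the coefficient.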
Note: In the Jupyter notebook (member only), we have also included an advanced example that demonstrates how to calculate the correlation matrix between the 'Close' and 'Volume' columns of both qcom_df and msft_df.