How to Create a Covariance Matrix in R

In this article, we will learn how to create a covariance matrix in R. Covariance measures the comovement between two variables, i.e. the extent to which two random variables change together. Its sign shows the direction of their linear association: positive when the variables tend to move in the same direction, negative when they tend to move in opposite directions. Its magnitude depends on the units of the variables.

A covariance matrix is useful when a dataset has many variables and we want to know the covariance between every pair of them. It’s a square, symmetric matrix with the variance of each variable on the diagonal and the pairwise covariances off the diagonal.
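
As a minimal sketch of what a covariance matrix contains, the following base R snippet computes a sample covariance by hand and compares it with cov(). The vectors x and y are made-up toy values, not the index data used later in this article.

```r
# Toy data (made up for illustration only)
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 5)

# Sample covariance by hand: sum of the products of deviations
# from the means, divided by n - 1
n <- length(x)
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# cov() on a two-column matrix returns the full 2 x 2 covariance matrix
toy_cov_mat <- cov(cbind(x, y))

# The off-diagonal entry matches the manual calculation,
# and the diagonal entries are the variances of x and y
all.equal(manual_cov, toy_cov_mat["x", "y"])
all.equal(unname(diag(toy_cov_mat)), c(var(x), var(y)))
```

Note that cov() uses the n - 1 denominator, so it returns the sample covariance rather than the population covariance.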

Let’s learn how to create a covariance matrix in R and interpret the results.

For our example, we will create the covariance matrix for three stock indices: the S&P 500, the Dow Jones, and the NASDAQ. We will fetch the historical data for these three indices from a package called qrmdata. In the process of creating the covariance matrix, we will also show some important data manipulations needed to get the right results.

Steps to Calculate Covariance Matrix

Step 1: Install and Load Packages

The first step is to install and load the two packages, qrmdata and xts.

# Install and load the necessary packages
install.packages(c("qrmdata", "xts"))
library(qrmdata)
library(xts)

Step 2: Load the Stock Index Data

We will now load the data for the three indices from the qrmdata package.

# Load the stock index data
data("SP500")
data("DJ")
data("NASDAQ")

Step 3: Prepare the Data

The data for each index is loaded into R as an xts object. A few steps make it ready for the covariance matrix. First, we convert each xts object into a data frame with the date as a column. Next, we merge the data frames into a single data frame by date. We then rename the columns to SP500, DJ, and NASDAQ. Finally, we clean the data by omitting rows that contain NA values, which leaves only the dates on which all three indices have data.

# Convert to data frames and add Date as a column
SP500_df <- data.frame(Date = index(SP500), SP500 = coredata(SP500))
DJ_df <- data.frame(Date = index(DJ), DJ = coredata(DJ))
NASDAQ_df <- data.frame(Date = index(NASDAQ), NASDAQ = coredata(NASDAQ))
  
# Merge all dataframes by Date
merged_data <- merge(SP500_df, DJ_df, by = "Date", all = TRUE)
merged_data <- merge(merged_data, NASDAQ_df, by = "Date", all = TRUE)

# Rename columns
colnames(merged_data) <- c("Date", "SP500", "DJ", "NASDAQ")

# Remove rows with NA values (i.e., dates where one or more indices didn't have data)
clean_data <- na.omit(merged_data)
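
To see what the merge-then-clean step does, here is a small base R sketch on two made-up data frames with partially overlapping dates. The names a_df, b_df, and the values in them are hypothetical and stand in for the index data frames above.

```r
# Two toy data frames with partially overlapping dates (made-up values)
a_df <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  A = c(100, 101, 102)
)
b_df <- data.frame(
  Date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06")),
  B = c(50, 51, 52)
)

# all = TRUE keeps every date found in either input and fills gaps with NA
outer_merged <- merge(a_df, b_df, by = "Date", all = TRUE)

# na.omit() then drops the dates that are missing in either input
toy_clean <- na.omit(outer_merged)

nrow(outer_merged)  # 4 dates in the union
nrow(toy_clean)     # 2 dates common to both inputs
```

In this toy case, merging with the default all = FALSE yields the same two rows directly. The two-step version also removes any NA values that were already present in the inputs.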

Step 4: Calculate Daily Returns

The data contains daily index values (levels). Covariance is generally calculated on returns rather than on levels, because returns put indices with very different levels on a comparable scale. So, our next step is to calculate the daily return of each index. The code below uses log returns, expressed in percent.

# Calculate daily log returns in percent
# (diff() returns one fewer value than its input, so pad the first row with NA)
clean_data$SP500_Returns <- c(NA, diff(log(clean_data$SP500))) * 100
clean_data$DJ_Returns <- c(NA, diff(log(clean_data$DJ))) * 100
clean_data$NASDAQ_Returns <- c(NA, diff(log(clean_data$NASDAQ))) * 100

# Remove the first row (which will have NA values for the returns)
clean_data <- clean_data[-1, ]

# Select returns data
returns <- clean_data[, c("SP500_Returns", "DJ_Returns", "NASDAQ_Returns")]
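
The return calculation can be checked on a tiny example. The sketch below uses a made-up price vector (not the index data) to show how diff(log(...)) works and how log returns compare with simple percentage returns.

```r
# Made-up index levels for four consecutive days
prices <- c(100, 102, 99, 105)

# Log returns in percent: difference of consecutive log prices
log_ret <- diff(log(prices)) * 100

# Simple returns in percent, for comparison
simple_ret <- (prices[-1] / prices[-length(prices)] - 1) * 100

# diff() yields one fewer value than its input
length(log_ret)

# For small daily moves, the two definitions are close
round(cbind(log_ret, simple_ret), 3)

# Log returns add up over time to the log of total growth
all.equal(sum(log_ret), log(prices[4] / prices[1]) * 100)
```

For daily data the two return definitions usually differ only slightly, so either can be used for a covariance matrix as long as the choice is applied consistently to every series.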

Step 5: Calculate Covariance Matrix

The final step is to create the covariance matrix with the cov() function. The use = "complete.obs" argument tells cov() to ignore any rows that contain NA values.

# Calculate the covariance matrix
cov_mat <- cov(returns, use = "complete.obs")

# Print the covariance matrix
print(cov_mat)
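
The returns data above has already been cleaned, so use = "complete.obs" acts mainly as a safeguard. The following sketch, on a made-up data frame named toy, illustrates what the argument changes when NA values are present.

```r
# Toy data frame with one missing value (made-up numbers)
toy <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(2, 1, 3, 5, 4)
)

# With the default use = "everything", NA values propagate
# into every entry that involves column a
cov_default <- cov(toy)

# With use = "complete.obs", rows containing NA are dropped first
cov_complete <- cov(toy, use = "complete.obs")

is.na(cov_default["a", "b"])                 # TRUE
all.equal(cov_complete, cov(na.omit(toy)))   # TRUE
```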

Interpret Results

In our example, the resulting covariance matrix is as follows:

> print(cov_mat)
               SP500_Returns DJ_Returns NASDAQ_Returns
SP500_Returns       1.351521   1.273071       1.613118
DJ_Returns          1.273071   1.285778       1.412438
NASDAQ_Returns      1.613118   1.412438       2.877315

The numbers on the diagonal are the variances of each index's returns. Because the returns are expressed in percent, these values are in squared percent.

  • 1.351521 is the variance of the returns for the SP500 index
  • 1.285778 is the variance of the returns for the DJ index
  • 2.877315 is the variance of the returns for the NASDAQ index

The off-diagonal numbers represent the covariances between different indices.

  • 1.273071 is the covariance between the returns of the SP500 and DJ indices
  • 1.613118 is the covariance between the returns of the SP500 and NASDAQ indices
  • 1.412438 is the covariance between the returns of the DJ and NASDAQ indices

Remember that a positive covariance indicates that the returns of two indices tend to rise and fall together, while a negative covariance indicates that they tend to move inversely (when one increases, the other tends to decrease). In our example, all three covariances are positive.
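
Because covariances depend on the scale of the variables, they are hard to compare across pairs. One common follow-up, sketched below, is to convert the covariance matrix into a correlation matrix with cov2cor() from base R. The matrix here is retyped from the printed output above so that the snippet runs on its own; in the full workflow you would pass cov_mat directly.

```r
# Covariance matrix retyped from the printed output above
idx <- c("SP500_Returns", "DJ_Returns", "NASDAQ_Returns")
cov_mat <- matrix(
  c(1.351521, 1.273071, 1.613118,
    1.273071, 1.285778, 1.412438,
    1.613118, 1.412438, 2.877315),
  nrow = 3, byrow = TRUE,
  dimnames = list(idx, idx)
)

# Daily standard deviations (in percent) are the square roots of the diagonal
daily_sd <- sqrt(diag(cov_mat))

# cov2cor() rescales each covariance by the two standard deviations,
# giving correlations between -1 and 1
cor_mat <- cov2cor(cov_mat)

print(round(daily_sd, 3))
print(round(cor_mat, 3))
```

The correlation matrix has ones on the diagonal, and its off-diagonal entries can be compared directly across pairs of indices.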
