Data Analysis with Quantmod in R

Premium

We will perform some data analysis with the 4 ETF symbols that we have loaded into the environment. This analysis consists of comparing the returns of the four ETFs, observing their correlations, getting some statistics, and trying to answer some questions such as which of the ETFs has the best performance in the whole period. We will also perform a breakdown analysis of the returns of each ETF by year.

We will use R's data manipulation capabilities along with packages like quantmod to perform our analysis. The four ETFs we're examining represent different market segments: SPY and IVV both track the S&P 500 index, QQQ tracks the Nasdaq-100 index focusing on large-cap growth stocks in the technology sector, and IWF tracks the Russell 1000 Growth Index. By analyzing these ETFs together, we can get insights into how different market segments perform and correlate with each other over time.

Create Dataframe of Adjusted Prices

To start with, we will create a data frame that contains the adjusted prices of all the four ETFs.

1adjustedPrices <- data.frame(cbind(SPY$SPY.Adjusted,IVV$IVV.Adjusted,QQQ$QQQ.Adjusted,IWF$IWF.Adjusted))
2 
3head(adjustedPrices)
4
5SPY.Adjusted IVV.Adjusted QQQ.Adjusted IWF.Adjusted
62014-02-03     156.2863     156.6725     79.55049     75.79685
72014-02-04     157.3810     157.7458     80.13563     76.49506
82014-02-05     157.1836     157.5401     79.92802     76.26232
92014-02-06     159.2564     159.6420     80.94727     77.31428
102014-02-07     161.2306     161.7259     82.39124     78.43140
112014-02-10     161.5267     161.9495     82.86315     78.64553
122014-02-11     163.2944     163.7383     83.80693     79.44613
132014-02-12     163.3752     163.8725     83.96736     79.60438
142014-02-13     164.2186     164.6685     84.59025     80.16293
152014-02-14     165.1249     165.5450     84.76012     80.33982
16

Calculate Daily Returns

Now we will use the dailyReturn  function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages. We will also names the columns of the returns dataframe with the string symbols of each ETF and preview the data.

1# Use the dailyReturn  function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages.
2
3returns <- data.frame(round(dailyReturn(SPY)*100,4),
4                      round(dailyReturn(IVV)*100,4),round(dailyReturn(QQQ)*100,4),
5                      round(dailyReturn(IWF)*100,4))
6                      
7# Names the columns of the returns dataframe with the string symbols of each ETF
8 
9colnames(returns) <- etfs
10 
11head(returns,10)
12               SPY     IVV     QQQ     IWF
132014-02-03 -2.1352 -2.1233 -2.1136 -2.2921
142014-02-04  0.7005  0.6850  0.7356  0.9211
152014-02-05 -0.1254 -0.1304 -0.2591 -0.3042
162014-02-06  1.3187  1.3342  1.2752  1.3794
172014-02-07  1.2396  1.3054  1.7838  1.4449
182014-02-10  0.1837  0.1383  0.5727  0.2730
192014-02-11  1.0944  1.1045  1.1390  1.0180
202014-02-12  0.0495  0.0819  0.1914  0.1992
212014-02-13  0.5163  0.4858  0.7418  0.7017
222014-02-14  0.5519  0.5323  0.2008  0.2207
23

Create Correlation Matrix

1# Correlation matrix of the ETFs returns.
2 
3cor(returns)
4 
5          SPY       IVV       QQQ       IWF
6SPY 1.0000000 0.9937751 0.9240429 0.9699816
7IVV 0.9937751 1.0000000 0.9214337 0.9724484
8QQQ 0.9240429 0.9214337 1.0000000 0.9687469
9IWF 0.9699816 0.9724484 0.9687469 1.0000000
10

The correlation matrix of the returns shows that all ETFs are highly correlated with each other during the period 2014-2019. The greatest correlation is between the IVV and SPY, and the lowest correlation is between IVV and QQQ. 

Note: Since we’re directly executing the cor() function, it will just create and display the correlation matrix. It will not save the object in the Environment. If you want to load/save it in the environment as an object use something like corrMatrix <- cor(returns). This will save the results in a variable called corrMatrix.

Calculate Various Statistics for ETFs

It will be interesting to observe some statistics of the returns of each ETF in the whole period. In the following code, we are calculating statistics such as mean, standard deviation, median, min, max of ETF returns.

1ETFStats <- do.call(data.frame, 
2                    list(mean = round(apply(returns, 2, mean),4),
3                         sd = round(apply(returns, 2, sd),4),
4                         median = round(apply(returns, 2, median),4),
5                         min = round(apply(returns, 2, min),4),
6                         max = round(apply(returns, 2, max),4)))
7 
8t(ETFStats)
9 
10           SPY     IVV     QQQ     IWF
11mean    0.0372  0.0372  0.0591  0.0482
12sd      0.8327  0.8356  1.0596  0.9033
13median  0.0499  0.0512  0.1006  0.0830
14min    -4.2107 -4.1698 -4.5767 -4.0930
15max     5.0525  4.9049  6.2439  5.6418
16

The QQQ ETF has the highest volatility in the period with a standard deviation of the returns of 1.06 between 2014-2019. Also QQQ has the highest value in the mean returns among the group. The maximum daily return was for QQQ too with 6.24%. 

Notes:

  • This function allows you to call any R function, but instead of writing out the arguments one by one, you can use a list to hold the arguments of the function. 
  • The apply function is useful for applying a function across either the rows or columns of a data matrix, that is, you may apply a function across each row or across each column. For example, in the above code, we have apply(returns, 2, mean). The first argument is the returns data, which contains returns for each ETF for a period of time. The rows represent returns for each date. Each column is for one ETF. The second argument specified the dimension over which to apply the function (1 for rows and 2 for columns). The third argument is the function we want to apply. In this case, we are applying the mean function over each column in the returns data matrix. Similarly, the apply function is used to calculate other statistics.
  • The t() function transposes a matrix or data.frame.

Asking Exploratory Questions

We can also use R to find answers to exploratory questions while analyzing the data. For example, find the date of the highest returns for QQQ. With a good understanding of R functions, we can answer these kind of questions as shown below:

1# Find the date of the highest return for QQQ. The ‘which’ command will return the index in the returns dataframe with the highest value of the QQQ returns.
2 
3max_ret <- which(returns$QQQ == max(returns$QQQ))
4 
5# Once you have the index, you can retrieve the actual row from the dataframe.
6  
7returns[max_ret,]
8 
9              SPY    IVV    QQQ    IWF
102018-12-26 5.0525 4.9049 6.2439 5.6418
11

The highest return value for QQQ was on 2018-12-26. It seems that this date has the highest returns value for the four ETF.