Data Analysis with Quantmod in R
We will perform some data analysis with the 4 ETF symbols that we have loaded into the environment. This analysis consists of comparing the returns of the four ETFs, observing their correlations, getting some statistics, and trying to answer some questions such as which of the ETFs has the best performance in the whole period. We will also perform a breakdown analysis of the returns of each ETF by year.
We will use R's data manipulation capabilities along with packages like quantmod to perform our analysis. The four ETFs we're examining represent different market segments: SPY and IVV both track the S&P 500 index, QQQ tracks the Nasdaq-100 index focusing on large-cap growth stocks in the technology sector, and IWF tracks the Russell 1000 Growth Index. By analyzing these ETFs together, we can get insights into how different market segments perform and correlate with each other over time.
Create Dataframe of Adjusted Prices
To start with, we will create a data frame that contains the adjusted prices of all the four ETFs.
1adjustedPrices <- data.frame(cbind(SPY$SPY.Adjusted,IVV$IVV.Adjusted,QQQ$QQQ.Adjusted,IWF$IWF.Adjusted))
2
3head(adjustedPrices)
4
5SPY.Adjusted IVV.Adjusted QQQ.Adjusted IWF.Adjusted
62014-02-03 156.2863 156.6725 79.55049 75.79685
72014-02-04 157.3810 157.7458 80.13563 76.49506
82014-02-05 157.1836 157.5401 79.92802 76.26232
92014-02-06 159.2564 159.6420 80.94727 77.31428
102014-02-07 161.2306 161.7259 82.39124 78.43140
112014-02-10 161.5267 161.9495 82.86315 78.64553
122014-02-11 163.2944 163.7383 83.80693 79.44613
132014-02-12 163.3752 163.8725 83.96736 79.60438
142014-02-13 164.2186 164.6685 84.59025 80.16293
152014-02-14 165.1249 165.5450 84.76012 80.33982
16
Calculate Daily Returns
Now we will use the dailyReturn function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages. We will also names the columns of the returns dataframe with the string symbols of each ETF and preview the data.
1# Use the dailyReturn function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages.
2
3returns <- data.frame(round(dailyReturn(SPY)*100,4),
4 round(dailyReturn(IVV)*100,4),round(dailyReturn(QQQ)*100,4),
5 round(dailyReturn(IWF)*100,4))
6
7# Names the columns of the returns dataframe with the string symbols of each ETF
8
9colnames(returns) <- etfs
10
11head(returns,10)
12 SPY IVV QQQ IWF
132014-02-03 -2.1352 -2.1233 -2.1136 -2.2921
142014-02-04 0.7005 0.6850 0.7356 0.9211
152014-02-05 -0.1254 -0.1304 -0.2591 -0.3042
162014-02-06 1.3187 1.3342 1.2752 1.3794
172014-02-07 1.2396 1.3054 1.7838 1.4449
182014-02-10 0.1837 0.1383 0.5727 0.2730
192014-02-11 1.0944 1.1045 1.1390 1.0180
202014-02-12 0.0495 0.0819 0.1914 0.1992
212014-02-13 0.5163 0.4858 0.7418 0.7017
222014-02-14 0.5519 0.5323 0.2008 0.2207
23
Create Correlation Matrix
1# Correlation matrix of the ETFs returns.
2
3cor(returns)
4
5 SPY IVV QQQ IWF
6SPY 1.0000000 0.9937751 0.9240429 0.9699816
7IVV 0.9937751 1.0000000 0.9214337 0.9724484
8QQQ 0.9240429 0.9214337 1.0000000 0.9687469
9IWF 0.9699816 0.9724484 0.9687469 1.0000000
10
The correlation matrix of the returns shows that all ETFs are highly correlated with each other during the period 2014-2019. The greatest correlation is between the IVV and SPY, and the lowest correlation is between IVV and QQQ.
Note: Since we’re directly executing the cor() function, it will just create and display the correlation matrix. It will not save the object in the Environment. If you want to load/save it in the environment as an object use something like corrMatrix <- cor(returns)
. This will save the results in a variable called corrMatrix.
Calculate Various Statistics for ETFs
It will be interesting to observe some statistics of the returns of each ETF in the whole period. In the following code, we are calculating statistics such as mean, standard deviation, median, min, max of ETF returns.
1ETFStats <- do.call(data.frame,
2 list(mean = round(apply(returns, 2, mean),4),
3 sd = round(apply(returns, 2, sd),4),
4 median = round(apply(returns, 2, median),4),
5 min = round(apply(returns, 2, min),4),
6 max = round(apply(returns, 2, max),4)))
7
8t(ETFStats)
9
10 SPY IVV QQQ IWF
11mean 0.0372 0.0372 0.0591 0.0482
12sd 0.8327 0.8356 1.0596 0.9033
13median 0.0499 0.0512 0.1006 0.0830
14min -4.2107 -4.1698 -4.5767 -4.0930
15max 5.0525 4.9049 6.2439 5.6418
16
The QQQ ETF has the highest volatility in the period with a standard deviation of the returns of 1.06 between 2014-2019. Also QQQ has the highest value in the mean returns among the group. The maximum daily return was for QQQ too with 6.24%.
Notes:
- This function allows you to call any R function, but instead of writing out the arguments one by one, you can use a list to hold the arguments of the function.
- The apply function is useful for applying a function across either the rows or columns of a data matrix, that is, you may apply a function across each row or across each column. For example, in the above code, we have apply(returns, 2, mean). The first argument is the returns data, which contains returns for each ETF for a period of time. The rows represent returns for each date. Each column is for one ETF. The second argument specified the dimension over which to apply the function (1 for rows and 2 for columns). The third argument is the function we want to apply. In this case, we are applying the mean function over each column in the returns data matrix. Similarly, the apply function is used to calculate other statistics.
- The t() function transposes a matrix or data.frame.
Asking Exploratory Questions
We can also use R to find answers to exploratory questions while analyzing the data. For example, find the date of the highest returns for QQQ. With a good understanding of R functions, we can answer these kind of questions as shown below:
1# Find the date of the highest return for QQQ. The ‘which’ command will return the index in the returns dataframe with the highest value of the QQQ returns.
2
3max_ret <- which(returns$QQQ == max(returns$QQQ))
4
5# Once you have the index, you can retrieve the actual row from the dataframe.
6
7returns[max_ret,]
8
9 SPY IVV QQQ IWF
102018-12-26 5.0525 4.9049 6.2439 5.6418
11
The highest return value for QQQ was on 2018-12-26. It seems that this date has the highest returns value for the four ETF.
Unlock Premium Content
Upgrade your account to access the full article, downloads, and exercises.
You'll get access to:
- Access complete tutorials and examples
- Download source code and resources
- Follow along with practical exercises
- Get in-depth explanations