We will perform some data analysis with the 4 ETF symbols that we have loaded into the environment. This analysis consists of comparing the returns of the four ETFs, observing their correlations, getting some statistics, and trying to answer some questions such as which of the ETFs has the best performance in the whole period. We will also perform a breakdown analysis of the returns of each ETF by year.
Create Dataframe of Adjusted Prices
To start with, we will create a data frame that contains the adjusted prices of all the four ETFs.
adjustedPrices <- data.frame(cbind(SPY$SPY.Adjusted,IVV$IVV.Adjusted,QQQ$QQQ.Adjusted,IWF$IWF.Adjusted)) head(adjustedPrices) SPY.Adjusted IVV.Adjusted QQQ.Adjusted IWF.Adjusted 2014-02-03 156.2863 156.6725 79.55049 75.79685 2014-02-04 157.3810 157.7458 80.13563 76.49506 2014-02-05 157.1836 157.5401 79.92802 76.26232 2014-02-06 159.2564 159.6420 80.94727 77.31428 2014-02-07 161.2306 161.7259 82.39124 78.43140 2014-02-10 161.5267 161.9495 82.86315 78.64553 2014-02-11 163.2944 163.7383 83.80693 79.44613 2014-02-12 163.3752 163.8725 83.96736 79.60438 2014-02-13 164.2186 164.6685 84.59025 80.16293 2014-02-14 165.1249 165.5450 84.76012 80.33982
Calculate Daily Returns
Now we will use the dailyReturn function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages. We will also names the columns of the returns dataframe with the string symbols of each ETF and preview the data.
# Use the dailyReturn function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages. returns <- data.frame(round(dailyReturn(SPY)*100,4), round(dailyReturn(IVV)*100,4),round(dailyReturn(QQQ)*100,4), round(dailyReturn(IWF)*100,4)) # Names the columns of the returns dataframe with the string symbols of each ETF colnames(returns) <- etfs head(returns,10) SPY IVV QQQ IWF 2014-02-03 -2.1352 -2.1233 -2.1136 -2.2921 2014-02-04 0.7005 0.6850 0.7356 0.9211 2014-02-05 -0.1254 -0.1304 -0.2591 -0.3042 2014-02-06 1.3187 1.3342 1.2752 1.3794 2014-02-07 1.2396 1.3054 1.7838 1.4449 2014-02-10 0.1837 0.1383 0.5727 0.2730 2014-02-11 1.0944 1.1045 1.1390 1.0180 2014-02-12 0.0495 0.0819 0.1914 0.1992 2014-02-13 0.5163 0.4858 0.7418 0.7017 2014-02-14 0.5519 0.5323 0.2008 0.2207
Create Correlation Matrix
# Correlation matrix of the ETFs returns. cor(returns) SPY IVV QQQ IWF SPY 1.0000000 0.9937751 0.9240429 0.9699816 IVV 0.9937751 1.0000000 0.9214337 0.9724484 QQQ 0.9240429 0.9214337 1.0000000 0.9687469 IWF 0.9699816 0.9724484 0.9687469 1.0000000
The correlation matrix of the returns shows that all ETFs are highly correlated with each other during the period 2014-2019. The greatest correlation is between the IVV and SPY, and the lowest correlation is between IVV and QQQ.
Note: Since we’re directly executing the cor() function, it will just create and display the correlation matrix. It will not save the object in the Environment. If you want to load/save it in the environment as an object use something like corrMatrix <- cor(returns). This will save the results in a variable called corrMatrix.
Calculate Various Statistics for ETFs
It will be interesting to observe some statistics of the returns of each ETF in the whole period. In the following code, we are calculating statistics such as mean, standard deviation, median, min, max of ETF returns.
ETFStats <- do.call(data.frame, list(mean = round(apply(returns, 2, mean),4), sd = round(apply(returns, 2, sd),4), median = round(apply(returns, 2, median),4), min = round(apply(returns, 2, min),4), max = round(apply(returns, 2, max),4))) t(ETFStats) SPY IVV QQQ IWF mean 0.0372 0.0372 0.0591 0.0482 sd 0.8327 0.8356 1.0596 0.9033 median 0.0499 0.0512 0.1006 0.0830 min -4.2107 -4.1698 -4.5767 -4.0930 max 5.0525 4.9049 6.2439 5.6418
The QQQ ETF has the highest volatility in the period with a standard deviation of the returns of 1.06 between 2014-2019. Also QQQ has the highest value in the mean returns among the group. The maximum daily return was for QQQ too with 6.24%.
- This function allows you to call any R function, but instead of writing out the arguments one by one, you can use a list to hold the arguments of the function.
- The apply function is useful for applying a function across either the rows or columns of a data matrix, that is, you may apply a function across each row or across each column. For example, in the above code, we have apply(returns, 2, mean). The first argument is the returns data, which contains returns for each ETF for a period of time. The rows represent returns for each date. Each column is for one ETF. The second argument specified the dimension over which to apply the function (1 for rows and 2 for columns). The third argument is the function we want to apply. In this case, we are applying the mean function over each column in the returns data matrix. Similarly, the apply function is used to calculate other statistics.
- The t() function transposes a matrix or data.frame.
Asking Exploratory Questions
We can also use R to find answers to exploratory questions while analyzing the data. For example, find the date of the highest returns for QQQ. With a good understanding of R functions, we can answer these kind of questions as shown below:
# Find the date of the highest return for QQQ. The ‘which’ command will return the index in the returns dataframe with the highest value of the QQQ returns. max_ret <- which(returns$QQQ == max(returns$QQQ)) # Once you have the index, you can retrieve the actual row from the dataframe. returns[max_ret,] SPY IVV QQQ IWF 2018-12-26 5.0525 4.9049 6.2439 5.6418
The highest return value for QQQ was on 2018-12-26. It seems that this date has the highest returns value for the four ETF.