We will perform some data analysis with the 4 ETF symbols that we have loaded into the environment. This analysis consists of comparing the returns of the four ETFs, observing their correlations, getting some statistics, and trying to answer some questions such as which of the ETFs has the best performance in the whole period. We will also perform a breakdown analysis of the returns of each ETF by year.

### Create Dataframe of Adjusted Prices

To start with, we will create a data frame that contains the adjusted prices of all the four ETFs.

```
adjustedPrices <- data.frame(cbind(SPY$SPY.Adjusted,IVV$IVV.Adjusted,QQQ$QQQ.Adjusted,IWF$IWF.Adjusted))
head(adjustedPrices)
SPY.Adjusted IVV.Adjusted QQQ.Adjusted IWF.Adjusted
2014-02-03 156.2863 156.6725 79.55049 75.79685
2014-02-04 157.3810 157.7458 80.13563 76.49506
2014-02-05 157.1836 157.5401 79.92802 76.26232
2014-02-06 159.2564 159.6420 80.94727 77.31428
2014-02-07 161.2306 161.7259 82.39124 78.43140
2014-02-10 161.5267 161.9495 82.86315 78.64553
2014-02-11 163.2944 163.7383 83.80693 79.44613
2014-02-12 163.3752 163.8725 83.96736 79.60438
2014-02-13 164.2186 164.6685 84.59025 80.16293
2014-02-14 165.1249 165.5450 84.76012 80.33982
```

### Calculate Daily Returns

Now we will use the *dailyReturn* function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages. We will also names the columns of the returns dataframe with the string symbols of each ETF and preview the data.

```
# Use the dailyReturn function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages.
returns <- data.frame(round(dailyReturn(SPY)*100,4),
round(dailyReturn(IVV)*100,4),round(dailyReturn(QQQ)*100,4),
round(dailyReturn(IWF)*100,4))
# Names the columns of the returns dataframe with the string symbols of each ETF
colnames(returns) <- etfs
head(returns,10)
SPY IVV QQQ IWF
2014-02-03 -2.1352 -2.1233 -2.1136 -2.2921
2014-02-04 0.7005 0.6850 0.7356 0.9211
2014-02-05 -0.1254 -0.1304 -0.2591 -0.3042
2014-02-06 1.3187 1.3342 1.2752 1.3794
2014-02-07 1.2396 1.3054 1.7838 1.4449
2014-02-10 0.1837 0.1383 0.5727 0.2730
2014-02-11 1.0944 1.1045 1.1390 1.0180
2014-02-12 0.0495 0.0819 0.1914 0.1992
2014-02-13 0.5163 0.4858 0.7418 0.7017
2014-02-14 0.5519 0.5323 0.2008 0.2207
```

### Create Correlation Matrix

```
# Correlation matrix of the ETFs returns.
cor(returns)
SPY IVV QQQ IWF
SPY 1.0000000 0.9937751 0.9240429 0.9699816
IVV 0.9937751 1.0000000 0.9214337 0.9724484
QQQ 0.9240429 0.9214337 1.0000000 0.9687469
IWF 0.9699816 0.9724484 0.9687469 1.0000000
```

The correlation matrix of the returns shows that all ETFs are highly correlated with each other during the period 2014-2019. The greatest correlation is between the IVV and SPY, and the lowest correlation is between IVV and QQQ.

Note: Since we’re directly executing the *cor()* function, it will just create and display the correlation matrix. It will not save the object in the Environment. If you want to load/save it in the environment as an object use something like ** corrMatrix <- cor(returns)**. This will save the results in a variable called

*corrMatrix*.

### Calculate Various Statistics for ETFs

It will be interesting to observe some statistics of the returns of each ETF in the whole period. In the following code, we are calculating statistics such as mean, standard deviation, median, min, max of ETF returns.

```
ETFStats <- do.call(data.frame,
list(mean = round(apply(returns, 2, mean),4),
sd = round(apply(returns, 2, sd),4),
median = round(apply(returns, 2, median),4),
min = round(apply(returns, 2, min),4),
max = round(apply(returns, 2, max),4)))
t(ETFStats)
SPY IVV QQQ IWF
mean 0.0372 0.0372 0.0591 0.0482
sd 0.8327 0.8356 1.0596 0.9033
median 0.0499 0.0512 0.1006 0.0830
min -4.2107 -4.1698 -4.5767 -4.0930
max 5.0525 4.9049 6.2439 5.6418
```

The QQQ ETF has the highest volatility in the period with a standard deviation of the returns of 1.06 between 2014-2019. Also QQQ has the highest value in the mean returns among the group. The maximum daily return was for QQQ too with 6.24%.

**Notes:**

- This function allows you to call any R function, but instead of writing out the arguments one by one, you can use a list to hold the arguments of the function.
- The apply function is useful for applying a function across either the rows or columns of a data matrix, that is, you may apply a function across each row or across each column. For example, in the above code, we have
*apply(returns, 2, mean).*The first argument is the returns data, which contains returns for each ETF for a period of time. The rows represent returns for each date. Each column is for one ETF. The second argument specified the dimension over which to apply the function (1 for rows and 2 for columns). The third argument is the function we want to apply. In this case, we are applying the mean function over each column in the returns data matrix. Similarly, the apply function is used to calculate other statistics. - The
*t()*function transposes a matrix or data.frame.

### Asking Exploratory Questions

We can also use R to find answers to exploratory questions while analyzing the data. For example, find the date of the highest returns for QQQ. With a good understanding of R functions, we can answer these kind of questions as shown below:

```
# Find the date of the highest return for QQQ. The ‘which’ command will return the index in the returns dataframe with the highest value of the QQQ returns.
max_ret <- which(returns$QQQ == max(returns$QQQ))
# Once you have the index, you can retrieve the actual row from the dataframe.
returns[max_ret,]
SPY IVV QQQ IWF
2018-12-26 5.0525 4.9049 6.2439 5.6418
```

The highest return value for QQQ was on 2018-12-26. It seems that this date has the highest returns value for the four ETF.

## Leave a Reply