# Data Analysis with Quantmod in R

We will perform some data analysis with the 4 ETF symbols that we have loaded into the environment. This analysis consists of comparing the returns of the four ETFs, observing their correlations, getting some statistics, and trying to answer some questions such as which of the ETFs has the best performance in the whole period. We will also perform a breakdown analysis of the returns of each ETF by year.

### Create Dataframe of Adjusted Prices

To start with, we will create a data frame that contains the adjusted prices of all the four ETFs.

adjustedPrices <- data.frame(cbind(SPY$SPY.Adjusted,IVV$IVV.Adjusted,QQQ$QQQ.Adjusted,IWF$IWF.Adjusted))

2014-02-03     156.2863     156.6725     79.55049     75.79685
2014-02-04     157.3810     157.7458     80.13563     76.49506
2014-02-05     157.1836     157.5401     79.92802     76.26232
2014-02-06     159.2564     159.6420     80.94727     77.31428
2014-02-07     161.2306     161.7259     82.39124     78.43140
2014-02-10     161.5267     161.9495     82.86315     78.64553
2014-02-11     163.2944     163.7383     83.80693     79.44613
2014-02-12     163.3752     163.8725     83.96736     79.60438
2014-02-13     164.2186     164.6685     84.59025     80.16293
2014-02-14     165.1249     165.5450     84.76012     80.33982


### Calculate Daily Returns

Now we will use the dailyReturn  function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages. We will also names the columns of the returns dataframe with the string symbols of each ETF and preview the data.

# Use the dailyReturn  function from Quantmod to calculate daily returns of each ETF. The results are expressed in percentages.

returns <- data.frame(round(dailyReturn(SPY)*100,4),
round(dailyReturn(IVV)*100,4),round(dailyReturn(QQQ)*100,4),
round(dailyReturn(IWF)*100,4))

# Names the columns of the returns dataframe with the string symbols of each ETF

colnames(returns) <- etfs

SPY     IVV     QQQ     IWF
2014-02-03 -2.1352 -2.1233 -2.1136 -2.2921
2014-02-04  0.7005  0.6850  0.7356  0.9211
2014-02-05 -0.1254 -0.1304 -0.2591 -0.3042
2014-02-06  1.3187  1.3342  1.2752  1.3794
2014-02-07  1.2396  1.3054  1.7838  1.4449
2014-02-10  0.1837  0.1383  0.5727  0.2730
2014-02-11  1.0944  1.1045  1.1390  1.0180
2014-02-12  0.0495  0.0819  0.1914  0.1992
2014-02-13  0.5163  0.4858  0.7418  0.7017
2014-02-14  0.5519  0.5323  0.2008  0.2207


### Create Correlation Matrix

# Correlation matrix of the ETFs returns.

cor(returns)

SPY       IVV       QQQ       IWF
SPY 1.0000000 0.9937751 0.9240429 0.9699816
IVV 0.9937751 1.0000000 0.9214337 0.9724484
QQQ 0.9240429 0.9214337 1.0000000 0.9687469
IWF 0.9699816 0.9724484 0.9687469 1.0000000


The correlation matrix of the returns shows that all ETFs are highly correlated with each other during the period 2014-2019. The greatest correlation is between the IVV and SPY, and the lowest correlation is between IVV and QQQ.

Note: Since we’re directly executing the cor() function, it will just create and display the correlation matrix. It will not save the object in the Environment. If you want to load/save it in the environment as an object use something like corrMatrix <- cor(returns). This will save the results in a variable called corrMatrix.

### Calculate Various Statistics for ETFs

It will be interesting to observe some statistics of the returns of each ETF in the whole period. In the following code, we are calculating statistics such as mean, standard deviation, median, min, max of ETF returns.

ETFStats <- do.call(data.frame,
list(mean = round(apply(returns, 2, mean),4),
sd = round(apply(returns, 2, sd),4),
median = round(apply(returns, 2, median),4),
min = round(apply(returns, 2, min),4),
max = round(apply(returns, 2, max),4)))

t(ETFStats)

SPY     IVV     QQQ     IWF
mean    0.0372  0.0372  0.0591  0.0482
sd      0.8327  0.8356  1.0596  0.9033
median  0.0499  0.0512  0.1006  0.0830
min    -4.2107 -4.1698 -4.5767 -4.0930
max     5.0525  4.9049  6.2439  5.6418


The QQQ ETF has the highest volatility in the period with a standard deviation of the returns of 1.06 between 2014-2019. Also QQQ has the highest value in the mean returns among the group. The maximum daily return was for QQQ too with 6.24%.

Notes:

• This function allows you to call any R function, but instead of writing out the arguments one by one, you can use a list to hold the arguments of the function.
• The apply function is useful for applying a function across either the rows or columns of a data matrix, that is, you may apply a function across each row or across each column. For example, in the above code, we have apply(returns, 2, mean). The first argument is the returns data, which contains returns for each ETF for a period of time. The rows represent returns for each date. Each column is for one ETF. The second argument specified the dimension over which to apply the function (1 for rows and 2 for columns). The third argument is the function we want to apply. In this case, we are applying the mean function over each column in the returns data matrix. Similarly, the apply function is used to calculate other statistics.
• The t() function transposes a matrix or data.frame.

We can also use R to find answers to exploratory questions while analyzing the data. For example, find the date of the highest returns for QQQ. With a good understanding of R functions, we can answer these kind of questions as shown below:

# Find the date of the highest return for QQQ. The ‘which’ command will return the index in the returns dataframe with the highest value of the QQQ returns.

max_ret <- which(returns$QQQ == max(returns$QQQ))

# Once you have the index, you can retrieve the actual row from the dataframe.

returns[max_ret,]

SPY    IVV    QQQ    IWF
2018-12-26 5.0525 4.9049 6.2439 5.6418


The highest return value for QQQ was on 2018-12-26. It seems that this date has the highest returns value for the four ETF.