Measuring Overall ETFs Performance

We will now plot a graph to show the accumulated returns of the ETFs over a period of time.  We can do so by following the following steps:

  • Build a dataframe with the 4 ETFs prices and a date column. Calculate daily returns and cumulative returns. The cumsum() function returns the cumulative sums (i.e. the sum of all values up to a certain position of a vector).
  • Change the format of the data from wide shape to long shape with the gather() function from dplyr package.
  • Use ggplot2 to graph the ETFs performance in the whole period.

Step 1: Build a dataframe with the 4 ETFs prices and a date column. Calculate daily returns and cumulative returns.

# Make the cumRets dataframe with the cumulative returns for each of the ETFs using cumsum() function  
# The dailyReturn is a built in function from quantmod to get the daily returns.
 
cumRets <- data.frame(date = index(SPY),
                cumsum(dailyReturn(SPY) * 100),
                 cumsum(dailyReturn(IVV) * 100),
                 cumsum(dailyReturn(QQQ)* 100),
                 cumsum(dailyReturn(IWF))* 100)
 
# Add new names to the columns of cumRets dataframe
 
colnames(cumRets)[-1] <- etfs
 
head(cumRets,10)
 
         date     SPY     IVV     QQQ     IWF
1  2014-02-03 -2.1352 -2.1233 -2.1136 -2.2921
2  2014-02-04 -1.4347 -1.4382 -1.3780 -1.3709
3  2014-02-05 -1.5602 -1.5686 -1.6371 -1.6752
4  2014-02-06 -0.2414 -0.2345 -0.3619 -0.2958
5  2014-02-07  0.9981  1.0709  1.4220  1.1491
6  2014-02-10  1.1818  1.2092  1.9947  1.4221
7  2014-02-11  2.2762  2.3137  3.1337  2.4401
8  2014-02-12  2.3256  2.3957  3.3251  2.6393
9  2014-02-13  2.8419  2.8814  4.0669  3.3410
10 2014-02-14  3.3938  3.4137  4.2677  3.5616

Step 2: Change the format of the data from wide shape to long shape with the gather function from dplyr package.

The data is currently wide-shaped because each date’s data is wide. For better analysis, We want the data to be long, where each date of data is in a separate observation.

# Make a tidy dataset called longCumRets which group the returns of each ETF in the same column. 
# The second and third parameter of the gather function are the new columns names in the tidy dataset.

# Note: You need to install the tidyr package to use the gather function. Use install.packages('tidyr') for installing and library(tidyr) to load it.
 
longCumRets <- gather(cumRets,symbol,cumReturns,etfs)
 
head(longCumRets,10)
 
         date symbol cumReturns
1  2014-02-03    SPY    -2.1352
2  2014-02-04    SPY    -1.4347
3  2014-02-05    SPY    -1.5602
4  2014-02-06    SPY    -0.2414
5  2014-02-07    SPY     0.9981
6  2014-02-10    SPY     1.1818
7  2014-02-11    SPY     2.2762
8  2014-02-12    SPY     2.3256
9  2014-02-13    SPY     2.8419
10 2014-02-14    SPY     3.3938

Step 3: Use ggplot2 to graph the ETFs performance in the whole period.

# Plot the performance of the ETF's indexes
 
ggplot(longCumRets, aes(x=date,y=cumReturns,color = symbol))  + geom_line()+ ggtitle("ETF’s Accumulated Returns")

As we can observe in the graph, all ETFs are correlated, which means that they are affected by similar drivers. QQQ has the greatest performance in the whole period while SPY and IVV have the lowest performance. At the end of 2018 a big drawdown has affected all ETFs performance.

A note about reshaping data:

There are times when our data is considered unstacked and a common attribute of concern is spread out across columns. To reformat the data such that these common attributes are gathered together as a single variable, the gather() function will take multiple columns and collapse them into key-value pairs, duplicating all other columns as needed.

Measuring Overall ETFs Performance - Yearly Breakdown

To go deeper in analyzing ETFs performance during the period, we will make a breakdown about ETF performance by year. With this picture we can gain an insight into which ETF has the greater and poor performance by year. This can be achieved by following the following steps:

  • Step 1: Transform the returns dataframe from wide to long creating the longrets dataframe. With this shape, we would have two new columns that are symbol and returns, which store the daily returns for each symbol in long format.
  • Step 2: Make a new column in longrets with only the year from the index of returns.
  • Step 3: Calculate the mean of returns for each group, where the group is composed of a particular symbol-year.
# Transform the returns dataframe from wide to long creating the longrets dataframe.
 
longrets <- gather(returns,symbol,returns,etfs)
 
head(longrets)
 
  symbol returns
1    SPY -2.1352
2    SPY  0.7005
3    SPY -0.1254
4    SPY  1.3187
5    SPY  1.2396
6    SPY  0.1837

 
# Take only the years from the index names of returns that have the dates
 
longrets$year <- substr(rownames(returns),1,4)
 
head(longrets)
 
  symbol returns year
1    SPY -2.1352 2014
2    SPY  0.7005 2014
3    SPY -0.1254 2014
4    SPY  1.3187 2014
5    SPY  1.2396 2014
6    SPY  0.1837 2014
 
# The aggregate function can split a data frame columns by groups and then
# apply a function to these groups. In our case the column that we are 
# interested to split is the returns column and the groups are symbol and year.
# Finally we will apply the mean function to each of these groups.
 
groupedReturns <- aggregate(longrets$returns, list(symbol=longrets$symbol,year=longrets$year), mean)
 
groupedReturns
 
  symbol year            x
1     IVV 2014  0.065175325
2     IWF 2014  0.062363203
3     QQQ 2014  0.082309524
4     SPY 2014  0.064810823
5     IVV 2015  0.001005556
6     IWF 2015  0.020585714
7     QQQ 2015  0.038149603
8     SPY 2015  0.001565079
9     IVV 2016  0.040590079
10    IWF 2016  0.024551190
11    QQQ 2016  0.028030159
12    SPY 2016  0.039951587
13    IVV 2017  0.071888048
14    IWF 2017  0.100740637
15    QQQ 2017  0.111167729
16    SPY 2017  0.071519124
17    IVV 2018 -0.020522311
18    IWF 2018 -0.003539841
19    QQQ 2018  0.006568924
20    SPY 2018 -0.020306375
21    IVV 2019  0.088256376
22    IWF 2019  0.111604027
23    QQQ 2019  0.112140940
24    SPY 2019  0.088615436
 

The groupedReturns dataframe provides useful insights about the mean returns of each ETF by year. We can see that the best symbol-year was QQQ 2019 with mean returns of 11%, while the poor symbol-year performance was IVV 2018 with a mean return of -2%. It is very appealing to show these insights graphically. 

annReturnBars <- ggplot(groupedReturns,aes(x=symbol,y=x)) +
  geom_bar(stat="identity",aes(fill=symbol))+
  facet_wrap(~year,ncol=2) +
  theme(legend.position = "right") + ggtitle("ETF mean returns by Year")

annReturnBars

In this bar plot chart we can observe that the QQQ ETF has the highest performance in all years, except for 2016 where the most performant was IVV ETF. Also we can observe that 2018 was the worst year of the period, where only QQQ has slightly positive returns. On the other hand, 2017 were a good year for all ETFs, and up to this moment 2019 is a good year too.

In the next section, we would start with the quantstrat package that provides an infrastructure to build, backtest and analyze trading strategies.

Related Downloads

Membership
Learn the skills required to excel in data science and data analytics covering R, Python, machine learning, and AI.
I WANT TO JOIN
JOIN 30,000 DATA PROFESSIONALS

Free Guides - Getting Started with R and Python

Enter your name and email address below and we will email you the guides for R programming and Python.

Saylient AI Logo

Take the Next Step in Your Data Career

Join our membership for lifetime unlimited access to all our data analytics and data science learning content and resources.