Autocorrelation in R

Autocorrelation is an important part of time series analysis. It measures how strongly each observation in a time series is related to its recent past observations. When autocorrelation is high, past values carry a lot of information about future values, which makes the series easier to predict.

Let us consider the Microsoft stock prices for the year 2016, which we have as a time series object msft_ts with 252 daily observations. The code below plots the stock prices, and then a scatter plot of each price against the price one day earlier (a lag of 1). Dropping the last observation (msft_ts[-252]) and dropping the first (msft_ts[-1]) lines up each price with its predecessor. We can clearly see a strong positive correlation between the two.

> plot.ts(msft_ts, main="MSFT Stock Prices", ylab="Prices")
> plot(msft_ts[-252], msft_ts[-1], main="Scatterplot (lag=1)")
> abline(lm(msft_ts[-1] ~ msft_ts[-252]), col=4)
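To reproduce the side-by-side layout described above, the two plots can be drawn on one device with par(mfrow). The sketch below is only an illustration: since msft_ts is not available here, it uses a simulated random walk (the name prices and the simulated values are assumptions, not the MSFT data).

```r
# Simulated stand-in for msft_ts (assumption: a random walk of 252 values)
set.seed(1)
prices <- ts(50 + cumsum(rnorm(252)))
n <- length(prices)

# Two panels in one row; remember the old settings so we can restore them
old_par <- par(mfrow = c(1, 2))

# Left panel: the series itself
plot.ts(prices, main = "Simulated Prices", ylab = "Prices")

# Right panel: price at t (y) against price at t-1 (x), with a fitted line
plot(prices[-n], prices[-1], main = "Scatterplot (lag=1)",
     xlab = "Price at t-1", ylab = "Price at t")
abline(lm(prices[-1] ~ prices[-n]), col = 4)

par(old_par)
```

Using length(prices) instead of a hard-coded 252 keeps the code correct if the series has a different number of observations.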

We can also calculate the correlation between the actual series and the lagged series using the cor() function in R.

> # Correlation of stock price today and 1 day earlier
> cor(msft_ts[-252], msft_ts[-1])
[1] 0.9797061

Similarly, we can calculate the correlation at other lags, such as between the stock price today and the stock price two days earlier. In our example, this lag-2 correlation is 0.9623, which is still quite high but lower than the lag-1 correlation.

> cor(msft_ts[-(251:252)], msft_ts[-(1:2)])
[1] 0.9623822
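The index arithmetic in the two cor() calls above generalizes to any lag. The following is a minimal sketch, assuming a hypothetical helper named lag_cor (not part of base R) and a simulated series in place of msft_ts.

```r
# Hypothetical helper: Pearson correlation between a series and its lag-k copy
lag_cor <- function(x, k) {
  x <- as.numeric(x)
  n <- length(x)
  # Pair observation t (t = 1..n-k) with observation t+k
  cor(x[1:(n - k)], x[(k + 1):n])
}

# Simulated stand-in for msft_ts (assumption: a random walk of 252 values)
set.seed(1)
x <- ts(50 + cumsum(rnorm(252)))

# Correlations at lags 1, 2 and 3
sapply(1:3, function(k) lag_cor(x, k))
```

With k = 1 the helper reproduces cor(x[-252], x[-1]), and with k = 2 it reproduces cor(x[-(251:252)], x[-(1:2)]).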

ACF Function

Instead of calculating the correlation with each time lag manually, we can use the acf() function in R. The function acf computes (and by default plots) estimates of the autocovariance or autocorrelation function for different time lags.

Below we get the autocorrelations for lags 0 to 10. The lag-0 value is always 1, because it is the correlation of the series with itself. Notice that the correlation decreases steadily as the lag increases.

> acf(msft_ts, lag.max=10, plot=FALSE)

Autocorrelations of series msft_ts, by lag:

    0     1     2     3     4     5     6     7     8     9    10
1.000 0.973 0.948 0.922 0.897 0.872 0.849 0.826 0.805 0.787 0.772
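The printed output is convenient to read, but the values can also be pulled out of the object that acf() returns, using its acf and lag components. A small sketch follows, again using a simulated series as a stand-in for msft_ts (the variable names are illustrative).

```r
# Simulated stand-in for msft_ts (assumption: a random walk of 252 values)
set.seed(1)
x <- ts(50 + cumsum(rnorm(252)))

# acf() returns an object whose components include $lag and $acf
acf_res <- acf(x, lag.max = 10, plot = FALSE)

# Both components are stored as arrays; drop() turns them into plain vectors
lags <- drop(acf_res$lag)
acf_values <- drop(acf_res$acf)

# Collect the lags and autocorrelations in a data frame
acf_table <- data.frame(lag = lags, acf = round(acf_values, 3))
print(acf_table)
```

Having the values in a data frame makes it easy to filter, compare across series, or plot them with other tools.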

We can also view the same information as an ACF plot, shown below. This plot is also called a correlogram or an autocorrelation plot.

> acf(msft_ts, lag.max=10)

The x-axis denotes the time lag, while the y-axis displays the estimated autocorrelation. Looking at this data, we can say that each observation is positively related to its recent past observations. However, the correlation decreases as the lag increases.

Exercise

This exercise uses a CSV file that contains the daily stock prices of 5 US stocks for 251 days. The five stocks are AAPL, MSFT, GOOG, IBM, and INTC.

  1. Load the data in R in a variable called stock_data.
  2. Extract only the AAPL stock prices in another variable called aapl_prices.
  3. Convert the aapl_prices into a time series using the ts() function. Store this in a variable called aapl_prices_ts.
  4. Calculate the autocorrelation in aapl_prices_ts at lags 1 and 2 using the cor() function.
  5. Use the acf() function to find the autocorrelations in aapl_prices_ts for lags 1 to 10.
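The steps above can be sketched as follows. Because the exercise file is not reproduced here, this sketch first writes a simulated CSV to a temporary path; the simulated prices and the assumption that the file has one column per ticker (including a column named AAPL) are illustrative only. With the real file, replace csv_path with its location and check the actual column names.

```r
# Build a simulated stand-in for the exercise file (assumption: one column per ticker)
set.seed(42)
tickers <- c("AAPL", "MSFT", "GOOG", "IBM", "INTC")
sim <- as.data.frame(sapply(tickers, function(s) 100 + cumsum(rnorm(251))))
csv_path <- tempfile(fileext = ".csv")
write.csv(sim, csv_path, row.names = FALSE)

# 1. Load the data
stock_data <- read.csv(csv_path)

# 2. Extract only the AAPL prices
aapl_prices <- stock_data$AAPL

# 3. Convert to a time series
aapl_prices_ts <- ts(aapl_prices)

# 4. Correlation at lags 1 and 2 using cor()
n <- length(aapl_prices_ts)
cor_lag1 <- cor(aapl_prices_ts[-n], aapl_prices_ts[-1])
cor_lag2 <- cor(aapl_prices_ts[-((n - 1):n)], aapl_prices_ts[-(1:2)])
print(c(lag1 = cor_lag1, lag2 = cor_lag2))

# 5. Autocorrelations up to lag 10 using acf()
aapl_acf <- acf(aapl_prices_ts, lag.max = 10, plot = FALSE)
print(aapl_acf)
```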

Why is there a difference between the correlation coefficients computed with cor() and the values given by acf()? At lag 2, cor() gives 0.962 while acf() shows 0.948.

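The two estimates differ slightly because they scale and center the sample covariance differently. cor() treats the two lagged subseries as separate samples: it uses only the n-l overlapping pairs, with each subseries' own mean and standard deviation. acf() subtracts the mean of the full series, scales the lag-l sum of cross-products by 1/n rather than 1/(n-l), and normalizes by the variance of the full series. Even though the acf() method provides a biased estimate, it is preferred in time series analysis. As a result, the acf() value is roughly (n-l)/n times the cor() value; the remaining gap comes from the different means and variances used.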
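The difference can be checked by computing the acf()-style estimate by hand. The sketch below assumes a simulated series rather than the MSFT data, and the variable names are illustrative; it shows that the manual formula with the full-series mean and 1/n scaling matches acf(), while cor() on the lagged subseries gives a different number.

```r
# Simulated stand-in for msft_ts (assumption: a random walk of 252 values)
set.seed(1)
x <- 50 + cumsum(rnorm(252))
n <- length(x)
l <- 2                      # lag to compare
xbar <- mean(x)             # mean of the full series

# acf()-style estimate: full-series mean, both sums scaled by 1/n
c_l <- sum((x[1:(n - l)] - xbar) * (x[(l + 1):n] - xbar)) / n
c_0 <- sum((x - xbar)^2) / n
manual_acf <- c_l / c_0

# Value reported by acf() at the same lag (index l + 1 because lag 0 comes first)
acf_val <- acf(x, lag.max = l, plot = FALSE)$acf[l + 1]

# cor() on the two lagged subseries, each with its own mean and variance
cor_val <- cor(x[1:(n - l)], x[(l + 1):n])

print(c(manual = manual_acf, acf = acf_val, cor = cor_val))
```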
