The chart above shows a clear upward trend in the stock price. This means that we will need to difference the series to make it stationary.
Sometimes a visual inspection does not clearly reveal such a pattern. In such cases, we can use the Augmented Dickey-Fuller (ADF) unit root test to check whether the series is stationary. The null hypothesis of the test is that the series has a unit root, i.e., that it is non-stationary. If the p-value is less than 0.05 (5%), we reject the null hypothesis and treat the series as stationary. If the p-value is greater than 0.05, we cannot reject the null hypothesis and conclude that the series has a unit root, which means that it is a non-stationary process.
In R, we can do this using the adf.test() function available in the tseries package. The following code installs and loads the package and then performs the test.
> install.packages("tseries")
> library(tseries)
> adf.test(fb_ts)

	Augmented Dickey-Fuller Test

data:  fb_ts
Dickey-Fuller = -3.1368, Lag order = 9, p-value = 0.09883
alternative hypothesis: stationary
The test results confirm our observation that the series is non-stationary (p-value > 0.05) and will need differencing to make it stationary.
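As a rough illustration of what differencing does, the sketch below applies base R's diff() to a simulated random walk with drift. The simulated series sim_ts is a stand-in for fb_ts (an assumption, since the stock data is not reproduced here), and the ADF test is re-run only if the tseries package happens to be installed.

```r
# Simulated stand-in for fb_ts (assumption: illustrative data, not the stock prices).
set.seed(42)
n <- 500
sim_ts <- ts(100 + cumsum(rnorm(n, mean = 0.1, sd = 1)))  # random walk with drift

# First difference: y_t - y_{t-1}. The result has one fewer observation.
sim_diff <- diff(sim_ts, differences = 1)

# Compare the ADF test before and after differencing, if tseries is available.
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(sim_ts))    # level series
  print(tseries::adf.test(sim_diff))  # differenced series
}
```

On the actual data, the equivalent call would be diff(fb_ts), followed by adf.test() on the result.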
Step 3: Identify the Model
The next step is to identify the model, i.e., the appropriate orders p and q of the Autoregressive (AR) and Moving Average (MA) processes. We will do so using the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
Let's create the ACF and PACF plots.
> acf(fb_ts)
> pacf(fb_ts)
Recall our analysis of these two functions. The ACF plot decays slowly toward 0, which points to an AR model. The PACF plot suggests an AR model of order 1, AR(1), because the partial autocorrelations are close to 0 after lag 1.
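To see what this AR(1) signature looks like in numbers, here is a minimal sketch on a simulated AR(1) series. The coefficient 0.7, the sample size, and the variable names are illustrative assumptions and are not taken from the stock data.

```r
# Simulated AR(1) series with coefficient 0.7 (illustrative values only).
set.seed(1)
n <- 500
ar1 <- arima.sim(model = list(ar = 0.7), n = n)

# plot = FALSE returns the estimates instead of drawing the chart.
acf_vals  <- as.numeric(acf(ar1, lag.max = 10, plot = FALSE)$acf)[-1]  # drop lag 0
pacf_vals <- as.numeric(pacf(ar1, lag.max = 10, plot = FALSE)$acf)

# Approximate 95% band, as drawn on the default plots.
bound <- qnorm(0.975) / sqrt(n)

print(round(acf_vals, 2))             # ACF at lags 1 to 10
print(round(pacf_vals, 2))            # PACF at lags 1 to 10
print(which(abs(pacf_vals) > bound))  # lags whose PACF falls outside the band
```

For an AR(1) process, the ACF values are expected to shrink gradually while the PACF is large at lag 1 only; an occasional later lag can still fall just outside the band by chance.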
Identifying the best-fit model is a complex process, and we may want to test multiple models to see which one best fits our data. Both experience and knowledge of advanced topics can be helpful. However, based on our limited analysis, let's say we will go with p = 1, d = 1, and q = 0.
Our suggested model is then ARIMA(1,1,0).
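Written out, this model says that the first difference of the series follows an AR(1) process. The notation below is an assumption of this sketch rather than something defined earlier: y_t is the price at time t, \phi_1 is the AR coefficient, and \varepsilon_t is white noise.

```latex
\Delta y_t = y_t - y_{t-1}, \qquad \Delta y_t = \phi_1 \, \Delta y_{t-1} + \varepsilon_t
```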
Step 4: Estimate the Model
We can now estimate the model for our data as shown below:
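A minimal sketch of this step, assuming base R's arima() and predict() from the stats package and a simulated stand-in for fb_ts (the names sim_ts, fit, and fc are hypothetical). On the real data, fb_ts would take the place of sim_ts.

```r
# Simulated stand-in for fb_ts (assumption: illustrative data, not the stock prices).
set.seed(123)
sim_ts <- ts(100 + cumsum(arima.sim(model = list(ar = 0.3), n = 300)))

# Fit ARIMA(p = 1, d = 1, q = 0); arima() differences the series internally.
fit <- arima(sim_ts, order = c(1, 1, 0))
print(fit)

# Point forecasts (pred) and standard errors (se) for the next 20 periods.
fc <- predict(fit, n.ahead = 20)
print(round(fc$pred, 2))
```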
Notice that all predicted values for the next 20 periods look the same. This is because a differenced model with no drift term produces a flat forecast. It is possible that our model should also include a drift term to capture the trend.
For simplicity's sake, let's also extract the two series into their own variables.
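One way to explore the drift question in base R is to pass a time index as an external regressor. Because arima() differences the regressor along with the series, its coefficient then plays the role of a drift, i.e., the average change per period. The sketch below uses simulated data with a known drift of 0.5, and all variable names are hypothetical.

```r
# Simulated series whose increments have mean 0.5 plus AR(1) noise
# (illustrative stand-in for fb_ts).
set.seed(7)
n <- 300
sim_ts <- ts(100 + cumsum(0.5 + arima.sim(model = list(ar = 0.3), n = n)))

# Without drift: forecast increments shrink toward zero, so the path flattens.
fit_flat <- arima(sim_ts, order = c(1, 1, 0))
fc_flat  <- predict(fit_flat, n.ahead = 20)$pred

# With drift: regress on a time index; after differencing, its coefficient
# is the estimated average change per period.
t_idx     <- seq_len(n)
fit_drift <- arima(sim_ts, order = c(1, 1, 0), xreg = t_idx)
fc_drift  <- predict(fit_drift, n.ahead = 20, newxreg = n + 1:20)$pred

print(coef(fit_drift))           # "t_idx" is the drift estimate
print(round(diff(fc_flat), 3))   # step-to-step changes without drift
print(round(diff(fc_drift), 3))  # step-to-step changes with drift
```

Comparing the two sets of step-to-step changes shows whether the forecast levels off or keeps moving by roughly the estimated drift each period.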