
How to Select the Most Appropriate Time Series Model?

  • Simple Linear and Exponential Growth Models – If a plot of the time series shows roughly linear or exponential growth over time, a simple linear or log-linear trend model may fit.  The error terms must not be serially correlated; the Durbin-Watson test can check for this.
  • Auto-Regressive Models – If serial correlation exists in a simple trend model, the analyst can fit an auto-regressive (AR) model to the sample data, where the independent variable is a lagged (prior period) value of the series.  The AR model is appropriate where the prior period value is the best predictor of the next period’s value.  The D-W test is not valid for an AR model, so the analyst needs to examine the autocorrelations of the error terms to check for serial correlation.  A Dickey-Fuller test can test for the presence of a unit root in the AR model.
  • Seasonality – The plot of a time series may show seasonality; the model may be improved by adding a seasonal lag variable (for example, the value from the same quarter of the prior year).
  • Moving Average and AR Moving Average Models – Adding moving-average terms (lagged error terms) to a base AR model may improve on it.
  • Auto Regressive Conditional Heteroskedasticity – ARCH must be tested for to ensure that the standard errors of the AR, MA, or ARMA model are valid and that its t-scores are not overstated.
  • Out of Sample Data Testing – If an analyst has several time series models examining a dependent variable, then the forecasted values for each can be compared with actual out of sample data values.  The model with the lowest root mean square error (RMSE) is considered to have the best predictive capabilities.
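
The out-of-sample comparison above can be sketched in a few lines of Python. The actual values, the two forecast series, and the model names are hypothetical and exist only to illustrate the calculation.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error between actual values and forecasts."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical out-of-sample actuals and forecasts from two competing models.
actual = [10.0, 12.0, 11.0, 13.0]
forecasts = {
    "trend_model": [11.0, 11.0, 12.0, 12.0],
    "ar1_model": [10.0, 12.0, 11.0, 14.0],
}

scores = {name: rmse(actual, f) for name, f in forecasts.items()}
best_model = min(scores, key=scores.get)  # lowest RMSE wins
print(scores, best_model)
```

The comparison is only meaningful when every model is scored on the same out-of-sample observations.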

ARMA Models and ARCH Testing

  • Autoregressive Moving Average Model (ARMA) = combines autoregressive terms (lagged values of the series) with moving-average terms (lagged error terms).
  • ARMA models are very sensitive to small changes in the sample or in the number of lags chosen, and they often forecast poorly.
  • Auto Regressive Conditional Heteroskedasticity (ARCH) testing = can be used to determine if an AR, MA, or ARMA model suffers from conditional heteroskedasticity.
  • The ARCH test regresses the squared error terms on their own lagged values; if the slope is statistically significant, then the AR, MA, or ARMA model under scrutiny has conditional heteroskedasticity and its standard errors (and t-scores) are not valid.
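
A minimal sketch of the ARCH check, assuming the usual ARCH(1) form in which squared residuals are regressed on their one-period lag. The residual values are hypothetical, and the helper names `simple_ols` and `arch1_test` are illustrative rather than taken from any library.

```python
import math

def simple_ols(x, y):
    """OLS of y on a single regressor x; returns (intercept, slope, slope t-statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return intercept, slope, slope / se_slope

def arch1_test(residuals):
    """Regress squared residuals on their one-period lag."""
    squared = [e ** 2 for e in residuals]
    return simple_ols(squared[:-1], squared[1:])

# Hypothetical residuals from a fitted AR model, with a burst of high volatility.
residuals = [0.1, -0.2, 0.1, -0.1, 0.2, 1.5, -1.8, 1.6, -1.4, 1.7, -0.2, 0.1]
intercept, slope, t_stat = arch1_test(residuals)
print(round(slope, 3), round(t_stat, 3))
```

Compare the slope's t-statistic with the critical t-value for n - 2 degrees of freedom, where n is the number of lagged pairs; a significant slope points to ARCH effects.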

Auto-Regressive Models – Random Walks and Unit Roots

  • This is the case of an AR time series model where the predicted value is expected to equal the previous period’s value plus a random error:
  • x_t = b0 + x_{t-1} + ε_t
  • When b0 is not equal to zero, the model is a random walk with drift; in either case, the key characteristic is a lag coefficient of b1 = 1.
  • The expected value of the error is still zero.
  • A random walk has no finite mean reverting level and is not covariance stationary; the technique of first differencing is frequently used to transform an AR model with one time lag variable (AR1) into a model that is covariance stationary.
  • If an AR time series is covariance stationary, then the autocorrelations of the series are insignificant or they rapidly drop to zero as the number of time period lags rises.
  • When the lag coefficient is not statistically different from 1, a unit root exists.
  • Dickey-Fuller test = applied to AR1 model to test for a unit root.
  • If a unit root is present, then the series is not covariance stationary; if this is the case, the series must be transformed (typically by first differencing) so you can re-model on the transformed data.
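
First differencing can be illustrated with a simulated random walk with drift. The drift of 0.5, the error standard deviation, and the seed are assumptions chosen for the sketch; the differenced series y_t = x_t - x_{t-1} then equals b0 plus the error term, which is covariance stationary.

```python
import random

def first_difference(series):
    """Return y_t = x_t - x_{t-1} for each consecutive pair of observations."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

random.seed(0)   # fixed seed so the sketch is reproducible
drift = 0.5      # hypothetical b0
x = [0.0]
for _ in range(200):
    # b1 = 1, so each value is the prior value plus drift plus a random error.
    x.append(drift + x[-1] + random.gauss(0.0, 1.0))

dx = first_difference(x)
mean_change = sum(dx) / len(dx)  # should sit near the drift, not wander
print(round(mean_change, 3))
```

The level series `x` drifts without bound, while the mean of `dx` stays close to the drift term.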

Auto-Regressive (AR) Time Series Models

  • This type of time series model utilizes a time period lagged observation as the independent variable to predict the dependent variable, which is the value in the next time period.
x_t = b0 + b1 x_{t-1} + ε_t
  • There can be more than one time period lag independent variable.
  • Valid statistical inferences can be drawn from AR time series models only if the time series is covariance stationary; a time series with growth over time or seasonality is not covariance stationary.
  • It is critical to test your AR time series model for serial correlation; the Durbin-Watson test cannot be used for this type of model.
  • An AR time series model that is covariance stationary will exhibit mean reversion – it will tend to fall after going above the mean and rise after going below the mean.
  • Root Mean Square Error (RMSE) = a method of assessing the out of sample accuracy of a time series model’s forecast.  If comparing multiple models, the model with the lowest RMSE is considered to have the best forecasting capabilities.
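
As a rough illustration, an AR(1) model can be estimated by ordinary least squares on the lagged series. The sketch also computes the mean reverting level b0 / (1 - b1), a standard result for a covariance stationary AR(1) that is not stated in the notes above. The series is an artificial, noise-free example so the fit recovers the coefficients exactly.

```python
def fit_ar1(series):
    """Estimate x_t = b0 + b1 * x_{t-1} by OLS on the lagged series."""
    lagged, current = series[:-1], series[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((l - mean_lag) ** 2 for l in lagged)
    sxy = sum((l - mean_lag) * (c - mean_cur) for l, c in zip(lagged, current))
    b1 = sxy / sxx
    b0 = mean_cur - b1 * mean_lag
    return b0, b1

def mean_reverting_level(b0, b1):
    """Level the series tends toward; undefined when b1 = 1 (unit root)."""
    if b1 == 1:
        raise ValueError("no finite mean reverting level when b1 equals 1")
    return b0 / (1 - b1)

# Artificial series generated with b0 = 1.0 and b1 = 0.5, starting above the mean.
x = [10.0]
for _ in range(20):
    x.append(1.0 + 0.5 * x[-1])

b0, b1 = fit_ar1(x)
level = mean_reverting_level(b0, b1)
print(round(b0, 4), round(b1, 4), round(level, 4))
```

Because the series starts above its mean reverting level, each successive value falls toward it, which is the mean reversion described above.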

Time Series Analysis: Simple and Log-linear Trend Models

Simple Time Series Models

  • This is basic trend modeling.
A simple trend model can be expressed as follows:

y_t = b0 + b1 t + ε_t

  • b0 = the y-intercept, i.e. the value of the trend line where t = 0.
  • b1 = the slope coefficient of the time trend.
  • t = the time period.
  • ŷ_t = the estimated value for time t based on the model.
  • ε_t = the random error term at time t.
  • The big validity pitfall for simple trend models is serial correlation; if this problem is present, then you will see an artificially high R2 and your slope coefficient may falsely appear to be significant.
  • There is a visual way to detect serial correlation (not shown) or you can perform a Durbin-Watson test.
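
The following sketch fits the simple trend model by ordinary least squares and computes the Durbin-Watson statistic from the residuals. The data are hypothetical. A statistic near 2 suggests no serial correlation, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation; a formal test still needs the tabulated critical values.

```python
def fit_linear_trend(y):
    """OLS fit of y_t = b0 + b1 * t for t = 1..n; returns (b0, b1, residuals)."""
    n = len(y)
    t = range(1, n + 1)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    b1 = (sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
          / sum((ti - mean_t) ** 2 for ti in t))
    b0 = mean_y - b1 * mean_t
    residuals = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]
    return b0, b1, residuals

def durbin_watson(residuals):
    """Sum of squared changes in the residuals divided by the sum of squared residuals."""
    numerator = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical observations for eight periods.
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
b0, b1, residuals = fit_linear_trend(y)
print(round(b0, 3), round(b1, 3), round(durbin_watson(residuals), 3))
```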

Log-linear Trend Models

  • This applies to non-linear time series trends, in particular series that grow at a roughly constant rate (exponential growth).
The structure is:

  • ln y_t = b0 + b1 t + ε_t; or
  • y_t = e^(b0 + b1 t + ε_t)
  • Again, like the simple trend model, use a graph or Durbin Watson test to check for serial correlation, as this will be a big threat to validity.
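
A log-linear trend can be fitted by regressing ln y_t on t and then exponentiating to get back to the original units. The sketch below uses an artificial series that grows exactly exponentially (b0 = 0.5 and b1 = 0.1 are made up), so the estimates match the inputs.

```python
import math

def fit_log_linear_trend(y):
    """OLS fit of ln(y_t) = b0 + b1 * t for t = 1..n; returns (b0, b1)."""
    logs = [math.log(v) for v in y]
    n = len(logs)
    t = range(1, n + 1)
    mean_t = sum(t) / n
    mean_log = sum(logs) / n
    b1 = (sum((ti - mean_t) * (li - mean_log) for ti, li in zip(t, logs))
          / sum((ti - mean_t) ** 2 for ti in t))
    b0 = mean_log - b1 * mean_t
    return b0, b1

# Artificial series for periods 1..10 with a constant growth rate.
y = [math.exp(0.5 + 0.1 * t) for t in range(1, 11)]
b0, b1 = fit_log_linear_trend(y)

growth_rate = math.exp(b1) - 1        # implied growth per period
forecast_11 = math.exp(b0 + b1 * 11)  # forecast for period 11 in original units
print(round(b0, 4), round(b1, 4), round(growth_rate, 4))
```

The slope b1 is a continuously compounded growth rate, which is why the per-period growth is e^b1 - 1 rather than b1 itself.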

Qualitative and Dummy Variables in Regression Modeling

  • Handle qualitative independent variables with a quantitative proxy or use a dummy variable.
  • When using dummy independent variables (for example, to encode categories of consumer confidence), define a mutually exclusive and collectively exhaustive set of “j” categories; j-1 (“j minus one”) is then the number of dummy variables to include in your model.
  • Models with dummy independents can easily be misspecified.
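
The j-1 rule can be sketched as a small encoding helper. The quarter labels and the function name `make_dummies` are illustrative; the first category is treated as the omitted base case, which avoids perfect collinearity with the intercept.

```python
def make_dummies(values, categories):
    """Encode each value as j - 1 zero/one dummies; the first category is the base case."""
    base, *others = categories
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unknown category: {value}")
        rows.append([1 if value == category else 0 for category in others])
    return others, rows

# Four categories (j = 4) yield three dummy variables.
quarters = ["Q1", "Q2", "Q3", "Q4"]
names, rows = make_dummies(["Q1", "Q3", "Q4", "Q2"], quarters)
print(names)
print(rows)
```

An observation in the base category has all dummies equal to zero, so its effect is captured by the intercept.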

Model types with qualitative dependent variables

  • Probit models – based on the normal distribution; they estimate the probability that the dependent variable will equal 1.
  • Logit models – based on the logistic distribution; like Probit models, they estimate the probability that the dependent variable will equal 1.
  • Discriminant Analysis – creates a score and if the score crosses a threshold then the dependent variable is assigned a 1.
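
To make the difference between the two link functions concrete, the sketch below converts a linear score into a probability under each model. The coefficients and the input value are hypothetical; in practice they would come from a fitted model, and logit and probit coefficients are not directly comparable.

```python
import math
from statistics import NormalDist

def logit_probability(z):
    """Logistic function: P(Y = 1) under a logit model with linear score z."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_probability(z):
    """Standard normal CDF: P(Y = 1) under a probit model with linear score z."""
    return NormalDist().cdf(z)

# Hypothetical coefficients and one observation of the independent variable.
b0, b1 = -2.0, 0.8
x = 3.0
z = b0 + b1 * x

print(round(logit_probability(z), 4), round(probit_probability(z), 4))
```

Both functions map any score to a value between 0 and 1, and both return 0.5 when the score is zero.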

Looking at the big picture, you want your multiple regression model to:

  1. Have a good theoretical basis and;
  2. Pass the most stringent statistical tests (refer back to the sub-section “Assumption Violations”).

Regression Analysis and Assumption Violations

Heteroskedasticity

There are two types, Conditional and Unconditional.  The type focused on in evaluating model validity is Conditional Heteroskedasticity.

  • Conditional = the error terms change in a systematic manner that is correlated with the values of the independent variables.
    • Look up a graph depicting this problem.
  • The Breusch-Pagan test will test for Conditional Heteroskedasticity.
  • When this problem is present, the model’s t-scores will be artificially high, indicating a false significance of relationships.
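
A minimal sketch of the Breusch-Pagan statistic for a model with one independent variable, assuming the common n × R2 form: regress the squared residuals on the independent variable and multiply the resulting R2 by the number of observations. The statistic is compared with a chi-square critical value with k degrees of freedom (k = 1 here). The data are hypothetical.

```python
def r_squared(x, y):
    """R2 from a simple regression of y on x (the squared correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if syy == 0:
        return 0.0  # constant y: nothing to explain
    return sxy ** 2 / (sxx * syy)

def breusch_pagan(x, residuals):
    """n times the R2 from regressing squared residuals on x."""
    squared = [e ** 2 for e in residuals]
    return len(x) * r_squared(x, squared)

CHI2_CRITICAL_5PCT_1DF = 3.841  # chi-square critical value, 5% level, 1 degree of freedom

# Hypothetical data: residuals fan out as x increases.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0]

bp_stat = breusch_pagan(x, residuals)
print(round(bp_stat, 3), bp_stat > CHI2_CRITICAL_5PCT_1DF)
```

A statistic above the critical value rejects the null of no conditional heteroskedasticity.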

Serial Correlation

This is correlation among your model’s error terms across observations.

  • When serial (or auto) correlation is present your SEE may be incorrect.
  • The Durbin-Watson test statistic can be used to determine the presence of Serial Correlation in multiple regression models, as well as simple and log linear time series models, but not on auto-regressive time series models.

Multi-collinearity

Two or more of your independent variables are highly correlated.

  • A tiny bit of multi-collinearity is tolerable and can be common in regression models involving several independent variables.
  • A common symptom of this problem is the presence of a high coefficient of determination (R2), despite having low t-scores for your independent variables (i.e. they are insignificant).
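
One quick screen, offered here as a rough sketch rather than a formal test, is the pairwise correlation between two independent variables. The data are hypothetical. With more than two independent variables, low pairwise correlations do not rule out multi-collinearity, so the symptom described above remains the more general signal.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

# Hypothetical independent variables; x2 is nearly a multiple of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x1, x2)
print(round(r, 4))
```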

Fcalc – the Global Test for Regression Significance

  • A statistically significant Fcalc (i.e. one that passes the Fcritical threshold, based on your degrees of freedom) can indicate that your model as a whole is meaningful.
  • This test matters most for multiple regressions, where there is more than one slope coefficient (b1, b2, b3 … bi), because a t-test evaluates only one coefficient at a time and cannot test them jointly.
  • The F-test is a one-tailed test.
  • The null hypothesis is that all of the slope coefficients equal zero; you will be looking to reject the null with an Fcalc > Fcritical.
  • A rejection of the null indicates that at least one of the slope coefficients is significant and there is some validity to the model.
  • Fcalc = MSSR / MSSE = (RSS / k) / (SSE / (n - k - 1)), where k is the number of slope coefficients, n is the number of observations, and TSS = RSS + SSE.
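
The F-statistic follows directly from the sums of squares: Fcalc is the mean regression sum of squares (RSS / k) divided by the mean squared error (SSE / (n - k - 1)), where k is the number of slope coefficients and n the number of observations. A short sketch with hypothetical inputs:

```python
def f_statistic(rss, sse, n, k):
    """Fcalc = MSSR / MSSE for a regression with n observations and k slope coefficients."""
    mssr = rss / k             # mean regression sum of squares
    msse = sse / (n - k - 1)   # mean squared error
    return mssr / msse

# Hypothetical values: RSS = 80, SSE = 20 (so TSS = 100), 23 observations, 2 slopes.
f_calc = f_statistic(rss=80.0, sse=20.0, n=23, k=2)
print(f_calc)
```

Fcalc is then compared with Fcritical for k numerator and n - k - 1 denominator degrees of freedom.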

Multiple Regression and Coefficient of Determination (R-Squared)

  • For a multiple regression model, this value represents the percentage of total variation in Y that is explained by the regression equation.
  • The value is between 0 and 1.
  • R-squared has a mathematical relationship with TSS, SSE, and RSS.
  • R2 = RSS/TSS = (TSS-SSE)/TSS = 1- (SSE/TSS)
  • The coefficient of determination alone does not indicate that a model is well specified, for example you could have more independent variables than necessary and the R2 will still be high – in this case your model would not be considered parsimonious.
  • Adjusted R2 = an alternate measure that penalizes additional independent variables; it is never larger than R2.
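
Both measures can be computed from the sums of squares. The sketch assumes the standard adjusted R2 formula, 1 - [(n - 1) / (n - k - 1)] × (1 - R2), which is not spelled out above; the input values are hypothetical.

```python
def r_squared(sse, tss):
    """R2 = 1 - SSE / TSS."""
    return 1.0 - sse / tss

def adjusted_r_squared(r2, n, k):
    """Adjust R2 for n observations and k independent variables."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

# Hypothetical values: SSE = 20, TSS = 100, 23 observations, 2 independent variables.
r2 = r_squared(sse=20.0, tss=100.0)
adj_r2 = adjusted_r_squared(r2, n=23, k=2)
print(round(r2, 4), round(adj_r2, 4))
```

Adding an independent variable always leaves R2 the same or higher, but adjusted R2 rises only if the new variable improves the fit enough to offset the lost degree of freedom.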

Multiple Regression Analysis

Many of the concepts in simple regression are applicable, but watch out when determining your degrees of freedom for different analyses, as the values will be slightly different for models similar in observation count, but different in slope coefficient count.

Six Assumptions of Multiple Regression (very similar to simple regression)

  1. Y and X must have a linear relationship.
  2. X is not random and there is no exact linear relationship (perfect multicollinearity) among two or more of the independent variables.
  3. The expected value of e is 0 (zero).
  4. No heteroskedasticity, i.e., the error term’s variance is the same for all observations and does not vary with the independent variables.
  5. No serial correlation, i.e. error terms are uncorrelated with one another across all observations.
  6. The error term has a normal distribution.

Be sure to review and get comfortable with the standard form of a statistical software program’s output for a multiple variable regression analysis.

  • Example multiple regression equation: Yi = b0 + b1X1i + b2X2i  + ei
  • You will need to be able to:
    • Use this equation to estimate a dependent variable (this becomes simple plug and chug once you are comfortable with the material)
    • Test the overall validity of a multiple regression model (see Fcalc below)
    • Perform tcalcs to test the validity of the y-intercept and individual slope coefficients.
    • Determine confidence intervals for individual slope coefficients.
    • Perform hypothesis tests to determine if a slope coefficient is statistically different from some specified value (for example, a colleague may build a similar model but derive a different slope coefficient; this testing will determine if the difference is statistically significant).
    • Determine the Standard Error of the Estimate (SEE)
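
Several of the tasks above reduce to short calculations once you have the regression output. The sketch below uses hypothetical coefficients, a hypothetical standard error, and an assumed critical t-value; in practice the critical value comes from a t-table with n - k - 1 degrees of freedom.

```python
def predict(b0, slopes, xs):
    """Estimate Y from the intercept, slope coefficients, and independent variable values."""
    return b0 + sum(b * x for b, x in zip(slopes, xs))

def t_statistic(estimate, std_error, hypothesized=0.0):
    """tcalc for testing whether a coefficient differs from a hypothesized value."""
    return (estimate - hypothesized) / std_error

def confidence_interval(estimate, std_error, t_critical):
    """Interval of estimate plus or minus t_critical times the standard error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin

# Hypothetical output: Y = 1.0 + 0.5 * X1 + 2.0 * X2, with a standard error of 0.1 on b1.
y_hat = predict(1.0, [0.5, 2.0], [4.0, 3.0])
t_zero = t_statistic(0.5, 0.1)                     # is b1 different from 0?
t_other = t_statistic(0.5, 0.1, hypothesized=0.4)  # is b1 different from 0.4?
low, high = confidence_interval(0.5, 0.1, t_critical=2.0)
print(y_hat, round(t_zero, 2), round(t_other, 2), (round(low, 2), round(high, 2)))
```

The second t-statistic covers the colleague scenario: it tests whether the estimated slope differs from another specified value rather than from zero.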