Regression Analysis and Assumption Violations

Heteroskedasticity

Heteroskedasticity means the variance of the error terms is not constant across observations. There are two types, Conditional and Unconditional. Conditional Heteroskedasticity is the type that matters when evaluating model validity.

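  • Conditional = the variance of the error terms changes in a systematic manner that is correlated with the values of the independent variables.

  • Unconditional = the variance of the error terms changes, but not in a way that is related to the values of the independent variables. This type usually causes no major problems for the regression.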

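  • On a plot of the residuals against an independent variable, this problem shows up as a spread that widens or narrows as the variable increases (a fan or cone shape).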

  • The Breusch-Pagan test checks for Conditional Heteroskedasticity by regressing the squared residuals on the independent variables.

  • When this problem is present, the coefficient standard errors are unreliable. They are often understated, which makes the model’s t-scores artificially high and indicates a false significance of relationships.
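
The idea behind the Breusch-Pagan test can be sketched in a few lines. This is a minimal illustration, assuming a model with a single independent variable and the n·R² form of the statistic (squared residuals regressed on the independent variable). The helper names `ols_simple` and `breusch_pagan_simple` and the simulated data are assumptions made for the example, not part of these notes; in practice a statistics library would normally be used.

```python
import math
import random


def ols_simple(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, residuals, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum(e ** 2 for e in residuals)
    r_squared = 1.0 - sse / sst
    return b0, b1, residuals, r_squared


def breusch_pagan_simple(x, y):
    """Return (BP statistic, p-value) for a one-regressor model.

    BP = n * R^2 of the auxiliary regression of squared residuals on x.
    With one regressor the statistic is compared to a chi-square
    distribution with 1 degree of freedom, whose upper-tail probability
    equals erfc(sqrt(BP / 2)).
    """
    _, _, residuals, _ = ols_simple(x, y)
    squared = [e ** 2 for e in residuals]
    _, _, _, aux_r2 = ols_simple(x, squared)
    stat = max(len(x) * aux_r2, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical data: the error spread grows with x in one series
# and stays constant in the other.
rng = random.Random(0)
x = [1 + 9 * i / 199 for i in range(200)]
y_hetero = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
y_homo = [2 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]

bp_hetero, p_hetero = breusch_pagan_simple(x, y_hetero)
bp_homo, p_homo = breusch_pagan_simple(x, y_homo)
print(f"error spread grows with x: BP={bp_hetero:.2f}, p={p_hetero:.4f}")
print(f"constant error spread:     BP={bp_homo:.2f}, p={p_homo:.4f}")
```

A statistic above the chi-square critical value (about 3.84 at the 5% level with one degree of freedom) suggests Conditional Heteroskedasticity.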

Serial Correlation

Serial correlation means your model’s error terms are correlated with one another, typically from one time period to the next.

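  • When serial (or auto) correlation is present, your standard error of estimate (SEE) and coefficient standard errors may be incorrect, which makes the t-scores unreliable.
  • The Durbin-Watson test statistic can be used to detect Serial Correlation in multiple regression models, as well as in linear and log-linear trend time series models, but not in auto-regressive time series models, where a lagged value of the dependent variable is used as an independent variable.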
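
The Durbin-Watson statistic itself is simple to compute from a model's residuals in time order. The sketch below is illustrative only: the function name `durbin_watson` is an assumption, and the two series are simulated error terms standing in for the residuals of a fitted regression.

```python
import random


def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series.
rng = random.Random(1)

# Independent errors: no serial correlation.
independent = [rng.gauss(0, 1) for _ in range(500)]

# Each error carries over 80% of the previous one: positive serial correlation.
correlated = [0.0]
for _ in range(499):
    correlated.append(0.8 * correlated[-1] + rng.gauss(0, 1))

print(f"independent errors:           DW = {durbin_watson(independent):.2f}")
print(f"positively correlated errors: DW = {durbin_watson(correlated):.2f}")
```

A value near 2 suggests no Serial Correlation, a value well below 2 suggests positive Serial Correlation, and a value well above 2 suggests negative Serial Correlation. A formal decision compares the statistic with the tabulated lower and upper critical values for the sample size and number of independent variables.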

Multi-collinearity

Multi-collinearity occurs when two or more of your independent variables are highly correlated with each other.

  • A tiny bit of multi-collinearity is tolerable and can be common in regression models involving several independent variables.
  • A common symptom of this problem is a high coefficient of determination (R²) despite low t-scores for the individual independent variables (i.e., they appear insignificant).
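
One rough way to check for this problem is to measure how strongly the independent variables are correlated with each other. The sketch below assumes a model with exactly two independent variables and computes their pairwise correlation and the variance inflation factor (VIF), which for two variables is 1 / (1 − r²). The VIF is a diagnostic added here for illustration and is not covered in these notes; the function names and simulated data are likewise assumptions.

```python
import math
import random


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_regressors(x1, x2):
    """VIF shared by both variables in a two-variable model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical independent variables.
rng = random.Random(2)
x1 = [rng.gauss(0, 1) for _ in range(300)]
x2_collinear = [xi + rng.gauss(0, 0.1) for xi in x1]  # nearly a copy of x1
x2_unrelated = [rng.gauss(0, 1) for _ in range(300)]  # drawn independently

for label, x2 in [("collinear", x2_collinear), ("unrelated", x2_unrelated)]:
    r = pearson_r(x1, x2)
    vif = vif_two_regressors(x1, x2)
    print(f"{label}: r = {r:.3f}, VIF = {vif:.1f}")
```

A VIF close to 1 indicates little overlap between the variables. A commonly cited rule of thumb treats values above roughly 10 as a sign of serious multi-collinearity, although the cutoff is a convention rather than a formal test.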
