# Quants: Correlation Analysis

**Correlation**

Correlation is math-speak for relationships. Is there a relationship between the change in the value of one variable and the change in value of another?

Correlation and simple regression can help you:

- Verify a relationship between the dependent variable Y and the independent variable X.
- Identify the mathematical form of the relationship (e.g., linear, exponential).
- Determine the values of the y-intercept and the slope coefficient.

**Correlation Coefficient**

The correlation coefficient takes a value between -1 and +1.

It indicates the direction (positive or negative) and the strength of the linear relationship between two variables; a value of zero indicates no linear relationship.

Commonly expressed as “ryx”
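To make the -1 to +1 range concrete, here is a minimal sketch of the standard sample correlation calculation (covariance divided by the product of the standard deviations). The function name `pearson_r` and the sample data are illustrative assumptions, not part of the curriculum text.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations (numerator) and sums of squared deviations.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The (n - 1) divisors cancel, so they are omitted.
    return cov_xy / sqrt(var_x * var_y)

# Hypothetical data: one perfectly positive and one perfectly negative relationship.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(r_up, r_down)
```

The first series moves in lockstep with X and the second moves exactly opposite, so the two results sit at the two ends of the range. The sketch does not handle a constant series, where the denominator is zero.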

This value must be tested for significance in order to determine if developing a single regression model is merited.

The null hypothesis (H0) assumes that ryx = 0, i.e., that no relationship exists.

The test statistic follows a Student's t-distribution with n - 2 degrees of freedom. A diagram of that distribution helps you visualize the fail-to-reject region and the rejection regions for your null hypothesis.

If you believe that you have found a relationship, then you hope to reject the null and conclude that the correlation coefficient is significantly different from zero.
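The significance test can be sketched as follows, using the usual test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2). The sample values (r = 0.80, n = 10) are hypothetical, and the critical value 2.306 is the two-tailed 5% figure for 8 degrees of freedom taken from a t-table; in practice you would look up the value that matches your own sample size and significance level.

```python
from math import sqrt

def correlation_t_stat(r, n):
    """t-statistic for testing H0: correlation = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical sample: correlation of 0.80 measured on 10 observations.
r_yx = 0.80
n = 10

# Two-tailed critical value at the 5% level for df = n - 2 = 8 (from a t-table).
t_critical = 2.306

t_calc = correlation_t_stat(r_yx, n)
reject_null = abs(t_calc) > t_critical
print(f"t = {t_calc:.3f}, reject H0: {reject_null}")
```

Because the calculated statistic falls outside the +/- 2.306 band, this hypothetical sample would land in the rejection region.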

**Limits of Correlation Analysis**

The correlation coefficient assumes that the relationship is linear, but many relationships between two variables are non-linear.
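A small illustration of the linearity limit, using made-up data: Y below is fully determined by X (Y = X squared), yet the correlation coefficient comes out at zero because the relationship is not linear. The helper `pearson_r` is an illustrative name.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov_xy / sqrt(var_x * var_y)

# Hypothetical data: a perfect but non-linear (quadratic) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_quadratic = pearson_r(x, y)
print(r_quadratic)
```

A scatter plot would reveal the U-shaped pattern immediately, which is one reason to plot the data before relying on the coefficient alone.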

If the data sample contains outlier observations, then ryx can be distorted.
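The effect of an outlier can be sketched with invented numbers: five weakly related points, then the same points plus one extreme observation. The data and the helper name `pearson_r` are assumptions for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov_xy / sqrt(var_x * var_y)

# Hypothetical sample with only a weak relationship.
x_clean = [1, 2, 3, 4, 5]
y_clean = [3, 1, 4, 2, 3]

# The same sample with a single extreme observation appended.
x_outlier = x_clean + [20]
y_outlier = y_clean + [20]

r_clean = pearson_r(x_clean, y_clean)
r_with_outlier = pearson_r(x_outlier, y_outlier)
print(f"without outlier: {r_clean:.2f}, with outlier: {r_with_outlier:.2f}")
```

One added point moves the coefficient from weak to very strong, so it is worth checking whether a high correlation survives when suspect observations are excluded.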

The analyst may discover a high correlation when no real relationship exists (the relationship is spurious).

Data mining for relationships is a poor substitute for a theoretical basis. Start from a hypothesis grounded in theory, then test the data for potentially significant correlations.