Confidence Interval for a Population Mean When the Distribution Is Non-Normal
When the distribution is normal, we use the z-statistic if the population variance is known and the t-statistic if the population variance is unknown.
However, when the distribution is not normal and the sample is small (n < 30), we cannot construct a reliable confidence interval with either the z-statistic or the t-statistic.
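The reason behind this rule can be checked with a small Monte Carlo sketch. It draws repeated samples from a strongly skewed population (an exponential distribution with mean 1, chosen here only as an example of a non-normal distribution), builds a nominal 95% t-interval from each sample, and counts how often the interval contains the true mean. For skewed populations the observed coverage typically falls short of 95% at n = 10 and moves closer to 95% at n = 60; the exact figures depend on the seed and number of trials. The t critical values are hard-coded from a standard t-table.

```python
import random
from math import sqrt

# 97.5th percentiles of Student's t for 9 and 59 degrees of freedom, from a t-table.
T_CRIT_975 = {9: 2.262, 59: 2.001}

def t_interval_coverage(n, trials=4000, true_mean=1.0, seed=42):
    """Fraction of nominal 95% t-intervals that contain the true mean.

    Samples come from an exponential population (strongly right-skewed),
    used here only as an example of a non-normal distribution.
    """
    rng = random.Random(seed)
    t_crit = T_CRIT_975[n - 1]
    hits = 0
    for _ in range(trials):
        # expovariate(lambd) has mean 1 / lambd.
        sample = [rng.expovariate(1 / true_mean) for _ in range(n)]
        m = sum(sample) / n
        s = sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample std dev
        if abs(m - true_mean) <= t_crit * s / sqrt(n):
            hits += 1
    return hits / trials

for n in (10, 60):
    print(f"n = {n}: coverage = {t_interval_coverage(n):.3f}")
```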
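If the sample is large (n ≥ 30) and the distribution is non-normal, the Central Limit Theorem makes the sampling distribution of the sample mean approximately normal, so:
- If the population variance is known, we use the z-statistic.
- If the population variance is unknown, we use the t-statistic. The z-statistic is also acceptable, but the t-statistic is more common.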
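The rules so far can be summarized as a small decision function. This is a sketch; the function name and the string labels it returns are our own choices, not part of any library.

```python
def choose_statistic(normal: bool, variance_known: bool, n: int) -> str:
    """Return the statistic for a confidence interval on a population mean.

    Encodes the rules above and returns 'z', 't', or 'not available'.
    """
    if not normal and n < 30:
        # Small sample from a non-normal population: neither z nor t is reliable.
        return "not available"
    # Normal population (any n), or a large sample where the CLT applies.
    # With unknown variance, t is preferred (z is also acceptable when n >= 30).
    return "z" if variance_known else "t"

print(choose_statistic(normal=False, variance_known=False, n=20))  # not available
print(choose_statistic(normal=False, variance_known=False, n=60))  # t
print(choose_statistic(normal=True, variance_known=True, n=25))    # z
```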
Application in Finance
Let's see how these rules apply to stock market returns:
Case 1: Normal Distribution
Consider the daily returns of the S&P 500 index and assume, for this example, that they approximate a normal distribution:
- Known population variance:
  - Suppose the historical volatility (σ) is 1% daily, the sample mean return is 0.05% daily, and n = 25 days.
  - Because the population variance is known, we use the z-statistic.
- Unknown population variance:
  - Take the same scenario, but without a known historical volatility.
  - In this case, we use the t-statistic with the sample standard deviation.
  - This is the more common situation in practice, because the true population variance is rarely known.
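The arithmetic for the S&P 500 example can be sketched in Python with the standard library. With σ known, the interval is the sample mean ± z × σ/√n; with σ unknown, the sample standard deviation s replaces σ and a t critical value with n − 1 degrees of freedom replaces z. The t critical value is hard-coded from a t-table, and the sample standard deviation of 1.0% is a hypothetical figure (the example gives none), set equal to σ so that the only difference between the two intervals is the critical value.

```python
from math import sqrt
from statistics import NormalDist

# 97.5th percentile of Student's t with 24 degrees of freedom, from a t-table.
# Extend this table for other sample sizes.
T_CRIT_975 = {24: 2.064}

def z_interval(mean, sigma, n, level=0.95):
    """Confidence interval for the mean when the population std dev is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin

def t_interval_95(mean, s, n):
    """95% confidence interval for the mean using the sample std dev s."""
    margin = T_CRIT_975[n - 1] * s / sqrt(n)
    return mean - margin, mean + margin

# S&P 500 example, in percent per day: sample mean 0.05, sigma 1.0, n = 25.
print(z_interval(0.05, 1.0, 25))     # roughly (-0.342, 0.442)
# Hypothetical sample standard deviation of 1.0% (not given in the example).
print(t_interval_95(0.05, 1.0, 25))  # roughly (-0.363, 0.463)
```

The t-interval comes out wider because 2.064 exceeds 1.96, reflecting the extra uncertainty from estimating the variance from the sample.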
Case 2: Non-Normal Distribution
Now consider Bitcoin daily returns, which are typically non-normal (they show high kurtosis and skewness):
- If the sample is small, say 20 days of returns (n < 30):
  - We cannot construct a reliable confidence interval with the z- or t-statistic.
  - We need to use non-parametric methods instead.
- If we analyze 60 days of returns (n ≥ 30):
  - We can construct a confidence interval because of the Central Limit Theorem.
  - We use the t-statistic, since the population variance is unknown.
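For the Bitcoin case, the sketch below uses simulated fat-tailed returns (Laplace draws with a hypothetical mean of 0.1% and scale of 2.5% per day; this is not real Bitcoin data). With 60 observations it builds a 95% t-interval, relying on the Central Limit Theorem. With 20 observations it instead uses a percentile bootstrap, one common non-parametric method. The bootstrap is our choice of illustration rather than a method named above, and with so few observations it is still only a rough guide.

```python
import random
from math import sqrt

# 97.5th percentile of Student's t (from a t-table); only df = 59 is tabulated here.
T_CRIT_975 = {59: 2.001}

def simulated_returns(n, seed=7):
    """Hypothetical fat-tailed daily returns in percent (Laplace draws), NOT real data."""
    rng = random.Random(seed)
    # A Laplace variable is the difference of two independent exponentials.
    return [0.1 + 2.5 * (rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(n)]

def sample_mean_sd(xs):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    s = sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m, s

def t_interval_95(xs):
    """95% t-interval for the mean; used here because n >= 30 (CLT)."""
    n = len(xs)
    m, s = sample_mean_sd(xs)
    margin = T_CRIT_975[n - 1] * s / sqrt(n)
    return m - margin, m + margin

def bootstrap_percentile_95(xs, reps=5000, seed=11):
    """Percentile bootstrap CI for the mean: resample with replacement,
    then take the approximate 2.5th and 97.5th percentiles of the resampled means."""
    rng = random.Random(seed)
    n = len(xs)
    means = sorted(sum(rng.choices(xs, k=n)) / n for _ in range(reps))
    return means[int(0.025 * reps)], means[int(0.975 * reps) - 1]

print("n = 60, t-interval:", t_interval_95(simulated_returns(60)))
print("n = 20, bootstrap: ", bootstrap_percentile_95(simulated_returns(20)))
```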
Key Statistical Tests
Before applying these methods, it's crucial to:
- Test for normality using:
  - the Jarque-Bera test (common in finance)
  - the Shapiro-Wilk test
  - visual inspection of Q-Q plots
- Consider the sample size:
  - Small samples require stricter assumptions: the population itself must be approximately normal.
  - Larger samples are more forgiving because of the Central Limit Theorem.
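The Jarque-Bera test compares the sample skewness S and kurtosis K with the values for a normal distribution (0 and 3): JB = (n/6)[S² + (K − 3)²/4], which is approximately chi-square with 2 degrees of freedom under normality. The sketch below computes it with the standard library on two simulated, hypothetical return series. Its p-value relies on that large-sample chi-square approximation, so it is less trustworthy for small n; in practice a library routine such as `scipy.stats.jarque_bera` (or `scipy.stats.shapiro` for Shapiro-Wilk) would normally be used.

```python
import random
from math import exp

def jarque_bera(xs):
    """Jarque-Bera statistic and asymptotic p-value (chi-square, 2 df)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # The chi-square survival function with 2 df is exp(-x / 2).
    return jb, exp(-jb / 2)

rng = random.Random(0)
# Hypothetical series: normal draws vs. fat-tailed Laplace draws (percent per day).
normal_like = [rng.gauss(0.05, 1.0) for _ in range(500)]
fat_tailed = [2.5 * (rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(500)]

for name, xs in (("normal-like", normal_like), ("fat-tailed", fat_tailed)):
    jb, p = jarque_bera(xs)
    print(f"{name}: JB = {jb:.2f}, p = {p:.4f}")
```

A small p-value (for example below 0.05) is evidence against normality, which points toward the large-sample or non-parametric approaches described above.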