# Biases in Sampling

As we have seen in this chapter, all our estimates are based on a sample selected from the population. It is therefore critical that we choose samples correctly so that our results are not biased. Several issues can bias a sample and lower the quality of our parameter estimates. Let’s look at some of these:

**Appropriate Sample Size**

We learned that a larger sample size reduces sampling error. However, increasing the sample size has two drawbacks:

- When we increase the sample size, we risk including data that doesn’t represent the population correctly. This is especially true of time series data (such as returns from a mutual fund), because the population parameters may change over time, so older observations may come from what is effectively a different population.
- A large sample size also increases our cost of sampling.
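The trade-off between sampling error and cost can be sketched with the standard error of the sample mean, $\sigma / \sqrt{n}$. The snippet below is only an illustration: the population standard deviation `sigma` and the per-observation cost `cost_per_obs` are made-up values, not figures from this chapter.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 0.20         # hypothetical population standard deviation of returns
cost_per_obs = 2.0   # hypothetical cost of collecting one observation

for n in (25, 100, 400, 1600):
    se = standard_error(sigma, n)
    cost = n * cost_per_obs
    # Cost grows in proportion to n, while the standard error shrinks only with sqrt(n).
    print(f"n={n:5d}  standard error={se:.4f}  cost={cost:8.1f}")
```

Under these assumptions, quadrupling the sample size only halves the standard error while the cost quadruples, which is why a larger sample is not automatically the better choice.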

**Data-mining Bias**

Data mining involves searching through historical data in an effort to find statistically significant patterns or trading strategies that work. One such pattern is the day-of-the-week effect (also called the Monday effect): historically, returns on Mondays tended to be negative and lower than the average returns of the other four days of the week. Traders may try to use such patterns to outperform the market.

Data mining bias refers to the errors that arise from relying too heavily on data mining. Some patterns found through data mining may be useful, but not all are: if enough patterns are tested, some will appear statistically significant purely by chance. Even patterns that were once valid may no longer hold in today’s market. A trader is advised to check the validity of a data-mined pattern before building a trading strategy on it. For example, the trader may want to conduct an out-of-sample test to find out whether the model also works in periods that were not part of the data used for mining.
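The idea of an out-of-sample test can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the "strategies" are pure random noise with a true mean return of zero, and the names `N_STRATEGIES`, `N_DAYS` and `noise_returns` are hypothetical choices made for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

N_STRATEGIES = 200   # hypothetical number of candidate rules that were mined
N_DAYS = 250         # hypothetical number of daily returns in each period

def noise_returns(n):
    """Daily returns with a true mean of zero, i.e. no real pattern exists."""
    return [random.gauss(0.0, 0.01) for _ in range(n)]

# Each candidate rule has an in-sample period (used for mining)
# and a separate out-of-sample period (held back for validation).
in_sample = [noise_returns(N_DAYS) for _ in range(N_STRATEGIES)]
out_sample = [noise_returns(N_DAYS) for _ in range(N_STRATEGIES)]

# Data mining step: keep the rule with the highest in-sample mean return.
best = max(range(N_STRATEGIES), key=lambda i: statistics.mean(in_sample[i]))

best_in = statistics.mean(in_sample[best])
best_out = statistics.mean(out_sample[best])

print(f"in-sample mean of the selected rule:  {best_in:.5f}")
print(f"out-of-sample mean of the same rule:  {best_out:.5f}")
```

Because the best rule is picked from many candidates, its in-sample mean looks attractive even though every rule is noise. Its out-of-sample mean is just another draw around zero, which is the kind of gap an out-of-sample test is meant to expose.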

**Sample Selection Bias**

This happens when part of the population cannot be sampled because data for it is not available. As a result, the selected sample is not truly random.
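The effect of missing data on an estimate can be sketched with a simulated population. Everything here is assumed for illustration: the population of fund returns is generated at random, and the rule that data is unavailable for returns below -10% is a made-up cut-off.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: annual returns of 10,000 funds.
population = [random.gauss(0.05, 0.15) for _ in range(10_000)]

# Assumption: data is not available for funds that returned less than -10%.
available = [r for r in population if r > -0.10]

# A truly random sample can draw from the whole population;
# in practice we can only draw from the part whose data is available.
random_sample = random.sample(population, 500)
biased_sample = random.sample(available, 500)

print(f"population mean:             {statistics.mean(population):.4f}")
print(f"mean of available data:      {statistics.mean(available):.4f}")
print(f"mean of random sample:       {statistics.mean(random_sample):.4f}")
print(f"mean of sample we can take:  {statistics.mean(biased_sample):.4f}")
```

Since the missing observations are all from the low end of the distribution, the mean of the available data overstates the population mean, and any sample drawn from it inherits that bias regardless of its size.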
