# Biases in Sampling

As we have seen in this chapter, all our estimates are based on a sample drawn from the population. It is therefore critical to select samples correctly so that the results are not biased. In practice, several issues can bias a sample and lower the quality of the parameter estimates. Let’s look at some of these:

**Appropriate Sample Size**

We learned that a larger sample size reduces sampling error. However, increasing the sample size has two drawbacks:

- A larger sample may pull in observations that do not represent the population correctly. This is especially true of time series data (such as returns from a mutual fund), because the population parameters may change over time, so older observations may effectively come from a different population.
- A larger sample also increases the cost of sampling.
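The trade-off between sampling error and cost can be sketched with the standard error of the sample mean, $\sigma / \sqrt{n}$. The snippet below is a minimal illustration, not a prescribed method; the value of `SIGMA` is a made-up population standard deviation chosen only for the example.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation of monthly returns (assumption).
SIGMA = 0.04

# Quadrupling the sample size only halves the standard error,
# while the cost of collecting the data keeps growing with n.
for n in (25, 100, 400):
    print(n, round(standard_error(SIGMA, n), 4))
```

Each fourfold increase in `n` buys only a halving of the standard error, which is why the extra cost of a very large sample is often hard to justify.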

**Data-mining Bias**

Data mining involves searching through historical data in an effort to find statistically significant patterns or trading strategies that work. One such pattern is the day-of-the-week effect: returns have tended to be lower, and often negative, on Mondays compared with the average of the other four days of the week. Traders may try to use such patterns to outperform the market.

Data-mining bias refers to errors that arise from relying too heavily on data mining. Some patterns found through data mining may be useful, but not all of them are. Even patterns that were once valid may no longer hold in today’s market. A trader is advised to check any mined pattern for validity before building a trading strategy on it. For example, the trader may want to conduct an out-of-sample test to find out whether the model still works in periods that were not part of the period used for data mining.
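An out-of-sample test of this kind can be sketched as follows. This is a toy illustration under stated assumptions: the returns in `in_sample` and `out_of_sample` are invented numbers (in percent), and the helper names `mean_return_by_day` and `worst_day` are hypothetical. The idea is simply to find a pattern in one period and check whether it reappears in a period that was held back.

```python
from collections import defaultdict
from statistics import mean


def mean_return_by_day(observations):
    """Average return per weekday from (weekday, return) pairs."""
    grouped = defaultdict(list)
    for day, ret in observations:
        grouped[day].append(ret)
    return {day: mean(rets) for day, rets in grouped.items()}


def worst_day(observations):
    """Weekday with the lowest average return."""
    averages = mean_return_by_day(observations)
    return min(averages, key=averages.get)


# Made-up daily returns in percent, for illustration only.
in_sample = [
    ("Mon", -0.4), ("Tue", 0.1), ("Wed", 0.2), ("Thu", 0.1), ("Fri", 0.3),
    ("Mon", -0.2), ("Tue", 0.0), ("Wed", 0.1), ("Thu", 0.2), ("Fri", 0.1),
]
out_of_sample = [
    ("Mon", 0.2), ("Tue", -0.1), ("Wed", 0.1), ("Thu", -0.3), ("Fri", 0.2),
    ("Mon", 0.1), ("Tue", 0.0), ("Wed", 0.0), ("Thu", -0.1), ("Fri", 0.1),
]

pattern_day = worst_day(in_sample)      # pattern found by mining the data
holdout_day = worst_day(out_of_sample)  # same check on unseen data

print("in-sample worst day:", pattern_day)
print("hold-out worst day:", holdout_day)
print("pattern survives out of sample:", pattern_day == holdout_day)
```

In this invented data the Monday pattern appears in the mined period but not in the hold-out period, which is the kind of warning sign an out-of-sample test is meant to surface.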

**Sample Selection Bias**

This happens when part of the population cannot be sampled because data for it is not available. As a result, the sample selected is not truly random.

**Survivorship Bias**

Survivorship bias is a form of sample selection bias in which a financial database excludes information on entities that have been dropped from it. For example, when unsuccessful funds are removed from an index, the past index values are adjusted to remove the data of the dropped funds. Since a fund is more likely to be dropped because of poor performance, the remaining data is tilted toward the better performers and the index is biased.
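The effect can be sketched with a small example. The fund names, returns, and `survived` flags below are entirely hypothetical; the point is only to compare the average over the funds still in the database with the average over all funds that ever existed.

```python
from statistics import mean

# Hypothetical annual returns (percent); "survived" marks funds still in the database.
funds = [
    {"name": "A", "ret": 12.0, "survived": True},
    {"name": "B", "ret": 8.0, "survived": True},
    {"name": "C", "ret": -15.0, "survived": False},
    {"name": "D", "ret": 5.0, "survived": True},
    {"name": "E", "ret": -10.0, "survived": False},
]

# Average over every fund, including the ones that were dropped.
all_mean = mean([f["ret"] for f in funds])

# Average over survivors only, which is what a cleaned database would show.
survivor_mean = mean([f["ret"] for f in funds if f["survived"]])

print("all funds:", round(all_mean, 2))
print("survivors only:", round(survivor_mean, 2))
```

Because the dropped funds are the poor performers, the survivors-only average is higher than the average over the full set.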

**Look-ahead Bias**

Look-ahead bias occurs when the researcher uses information that was not available on the test date when calculating an estimate, implicitly assuming that it was already known. In investment analysis, this means relying on information that would not yet have existed when the investment decision was being made, for example year-end accounting figures that are published only some time after the fiscal year closes.
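One way to guard against this in a backtest is to filter data by the date it became public rather than the period it describes. The sketch below assumes each record carries a `released` date; the records, field names, and the helper `known_on` are hypothetical and used only for illustration.

```python
from datetime import date

# Hypothetical earnings records: the period covered vs. the publication date.
records = [
    {"period_end": date(2022, 12, 31), "released": date(2023, 2, 15), "eps": 2.10},
    {"period_end": date(2023, 3, 31), "released": date(2023, 5, 10), "eps": 2.40},
]


def known_on(records, test_date):
    """Keep only records that had actually been published by test_date."""
    return [r for r in records if r["released"] <= test_date]


test_date = date(2023, 4, 1)

# Filtering on period_end lets in the Q1 2023 figure, which was not yet public.
biased = [r for r in records if r["period_end"] <= test_date]

# Filtering on the release date keeps only what an analyst could have seen.
unbiased = known_on(records, test_date)

print("records using period end:", len(biased))
print("records using release date:", len(unbiased))
```

The first filter silently includes a figure that was published more than a month after the test date, which is exactly the look-ahead problem.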

**Time-period Bias**

Time-period bias occurs when a test is based on a time period that makes the results specific to that period, so that the conclusions may not hold in other periods.
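A simple check is to repeat the calculation over different periods and see how much the result moves. The yearly returns below are invented for illustration, and `mean_return` is a hypothetical helper, not part of any library.

```python
from statistics import mean

# Hypothetical annual returns (percent) for a strategy, for illustration only.
returns_by_year = {
    2015: 9.0, 2016: 11.0, 2017: 10.0,
    2018: -8.0, 2019: -6.0, 2020: 2.0,
}


def mean_return(returns, start, end):
    """Average return over the years start..end, inclusive."""
    return mean([r for year, r in returns.items() if start <= year <= end])


early = mean_return(returns_by_year, 2015, 2017)  # a favourable sub-period
full = mean_return(returns_by_year, 2015, 2020)   # the whole history

print("2015-2017:", early)
print("2015-2020:", full)
```

A test run only on the first three years would suggest a much stronger strategy than the full history supports.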

