Biases in Sampling

As we have seen in this chapter, all our estimates are based on a sample drawn from the population. It is therefore critical that we select samples correctly so that our results are not biased. Even so, several issues can bias a sample and lower the quality of our parameter estimates. Let’s look at some of these:

Appropriate Sample Size

We learned that a larger sample size reduces sampling error. However, increasing the sample size has two drawbacks:

  • When we increase the sample size, we run the risk of including data that doesn’t represent the population correctly. This is especially the case with time series data (such as returns from a mutual fund), because the population parameters may change over time.
  • A large sample size also increases our cost of sampling.
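
The trade-off above can be made concrete with the standard error of the sample mean, s / sqrt(n). The following is a minimal sketch; the 4% sample standard deviation and the sample sizes are hypothetical values chosen only for illustration.

```python
import math


def standard_error(sample_std: float, n: int) -> float:
    """Standard error of the sample mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sample_std / math.sqrt(n)


# Hypothetical sample standard deviation of monthly returns (4%).
sample_std = 0.04

# Each step quadruples the sample size.
for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error(sample_std, n):.4f}")
```

Because the standard error falls with the square root of n, quadrupling the sample size only halves it. Each further gain in precision therefore needs disproportionately more data, which is where the cost and the risk of unrepresentative observations come in.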

Data-mining Bias

Data mining involves searching through historical data in an effort to find statistically significant patterns or trading strategies that work. One such pattern is the day-of-the-week effect: returns have tended to be lower, and on average negative, on Mondays compared with the averages of the other four days of the week. Traders may try to use such patterns to outperform the market.

Data mining bias refers to errors that arise from relying too heavily on data mining. Some patterns found through data mining may be useful, but not all of them are. Even patterns that were valid in the past may no longer hold in today’s market. A trader is advised to check that a mined pattern is valid before building a trading strategy on it. For example, the trader may want to conduct an out-of-sample test to find out whether the model also works in periods that were not part of the period used for data mining.
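
An out-of-sample test can be sketched with simulated data. The example below is an illustration under stated assumptions, not a real backtest: every "strategy" is pure noise with a true mean return of zero, and the number of strategies, the period lengths, and the volatility are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

N_STRATEGIES = 200  # hypothetical number of candidate rules searched
N_IN = 250          # in-sample days used for data mining
N_OUT = 250         # out-of-sample days held back for testing


def simulate_returns(n_days):
    """Pure-noise daily returns with mean zero, so no rule has a true edge."""
    return [random.gauss(0.0, 0.01) for _ in range(n_days)]


# Each strategy is a pair: (in-sample returns, out-of-sample returns).
strategies = [
    (simulate_returns(N_IN), simulate_returns(N_OUT))
    for _ in range(N_STRATEGIES)
]

# Data mining step: keep the strategy with the best in-sample mean return.
best_in, best_out = max(strategies, key=lambda s: statistics.mean(s[0]))

in_sample_mean = statistics.mean(best_in)
out_of_sample_mean = statistics.mean(best_out)

print(f"best rule, in-sample mean return:     {in_sample_mean:.5f}")
print(f"same rule, out-of-sample mean return: {out_of_sample_mean:.5f}")
```

The in-sample winner looks profitable only because it was picked as the best of many noisy candidates. Since none of the rules has a real edge, its out-of-sample mean is expected to be close to zero, which is the kind of gap an out-of-sample test is meant to expose.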

Sample Selection Bias

This happens when you are not able to sample a certain part of the population because data for it is not available. As a result, the sample selected is not completely random.

Survivorship Bias

Survivorship bias is a form of sample selection bias in which a financial database excludes information on entities that have dropped out of it. For example, when unsuccessful funds are removed from an index, the past index values are adjusted to remove the data of the dropped funds. Since a fund is more likely to be dropped from an index because of poor performance, the remaining data overstates the performance of the index.
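
The effect can be illustrated with a small made-up data set. In this sketch the fund names, returns, and survival flags are all hypothetical; it simply compares the average return of every fund with the average of the survivors only.

```python
from statistics import mean

# Hypothetical annual returns (%) for five funds; all values are made up.
funds = [
    {"name": "A", "annual_return": 9.0, "survived": True},
    {"name": "B", "annual_return": 7.0, "survived": True},
    {"name": "C", "annual_return": 5.0, "survived": True},
    {"name": "D", "annual_return": -4.0, "survived": False},
    {"name": "E", "annual_return": -7.0, "survived": False},
]

# Average over every fund that ever existed.
all_funds_mean = mean(f["annual_return"] for f in funds)

# Average over the funds still present in the database.
survivors_mean = mean(f["annual_return"] for f in funds if f["survived"])

print(f"all funds:      {all_funds_mean:.1f}%")   # 2.0%
print(f"survivors only: {survivors_mean:.1f}%")   # 7.0%
```

With these numbers, dropping the two failed funds raises the measured average return from 2.0% to 7.0%, even though no surviving fund performed any better than before.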

Look-ahead Bias

Look-ahead bias occurs when the researcher uses information that was not available on the test date while calculating an estimate; the researcher simply assumes the information was known. In investment analysis, this means the test relies on information that did not yet exist at the time the investment decision would have been made.
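
One common source of this bias is the lag between the end of a reporting period and the date the figures are published. The sketch below assumes a hypothetical list of earnings records; the dates, the EPS values, and both function names are invented for illustration.

```python
from datetime import date

# Hypothetical earnings records: each figure is published weeks after
# the fiscal period it describes has ended.
earnings = [
    {"period_end": date(2023, 9, 30), "released": date(2023, 11, 10), "eps": 1.10},
    {"period_end": date(2023, 12, 31), "released": date(2024, 2, 15), "eps": 1.25},
]


def latest_known_eps(records, as_of):
    """Most recent EPS that had actually been published by `as_of`."""
    known = [r for r in records if r["released"] <= as_of]
    if not known:
        return None
    return max(known, key=lambda r: r["period_end"])["eps"]


def latest_eps_with_lookahead(records, as_of):
    """Biased version: filters on period end and ignores the release lag."""
    known = [r for r in records if r["period_end"] <= as_of]
    if not known:
        return None
    return max(known, key=lambda r: r["period_end"])["eps"]


test_date = date(2024, 1, 15)
print(latest_known_eps(earnings, test_date))           # 1.1
print(latest_eps_with_lookahead(earnings, test_date))  # 1.25
```

On the test date only the September figure had been released, so a test that uses the December figure is relying on information an investor could not have had.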

Time-period Bias

Time-period bias occurs when the test is based on a single time period, so that the results may hold only for that period and may not carry over to other periods.
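
A simple check is to repeat the estimate over different windows and see how much it moves. The following sketch uses made-up annual returns; the years, the values, and the `mean_return` helper are hypothetical.

```python
from statistics import mean

# Hypothetical annual returns (%) by year; all values are made up.
returns_by_year = {
    2015: 12.0, 2016: 10.0, 2017: 14.0, 2018: 12.0,
    2019: -6.0, 2020: -2.0, 2021: -4.0, 2022: 4.0,
}


def mean_return(returns, start_year, end_year):
    """Average return over the years start_year..end_year, inclusive."""
    window = [r for year, r in returns.items() if start_year <= year <= end_year]
    if not window:
        raise ValueError("no data in the requested window")
    return mean(window)


print(mean_return(returns_by_year, 2015, 2018))  # 12.0
print(mean_return(returns_by_year, 2019, 2022))  # -2.0
print(mean_return(returns_by_year, 2015, 2022))  # 5.0
```

A test run only on 2015 to 2018 would suggest a 12% average return, while the following four years give -2%. Neither window alone describes the full period, which is why a conclusion drawn from one period should be checked against others.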
