Sampling Methods in Statistics

It is almost impossible to survey very large populations (all citizens of a country, all women under 35, all business owners in a state), so statisticians use samples selected by defined criteria to make observations and report findings.

Instead of collecting data from the entire population, a portion of the population is surveyed. The observations from the sample are then generalized to the population.

This is why it is important to select the correct sample and to ask it the right questions. If this is not done, the study's objectives will not be met or the responses will be misleading.

Clearly established survey objectives, correct identification of the target population, the data that needs to be collected, and the level of precision required in the study all influence sample selection.

In this article, we will learn about sampling methodology. Probability sampling and non-probability sampling are the two types of sampling. In probability sampling, every element in the population has a known, non-zero probability of being picked for the study. In non-probability sampling, it does not.

Probability Sampling

Here the sample is chosen randomly, as one would draw in a lottery. There are different types of probability sampling.

Simple Random Sampling or SRS

In this type of sampling, all units in the population have an equal chance of being selected. A ready list of units is used to select the sample. Names from a telephone directory, a voters' list for a geographical area, or a list of social security card holders in a district could serve as such a list. Care is taken not to repeat units in the sample, so that the result is a unique randomized list. This method is useful when such lists are available. Otherwise it becomes expensive to compile the list.

A supermarket plans to offer a discount on 15 items that it stocks. It generates a list of the 50 most picked-up items for the same time period last year. Slips naming these 50 items are put in a box and 15 of them are drawn. The probability of any one of the 50 items being drawn is the sample size divided by the population size, or 15/50, which is 0.3, a 3 in 10 chance.
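The supermarket draw can be sketched in a few lines of Python. This is only an illustration: the item names are hypothetical placeholders, and the fixed seed is an assumption made so that the draw can be repeated.

```python
import random

# Hypothetical stand-ins for the 50 most picked-up items.
items = [f"item_{i}" for i in range(1, 51)]

# Fixed seed (an assumption) so the same slips are drawn on every run.
rng = random.Random(42)

# random.sample draws without replacement, like pulling slips from a box,
# so no item can appear twice.
discounted = rng.sample(items, k=15)

# Chance that any single item is drawn: sample size / population size.
inclusion_probability = len(discounted) / len(items)

print(discounted)
print(inclusion_probability)
```

Drawing without replacement is what keeps the 15 discounted items unique.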

Systematic Sampling

In this type of sampling, the units in the sample are picked at a fixed interval. Let us look at a manufacturing line of spark plugs. The supervisor needs to test a batch for quality. He decides on a sample size of 200. The total population of spark plugs produced in a day is 1000. The sampling interval is the population size divided by the sample size, that is, 1000/200, which equals 5. Every fifth spark plug will be picked for testing. If the supervisor starts with the first spark plug, the order of selection will be 1, 6, 11, 16, 21 and so on. If he chooses the second spark plug as the starting point, the order of selection is 2, 7, 12, 17 and so on.

This type of sampling is not truly randomized: only the starting point can be chosen at random, and every later unit is fixed by the interval.
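A minimal sketch of the spark plug selection follows. The function name `systematic_sample` and its parameters are hypothetical, introduced only for this illustration. When no starting point is given, the sketch assumes it is drawn at random from the first interval.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Return the 1-based positions of units picked at a fixed interval."""
    interval = population_size // sample_size  # e.g. 1000 // 200 == 5
    if start is None:
        # Assumption: a random start somewhere in the first interval.
        rng = rng or random.Random()
        start = rng.randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first interval")
    positions = range(start, population_size + 1, interval)
    # Trim in case the population size is not a multiple of the sample size.
    return list(positions)[:sample_size]

print(systematic_sample(1000, 200, start=1)[:5])  # [1, 6, 11, 16, 21]
print(systematic_sample(1000, 200, start=2)[:4])  # [2, 7, 12, 17]
```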

Stratified Sampling

In this type of sampling, a heterogeneous population is divided into layers, or strata, whose members share a characteristic. The strata do not overlap. Different sampling techniques can be used within each stratum.

This sampling works well in a population with diverse characteristics. Each unit in the population can belong to only one stratum, so the strata are mutually exclusive. This allows all the diverse characteristics to be represented in the study.
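As a rough sketch of stratification, the Python below groups units by stratum and then draws a simple random sample inside each one. The household data, the 20% sampling fraction, and the helper name `stratified_sample` are all assumptions made for illustration. Proportional allocation is only one of several ways to size the per-stratum samples.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw a simple random sample of the given fraction from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        # Each unit lands in exactly one stratum (mutually exclusive).
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban, 30 suburban and 10 rural households.
population = (
    [("urban", i) for i in range(60)]
    + [("suburban", i) for i in range(30)]
    + [("rural", i) for i in range(10)]
)

sample = stratified_sample(population, lambda unit: unit[0], 0.2, random.Random(0))
counts = Counter(stratum for stratum, _ in sample)
print(counts)
```

Because every stratum contributes units, even the small rural group is represented in the sample.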

Cluster Sampling

In this method, the population under study is divided into clusters and a few of these clusters are selected to be the sample.

The government wants to study the impact of free mid-day meals provided to children in the age group 5 to 15 in government schools in the different areas of Bangalore. It is difficult to collect data from all schools spread over Bangalore.

So, the researchers select different schools across Bangalore that represent different areas, both urban and rural. They pick from these clusters so that they best represent the diversity and characteristics of the population.

If clusters are picked only from South Bangalore, the study will not provide correct findings. The clusters should be mutually exclusive.

Schools, districts, hospitals, and industrial estates suit cluster sampling because they are natural groupings that are broadly similar to one another.

Cluster sampling is useful when the cost of surveying across vast geographical distances is prohibitive.
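A sketch of one-stage cluster sampling for the school example is given below. The school names, pupil counts, and the choice of 8 clusters are hypothetical. The sketch also assumes the schools are drawn at random, whereas the researchers in the example may deliberately spread their choice across areas.

```python
import random

rng = random.Random(7)  # fixed seed (an assumption) for a repeatable draw

# Hypothetical sampling frame: 40 schools, each with its own list of pupils.
schools = {
    f"school_{s:02d}": [
        f"s{s:02d}_pupil_{p:03d}" for p in range(rng.randint(80, 120))
    ]
    for s in range(1, 41)
}

# Select whole clusters (schools), not individual pupils.
chosen_schools = rng.sample(sorted(schools), k=8)

# Every pupil in a chosen school becomes part of the sample.
surveyed_pupils = [
    pupil for school in chosen_schools for pupil in schools[school]
]

print(chosen_schools)
print(len(surveyed_pupils))
```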

Multi-stage Sampling

This sampling is similar to cluster sampling, but samples are selected by picking units within the cluster rather than taking the entire cluster. The first step is to select large clusters. Then, using random sampling methods, selections are made from within them. The clusters chosen at the first stage are called primary sampling units, and the units chosen within them are called secondary sampling units. This process can be repeated until the clusters are small enough to sample directly. This methodology works when observations and findings are required from large populations that are dispersed.

The Health Department wants to study the Covid-19 vaccination program among senior citizens in a state. It would first select hospitals and primary health centers, grouped by size, as clusters, and then sample senior citizens within the selected facilities. This would be two-stage sampling. It could further select clusters from urban and rural areas. To deepen the study, it may pick clusters of people who are immuno-compromised. In this way, the department gets a rich and diverse study.
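The two stages can be sketched as follows. The facility names, the registered-patient lists, and the helper `two_stage_sample` are hypothetical. The sketch assumes simple random sampling at both stages.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters. Stage 2: pick units inside each picked cluster."""
    primary_units = rng.sample(sorted(clusters), k=n_clusters)
    sample = {}
    for name in primary_units:
        members = clusters[name]
        k = min(n_per_cluster, len(members))  # small clusters are taken whole
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical frame: 12 health centers, each with 50 registered senior citizens.
centers = {
    f"center_{c:02d}": [f"c{c:02d}_senior_{p:02d}" for p in range(50)]
    for c in range(1, 13)
}

selected = two_stage_sample(centers, n_clusters=4, n_per_cluster=10,
                            rng=random.Random(3))
for center, seniors in selected.items():
    print(center, len(seniors))
```

Unlike one-stage cluster sampling, only some of the people in each selected center are surveyed.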

Multi-phase Sampling

Here a sample is chosen for the first level of data collection. Then, sub-groups of that sample are selected for further study. If we choose X participants from the clusters A, C, and D in the first phase, the next phase of data collection draws only on a subset of those X participants.

A group of 45-year-old women is selected for a study about general health among middle-aged women. The first phase involves answering a survey online. The researchers then want to learn more about the women's heart, blood pressure and liver by conducting medical tests and monitoring the results.

The researchers will collect data from 200 women in an area. Then 50 of these surveyed women are short-listed for the medical tests. This is an example of multi-phase sampling.
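The two phases of this study can be sketched in Python. The pool of 5,000 eligible women is a hypothetical figure. The sketch also assumes that both phases use random draws, whereas in practice the shortlist might be based on the survey answers.

```python
import random

rng = random.Random(1)  # fixed seed (an assumption) for a repeatable draw

# Hypothetical pool of eligible 45-year-old women in the area.
eligible = [f"woman_{i:04d}" for i in range(1, 5001)]

# Phase 1: 200 women answer the online survey.
phase_one = rng.sample(eligible, k=200)

# Phase 2: 50 of the women already surveyed are shortlisted for medical tests.
phase_two = rng.sample(phase_one, k=50)

print(len(phase_one), len(phase_two))
```

The key property is that the second-phase sample is drawn from the first-phase sample, not from the population again.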

Non-probability Sampling

In non-probability sampling, the selection of the sample is subjective and not randomized. Care must be taken that the selected sample represents the population. Otherwise the results will be skewed. This method of surveying is inexpensive, quick, and easy. Its shortcomings include poor data quality, insufficient representation of population characteristics, and reduced diversity due to lack of access.

There are different methods of non-probability sampling. Let’s take a look at them.

Volunteer Sampling

Here researchers ask for volunteers for the study through posters or a phone-in. The conditions for volunteering are advertised and interested persons volunteer.

This may result in poor representation, since people who satisfy the criteria may choose not to volunteer, and those who do volunteer may differ from the rest of the population.

Quota Sampling

In this type of sampling, each subpopulation within the population to be sampled is assigned a quota, that is, a number of units to be sampled. Together the quotas add up to the sample size for that population. This enables inclusion and representation of the different subpopulations. Suppose a population of 200 students is to be studied with a sample size of 40. The researchers will assign a quota of 20 male students and 20 female students and will continue sampling until these numbers are met.
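A minimal sketch of filling the two quotas is shown below. The student records, the order in which students are approached, and the helper `quota_sample` are hypothetical. No random draw is involved: students are taken as they come until each quota is full, which is what makes this a non-probability method.

```python
from collections import Counter

def quota_sample(respondents, quotas, group_key):
    """Accept respondents in the order met until every quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person in respondents:
        group = person[group_key]
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met, stop sampling
    return selected

# Hypothetical population of 200 students in the order they are approached.
students = [
    {"id": i, "gender": "male" if i % 3 == 0 else "female"}
    for i in range(1, 201)
]

sample = quota_sample(students, {"male": 20, "female": 20}, "gender")
print(Counter(student["gender"] for student in sample))
```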

Judgment Sampling

Here an expert decides which part of the population is going to be sampled, based on their knowledge and on secondary reports about the population. The sample therefore carries the expert's bias, since the selection is made solely by the expert or experts. This type of sampling is useful in focus group interviews.

Snowball Sampling

As the name suggests, the sampling starts by contacting a few known individuals who fit the sample. They are asked to help reach other individuals who also fit it. This type of sampling is used for hard-to-reach populations, such as people within a specific organization or group whose information is not publicly or easily available.
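The referral process can be sketched as a walk over a contact network. The referral data, the target size, and the helper `snowball_sample` are hypothetical. The sketch assumes that every person contacted agrees to take part and passes on all of their contacts.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Start from known individuals and follow their referrals outward."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < target_size:
        person = queue.popleft()
        sampled.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # never approach the same person twice
                seen.add(contact)
                queue.append(contact)
    return sampled

# Hypothetical referral network: each person names the people they can introduce.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": [],
}

print(snowball_sample(referrals, seeds=["A"], target_size=5))
```

The sample grows through existing relationships, so it can only reach people connected to the initial contacts.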

Web Sampling

Web sampling refers to sampling conducted online through the internet. A panel of people who are prepared to respond to online surveys is used in this format. This is a method of self-selection and is used because telephone and postal surveys have a high non-response rate. Web surveys are inexpensive, uncomplicated, and fast, and have higher response rates. The disadvantage of web panel sampling is that it may not be truly representative of the population, since it cannot include people who do not have access to the internet.
