Understanding Hypothesis Testing and p-value
Behavioral scientists, market researchers, astrophysicists, and drug testers all seek to better understand a target group. Often it is next to impossible to assess the entire population. Inferential statistical testing is instead done on a sample that exhibits most, if not all, characteristics of the population. This is done using hypothesis testing.
A hypothesis (plural: hypotheses) is a supposition that serves as the starting point for further exploration. Hypothesis testing begins by stating a ‘status quo’ hypothesis, also known as the null hypothesis. The hypothesis that contradicts it or proposes another explanation is called the alternative hypothesis.
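In hypothesis testing we start by assuming that the null hypothesis is in fact true. We then ask how probable it would be to observe a sample at least as extreme as ours if that assumption held. If this probability turns out to be very small, the sample is hard to reconcile with the null hypothesis, and we reject it. Note that this is not the probability that the null hypothesis itself is true.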
Suppose, for example, that a brand of beer wants to test whether working men consume 3 or more beers on average on a Saturday, in order to decide whether to place more ads on Fridays. It will first have to clearly state its claim, or null hypothesis. Next, a random sample is collected from the population. This could be, for example, 30 working men and the number of beers each consumed on a given Saturday. The mean of this sample is then calculated.
The sample mean is then compared to the value we have supposed for the population. If the sample mean is consistent with the hypothesized mean, that is, it does not fall far enough below 3 to be surprising, we retain the null hypothesis that working men drink three or more beers on a Saturday. If the sample mean falls well below the hypothesized value, we reject the null hypothesis.
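The comparison described here can be sketched in a few lines of Python. The 30 values below are made-up illustration data, not real survey results, and the names `beers`, `HYPOTHESIZED_MEAN` and `standardized_difference` are hypothetical. Dividing the difference by the standard error is one common way to judge whether a gap is "large"; the article itself does not prescribe a particular formula.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: beers consumed by 30 working men on one Saturday.
beers = [3, 2, 4, 3, 1, 5, 3, 2, 4, 3,
         2, 3, 4, 1, 3, 2, 5, 3, 2, 4,
         3, 2, 3, 4, 2, 3, 1, 4, 3, 2]

HYPOTHESIZED_MEAN = 3.0  # the value stated in the null hypothesis


def standardized_difference(sample, hypothesized_mean):
    """Return (sample mean, test statistic) for a one-sample comparison."""
    n = len(sample)
    sample_mean = mean(sample)
    # Standard error: sample standard deviation scaled by sqrt(n).
    standard_error = stdev(sample) / sqrt(n)
    statistic = (sample_mean - hypothesized_mean) / standard_error
    return sample_mean, statistic


sample_mean, statistic = standardized_difference(beers, HYPOTHESIZED_MEAN)
print(f"sample mean = {sample_mean:.2f}, test statistic = {statistic:.2f}")
```

A statistic close to zero means the sample mean sits near the hypothesized value, while a strongly negative one means the sample mean falls well below it.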
The probability value that helps us retain or reject the null hypothesis is called the p-value. By convention, a p-value of less than 5% means the null hypothesis is to be rejected. In this context we refer to significance. When the null hypothesis is rejected because the p-value is less than 5%, we say significance has been reached. When the p-value is more than 5%, the null hypothesis is retained and we say significance has not been reached; the evidence is not strong enough to reject the null hypothesis. A third possibility is that the p-value is exactly 5%. This is a borderline case in which the result could go either way, so it is best treated as inconclusive.
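The decision rule can be written out as a small Python sketch. It assumes a one-sided test (the null hypothesis is "3 or more", so only unusually low values count against it) and approximates the p-value with the standard normal distribution; for small samples a t-distribution is often used instead. The test statistics passed in are hypothetical, and the function names are illustrative only.

```python
from statistics import NormalDist

ALPHA = 0.05  # the 5% significance level used in the article


def one_sided_p_value(test_statistic):
    """Chance of a statistic this low or lower if the null hypothesis is true.

    Uses a standard normal approximation (an assumption of this sketch).
    """
    return NormalDist().cdf(test_statistic)


def decide(p_value, alpha=ALPHA):
    """Apply the rule: reject below alpha, otherwise retain."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "retain the null hypothesis"


# Two hypothetical test statistics: one far below zero, one close to it.
for z in (-2.5, -0.6):
    p = one_sided_p_value(z)
    print(f"z = {z}: p = {p:.3f} -> {decide(p)}")
```

In this sketch a p-value of exactly 5% falls on the "retain" side, which is one common convention rather than the only possible choice.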
In our case, if the p-value is more than 5%, the null hypothesis is retained and the beer brand will go ahead with more advertising on Fridays. If not, it will continue with its current advertising plan. We will look at how to calculate the p-value in another article.