Understanding Hypothesis Testing and p-values

Behavioral scientists, market researchers, astrophysicists, and drug testers all seek to better understand a target group. Often it is next to impossible to assess the entire population. Inferential statistical testing is instead done on a sample that is representative of the population, exhibiting most if not all of its characteristics. This is done using hypothesis testing.

A hypothesis (plural: hypotheses) is a supposition that serves as the starting point for further exploration. Hypothesis testing begins by stating a ‘status quo’ hypothesis, also known as the null hypothesis. The hypothesis that contradicts it, or proposes another possibility, is called the alternative hypothesis.

In hypothesis testing we start by assuming that the null hypothesis is in fact true. We then find the probability of obtaining a sample result at least as extreme as the one we observed, given that assumption. If this probability turns out to be very small, we reject the null hypothesis.

Suppose, for example, that a beer brand wants to test whether working men consume three or more beers on average on a Saturday, in order to decide whether to place more ads on Fridays. It will first have to clearly state its claim, or null hypothesis. Next, a random sample is collected from the population. This could be, for example, 30 working men and the number of beers each of them consumed on a given Saturday. The mean of these 30 values is then calculated.
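As a minimal sketch of the sampling step, the Python snippet below draws 30 values at random from a made-up population and computes their mean. The population values, the seed, and the names `population` and `draw_sample` are illustrative assumptions, not data from any real survey.

```python
import random
import statistics

def draw_sample(population, n, seed=0):
    """Draw a simple random sample of size n without replacement."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(population, n)

# Hypothetical population: beers consumed on a Saturday by 1,000 working men.
# These numbers are invented purely for illustration.
population = [i % 7 for i in range(1000)]  # values 0 through 6

sample = draw_sample(population, 30)
sample_mean = statistics.mean(sample)
print(f"n = {len(sample)}, sample mean = {sample_mean:.2f}")
```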

The sample mean is then compared to the value stated in the null hypothesis. If the difference between the sample mean and the hypothesized population mean (three beers) is small enough to be explained by chance variation between samples, we retain the null hypothesis, which is that working men drink three or more beers on a Saturday. If the difference between the two is too large to be explained by chance, we reject the null hypothesis.

The p-value is the probability of seeing a sample result at least as extreme as ours if the null hypothesis were true, and it is what we use to decide whether to retain or reject the null hypothesis. A p-value of less than 5% usually means the null hypothesis is to be rejected. In this context we speak of significance: when the null hypothesis is rejected because the p-value is less than 5%, we say that significance has been reached. When the p-value is more than 5%, the null hypothesis is retained and we say that significance has not been reached; the evidence is not strong enough to claim an effect. A third possibility is that the p-value is exactly 5%. This is a borderline case: the result could go either way, and no firm conclusion should be drawn from it.
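The decision rule can be sketched in a few lines of Python. This example assumes a one-sided test in which low sample means count as evidence against the null hypothesis, and it approximates the sampling distribution with the standard normal curve. A t-distribution would be the more usual choice for a sample of 30; the normal curve is swapped in here only to stay within the standard library. The names `one_sided_p_value` and `decide`, and the test statistic value, are hypothetical.

```python
from statistics import NormalDist

ALPHA = 0.05  # the 5% significance level used in the text

def one_sided_p_value(test_stat):
    """Probability of a test statistic this low or lower if the null holds.

    Uses the standard normal curve as an approximation.
    """
    return NormalDist().cdf(test_stat)

def decide(p_value, alpha=ALPHA):
    """Reject the null hypothesis only when the p-value is below alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "retain the null hypothesis"

test_stat = -2.0  # hypothetical value, for illustration only
p = one_sided_p_value(test_stat)
print(f"p-value = {p:.4f}: {decide(p)}")
```

The 5% threshold is a convention rather than a law; passing a different `alpha` to `decide` applies a stricter or looser rule.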

In our case, if the p-value is more than 5%, the null hypothesis is retained and the beer brand will go ahead with more advertising on Fridays. If not, it will continue with its current advertising plan. We will look at how to calculate the p-value in another article.
