Type I and Type II Errors

When drawing an inference (from a sample statistic, about a population parameter), there can be two types of errors: Type I and Type II.

A Type I error, also known as an error of the first kind, occurs when the null hypothesis is true but is rejected.

A Type II error, also known as an error of the second kind, occurs when the null hypothesis is false but is not rejected.

When we conduct a significance test, we first define the null hypothesis (H0) and the alternative hypothesis (Ha) with reference to a population. The null hypothesis usually states a commonly accepted assumption about the population parameter. The alternative hypothesis is the competing claim. The test asks whether the sample provides enough evidence against the null hypothesis to support the alternative. The conclusion of the hypothesis test will be that we either reject the null hypothesis or fail to reject it.

To perform the test, we take a sample from the population and use it to calculate the test statistic, a function of the sample data that is used to decide whether the null hypothesis should be rejected.

We then calculate the probability of getting a test statistic at least as extreme as the one observed, assuming our null hypothesis is true. If this value, known as the p-value, is below a chosen threshold (the significance level, α), we reject the null hypothesis.

p-value < α => Reject H0

p-value >= α => Fail to Reject H0
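The decision rule above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the coin-toss data (60 heads in 100 tosses, H0: p = 0.5 versus Ha: p > 0.5) and the function names upper_tail_p_value and decide are hypothetical and not part of the original discussion.

```python
from math import comb


def upper_tail_p_value(successes: int, n: int, p0: float) -> float:
    """P(X >= successes) for X ~ Binomial(n, p0), i.e. computed assuming H0 is true."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes, n + 1)
    )


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 only when the p-value is strictly below alpha."""
    return "Reject H0" if p_value < alpha else "Fail to reject H0"


# Hypothetical data: 60 heads in 100 tosses, H0: p = 0.5 versus Ha: p > 0.5.
p_value = upper_tail_p_value(60, 100, 0.5)
print(f"p-value = {p_value:.4f} -> {decide(p_value)}")
```

The p-value here is an upper-tail probability because the assumed alternative is one-sided (p > 0.5); a different alternative would need a different tail.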

If the significance level is 5%, then what we are saying is that if the p-value (the probability of getting a statistic at least this extreme from a sample of this size, assuming the null hypothesis is true) is less than the threshold of 5%, then it is reasonable to reject the null hypothesis.

However, our conclusion may be wrong. When that happens, we have made either a Type I or a Type II error. This is possible because we conduct our significance test on a sample and not on the entire population.

The following table provides a clear view of the Type I and Type II errors.

| Decision          | H0 is true   | H0 is false   |
|-------------------|--------------|---------------|
| Reject H0         | Type I Error | Correct       |
| Fail to reject H0 | Correct      | Type II Error |

There are four scenarios:

Scenario 1: In reality, the null hypothesis is true, but we reject it. This is called the Type I error, or false positive. When the null hypothesis is true, the probability of making a Type I error is equal to our significance level (α).

Scenario 2: In reality, the null hypothesis is false, and through our test, we reject it. This is a correct conclusion of the hypothesis test.

Scenario 3: In reality, the null hypothesis is true, and we fail to reject it. This is also a correct conclusion of the hypothesis test.

Scenario 4: In reality, the null hypothesis is false, and through our test, we fail to reject it. This is called the Type II error, or false negative.
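The four scenarios can be illustrated with a small simulation. The sketch below is an assumption-laden example, not a method from the text: it uses a two-sided z-test with known standard deviation, hypothetical settings (n = 30, α = 0.05, a true mean of 0.5 when the null hypothesis is false), and made-up function names. It draws many samples and counts how often the test rejects H0 when H0 is true (Type I errors) and how often it fails to reject H0 when H0 is false (Type II errors).

```python
import random
from math import erfc, sqrt


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=30, alpha=0.05, runs=5000, seed=0):
    """Fraction of simulated samples for which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / runs


# H0 is true (true mean equals mu0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=0.0)
# H0 is false (hypothetical true mean 0.5): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mu=0.5)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I error rate should land close to α, while the Type II error rate depends on how far the true mean is from the null value and on the sample size.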

As an example, let's say Company A produces electric switches, but 5% of them are defective. Company B claims that they produce fewer defective switches. The null and alternative hypotheses will be stated as follows, where p is the proportion of defective switches produced by Company B (not to be confused with the p-value):

H0: p = 0.05 versus Ha: p < 0.05

To test the hypothesis, a sample of 200 switches from Company B may be used to calculate the test statistic.

Company B will commit a Type I error if it rejects the null hypothesis and concludes that it makes fewer than 5% defective switches even though in reality it makes 5% defective switches.

Company B will commit a Type II error if it fails to reject the null hypothesis, and so does not conclude that it makes fewer than 5% defective switches, even though in reality it does make fewer than 5% defective switches.
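To put numbers on the switch example, here is a minimal sketch using an exact lower-tail binomial test on the sample of 200 switches. The significance level of 0.05, the assumed true defect rate of 2% used for the Type II calculation, and the helper names binom_cdf and critical_value are illustrative assumptions, not values from the example itself.

```python
from math import comb


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p); returns 0 when c < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def critical_value(n, p0, alpha):
    """Largest c with P(X <= c | p0) < alpha; returns -1 if no such c exists."""
    c = -1
    while c + 1 <= n and binom_cdf(c + 1, n, p0) < alpha:
        c += 1
    return c


n, p0, alpha = 200, 0.05, 0.05  # alpha is an assumed significance level
c = critical_value(n, p0, alpha)  # reject H0 when the sample has <= c defectives

# Type I error rate: chance of rejecting H0 when p really is 0.05.
type_1_rate = binom_cdf(c, n, p0)

# Type II error rate: chance of failing to reject H0 when p is really lower.
assumed_true_p = 0.02  # hypothetical true defect rate, chosen for illustration
type_2_rate = 1 - binom_cdf(c, n, assumed_true_p)

print(f"Reject H0 if defectives <= {c}")
print(f"Type I error rate:  {type_1_rate:.4f}")
print(f"Type II error rate: {type_2_rate:.4f} (if true p = {assumed_true_p})")
```

Because the number of defective switches is discrete, the actual Type I error rate of this rule sits below the nominal α rather than exactly at it. The Type II error rate changes with the assumed true defect rate: the closer it is to 5%, the harder the difference is to detect.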
