Parametric vs Non-Parametric Distributions

Definitions

Parametric Distribution: A parametric distribution is used in statistics when an assumption is made about how the underlying data is distributed, for example when a variable is assumed to be normally distributed. All subsequent analysis then relies on this assumption, and the parameters that define the assumed distribution, such as the mean and standard deviation, feed directly into the analysis.

Non-Parametric Distribution: This class of distributions is used when no assumption is needed about the pattern or form of the underlying probability distribution from which the data are drawn. Typically these are used to describe an attribute of a population, to determine its relationship with another attribute, or to assess differences in that attribute across populations, time or related constructs, without requiring the population to follow a particular distribution or the data to be measured on an interval scale. An example of a test of this type is the Wilcoxon rank-sum test.

Non-parametric methods apply to situations where only very weak assumptions have been made about the actual form of the distribution of observations.
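To make the Wilcoxon rank-sum test mentioned above more concrete, here is a minimal sketch using only the Python standard library. The function name `rank_sum_test` and the sample values are hypothetical, chosen for illustration. The sketch uses the large-sample normal approximation and assigns average ranks to ties, but it does not apply a tie correction to the variance, so treat it as an illustration of the idea rather than a reference implementation. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferred.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (u, z, p). Ties receive average ranks; the variance is NOT
    tie-corrected, which is a simplifying assumption of this sketch.
    """
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)

    # Assign 1-based ranks, averaging over runs of equal values.
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    # Rank sum of the first sample, converted to the U statistic.
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = w - n1 * (n1 + 1) / 2

    # Mean and standard deviation of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Hypothetical samples: every value in group_a is smaller than every value in group_b.
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
u_stat, z_score, p_value = rank_sum_test(group_a, group_b)
print(u_stat, z_score, p_value)
```

Note that the test only looks at the ordering of the observations, not their actual values, which is why it needs no assumption about the shape of the distribution.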

Applications

In statistics, sampling is almost always used because it is nearly impossible to include every member of the population under consideration. A population is typically composed of a very large number of observations.

There are real-world scenarios where the sample is very small, for example fewer than 30 observations. A small sample, together with difficulty in determining the pattern of the distribution and the absence of interval-scale measurement, justifies the use of non-parametric distributions for analysis. These methods also have the desirable characteristic that they make very few assumptions about the pattern of behaviour of the data.

Parametric methods should only be used when the assumptions about the underlying distribution are well met; any violation justifies the use of non-parametric methods. Whenever the results of parametric and non-parametric analysis converge, the former can be used. In practice most research questions are bivariate, and if the bivariate results of both types of tests converge, parametric techniques are a good choice.
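One rough way to probe whether a normality assumption is plausible is to look at the sample skewness, which is close to zero for symmetric data and moves away from zero as the data become lopsided. The sketch below is an illustrative assumption about how such a check might look, not a procedure prescribed by this lesson. The function name `sample_skewness` and the data are hypothetical, and skewness alone is not a formal test of normality.

```python
from statistics import fmean


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5.

    Near 0 for symmetric data, positive for a long right tail,
    negative for a long left tail.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    m = fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5


# Hypothetical samples for illustration only.
symmetric_sample = [1, 2, 3, 4, 5]
right_skewed_sample = [1, 2, 2, 3, 3, 4, 30, 45]

print(sample_skewness(symmetric_sample))
print(sample_skewness(right_skewed_sample))
```

A value far from zero suggests that the normality assumption behind many parametric tests is questionable, which would favour a non-parametric method.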

Comparison of Results

The results of non-parametric tests are often more difficult to interpret than those of parametric tests, because many of them work on ranks rather than on the original measurement scale.

Whenever the underlying assumption about the distribution of the population holds, parametric tests generally have more statistical power than non-parametric methods, so they are more likely to detect a real effect.

An Example

Patients in a hospital have been classified by gender, and their durations of stay need to be compared. The distribution for females is strongly skewed while that for males is not. The medians of the two groups are close, whereas the means differ markedly. A parametric test is less suitable in this case because the assumption of normality is not reasonable, so a non-parametric approach is more advisable.
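The situation in this example can be sketched with a small made-up data set. The numbers below are invented purely for illustration and are not real hospital data; the names `female_stay`, `male_stay` and `summarize` are likewise hypothetical. The sketch shows how a couple of very long stays pull the mean of a skewed group upward while leaving its median unchanged.

```python
from statistics import mean, median

# Made-up lengths of stay in days (illustrative only, not real data).
female_stay = [1, 2, 2, 3, 3, 4, 30, 45]  # strongly right-skewed
male_stay = [2, 2, 3, 3, 3, 4, 4, 5]      # roughly symmetric


def summarize(label, days):
    """Return the mean and median length of stay for one group."""
    return {"group": label, "mean": mean(days), "median": median(days)}


female_summary = summarize("female", female_stay)
male_summary = summarize("male", male_stay)

print(female_summary)
print(male_summary)
```

With these values the two medians coincide while the means are far apart, so a comparison based on means would be driven by a few outlying stays. A rank-based test compares the groups without being dominated by those extreme values.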
