Parameter, Sample Statistic, and Frequency Distribution

Earlier, we took the example of measuring the average savings of a group of 100 families. If those 100 families are the entire group we are interested in, this data represents the population. Properties of a population, such as its mean and standard deviation, are called parameters.

If, instead, we wanted to measure the average savings of an entire country, we would not survey every family. We would take a sample and use its characteristics to represent the entire population. Properties of a sample are not called parameters; they are called sample statistics.
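The distinction can be illustrated with a short Python sketch. The population below is hypothetical: 10,000 savings figures drawn at random from a normal distribution purely for illustration, not real data. The mean and standard deviation of the whole list are parameters; the same measures computed on a random sample of 100 families are sample statistics.

```python
import random
import statistics

# Hypothetical population: savings for 10,000 families, randomly generated
# for illustration only (not real data).
rng = random.Random(42)
population = [rng.gauss(5000, 1500) for _ in range(10_000)]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Sample statistics: computed from a random sample of 100 families.
sample = rng.sample(population, 100)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

print(f"parameters:        mean={mu:.1f}, sd={sigma:.1f}")
print(f"sample statistics: mean={x_bar:.1f}, sd={s:.1f}")
```

The sample statistics will usually be close to, but not exactly equal to, the parameters they estimate.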

The statistical data that we collect can be presented in the form of a frequency distribution. A frequency distribution summarizes a large data set by grouping the observations into a small number of intervals and counting how many observations fall in each interval.

Let’s take an example to understand how to construct a frequency distribution. Suppose we have the following 20 observations:

1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2, 5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8

To construct a frequency distribution for this data, we follow these steps.

Step 1: Sort the data in ascending order

The sorted data is presented below:

1.2, 1.5, 1.7, 2.3, 2.5, 2.9, 3, 3.8, 4.2, 4.3, 5.4, 5.5, 5.6, 5.9, 6.2, 6.7, 8.5, 8.8, 9.5, 9.8
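Step 1 can be reproduced in Python with the built-in `sorted()` function, which returns a new list in ascending order and leaves the original list unchanged:

```python
observations = [1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2,
                5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8]

# sorted() returns a new ascending list; `observations` keeps its original order.
sorted_obs = sorted(observations)
print(sorted_obs)
```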

Step 2: Calculate the range of data

The range is the difference between the largest and smallest observations. The minimum value is 1.2 and the maximum value is 9.8, so the range of the data is 9.8 - 1.2 = 8.6.
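The range calculation in Step 2 can be checked with a few lines of Python. Because 9.8 - 1.2 is not exact in binary floating point, the result is rounded to one decimal place for display:

```python
observations = [1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2,
                5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8]

low, high = min(observations), max(observations)
data_range = high - low  # range = maximum - minimum

print(low, high, round(data_range, 1))  # 1.2 9.8 8.6
```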

Step 3: Decide on the number of intervals in the frequency distribution

This must be done carefully so that the number of intervals is neither too high nor too low. If we use too few intervals, the distribution classifies the data very broadly. If we use too many intervals, the result is no longer really a summary of the data.

In our example, we have 20 observations ranging from 1 to 10. If we use an interval width of 1, we will have 9 intervals, which is too many for 20 values. So, we can decide to have 5 intervals with a width of 2.
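The interval counts in Step 3 follow from dividing the span to be covered by the interval width and rounding up. The helper `num_intervals` below is a hypothetical name used only for this sketch; it assumes equal-width intervals and a width that divides cleanly in floating point.

```python
import math

def num_intervals(start: float, stop: float, width: float) -> int:
    """Hypothetical helper: equal-width intervals needed to cover [start, stop)."""
    return math.ceil((stop - start) / width)

print(num_intervals(1, 10, 1))  # 9 intervals of width 1 from 1 to 10
print(num_intervals(0, 10, 2))  # 5 intervals of width 2 from 0 to 10
```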

Step 4: Determine the intervals.

Starting from 0, we can have 5 non-overlapping intervals as follows:

0 <= r < 2

2 <= r < 4

4 <= r < 6

6 <= r < 8

8 <= r < 10

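Note that the above intervals are all-encompassing and non-overlapping: each includes its lower limit but not its upper limit, and together they cover every value from 0 up to 10. Each observation will fall under one (and only one) frequency interval.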
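The intervals from Step 4 can be generated in Python as half-open pairs `(lower, upper)`, and the claim that every observation falls in exactly one interval can be checked directly. The tuple representation is an illustrative choice, not part of the original method.

```python
observations = [1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2,
                5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8]
width = 2

# Half-open intervals [lower, upper): lower limit included, upper excluded.
intervals = [(lo, lo + width) for lo in range(0, 10, width)]

# Verify that each observation matches exactly one interval.
for r in observations:
    matches = [(lo, hi) for lo, hi in intervals if lo <= r < hi]
    assert len(matches) == 1, f"{r} falls in {len(matches)} intervals"

print(intervals)  # [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
```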

Step 5: Tally and count the observations under each interval.

We will now assign each observation to one of these intervals and then count the total number of observations in each interval.

Interval        Absolute Frequency
0 <= r < 2      3
2 <= r < 4      5
4 <= r < 6      6
6 <= r < 8      2
8 <= r < 10     4
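Step 5 can be sketched in Python with `collections.Counter`. Because the intervals start at 0 and all have width 2, floor division `r // width` gives the index of the interval containing `r`; this shortcut assumes equal-width intervals starting at 0 (a search such as `bisect` over the interval limits would handle other cases).

```python
from collections import Counter

observations = [1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2,
                5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8]
width = 2

# Interval index k means k*width <= r < (k+1)*width.
counts = Counter(int(r // width) for r in observations)

# (lower, upper, absolute frequency) for each of the 5 intervals.
table = [(k * width, (k + 1) * width, counts.get(k, 0)) for k in range(5)]
for lo, hi, freq in table:
    print(f"{lo} <= r < {hi}: {freq}")
```

The printed counts match the table: 3, 5, 6, 2 and 4, which add up to the 20 observations.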

Note that the interval 6-8 has the lowest frequency (2 observations) and the interval 4-6 has the highest frequency (6 observations). The frequency calculated this way is referred to as the absolute frequency.
