Earlier, we looked at an example of measuring the average savings of a group of 100 families. Because every family in the group was measured, these 100 families make up the population. The properties of a population, such as its mean and standard deviation, are called parameters.
If we instead wanted to measure the average savings of an entire country, we would take a sample and use its characteristics to represent the entire population. These properties of a sample are not called parameters; they are called sample statistics.
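To make the distinction concrete, here is a minimal Python sketch. The savings figures are made-up numbers generated only for illustration, and the sample size of 10 is an arbitrary assumption; neither comes from the example above.

```python
import random
import statistics

# Hypothetical savings figures for a population of 100 families (made-up data).
random.seed(42)
population = [random.randint(1_000, 20_000) for _ in range(100)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Sample statistic: computed from a subset drawn from the population.
sample = random.sample(population, 10)  # sample size of 10 is an assumption
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The two means will generally differ, which is why a sample statistic is treated as an estimate of the corresponding parameter.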
The statistical data that we collect can be presented in the form of a frequency distribution. A frequency distribution summarizes a large data set by grouping the observations into a small number of intervals and counting how many fall into each.
Let’s take an example to understand how to construct a frequency distribution. Let’s say we have the following 20 observations with us.
1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2, 5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8
To construct a frequency distribution for this data, we will follow these steps.
Step 1: Sort the data in ascending order
The sorted data is presented below:
1.2, 1.5, 1.7, 2.3, 2.5, 2.9, 3, 3.8, 4.2, 4.3, 5.4, 5.5, 5.6, 5.9, 6.2, 6.7, 8.5, 8.8, 9.5, 9.8
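As a quick illustration, the sorting step can be sketched in Python. The variable names are chosen here for illustration only.

```python
# The 20 observations from the example, in their original order.
observations = [1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2,
                5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8]

# sorted() returns a new list in ascending order and leaves the input unchanged.
sorted_observations = sorted(observations)

print(sorted_observations)
```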
Step 2: Calculate the range of data
The range tells us the lower and upper limits that the data intervals must cover. The minimum value is 1.2 and the maximum value is 9.8, so the data spans from 1.2 to 9.8, a range of 9.8 - 1.2 = 8.6.
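A minimal sketch of this step in Python is shown below. It reports both the limits and the difference between them, since "range" is commonly used for either; the variable names are illustrative.

```python
observations = [1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2,
                5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8]

data_min = min(observations)  # lower limit of the data
data_max = max(observations)  # upper limit of the data

# Round to one decimal place to hide floating-point noise in the subtraction.
data_span = round(data_max - data_min, 1)

print(f"Minimum: {data_min}, maximum: {data_max}, span: {data_span}")
```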
Step 3: Decide on the number of intervals in the frequency distribution
This must be done carefully so that the number of intervals is neither too high nor too low. If we take very few intervals, our distribution will classify the data very broadly. If we take too many intervals, then it won’t really be a summary of the data.
In our example, we have 20 observations ranging from roughly 1 to 10. If we use an interval width of 1, then we will have 9 intervals. For 20 values, 9 intervals are too many. So, we can decide to have 5 intervals with a width of 2.
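The arithmetic behind this choice can be sketched as follows. The helper `interval_count` is a hypothetical name introduced here for illustration, and the limits passed to it (1 to 10 for width 1, 0 to 10 for width 2) are the rounded limits assumed in this example.

```python
import math

def interval_count(lower, upper, width):
    """Number of equal-width intervals needed to cover lower..upper."""
    return math.ceil((upper - lower) / width)

n_observations = 20

# Width 1 over the rounded limits 1 to 10, then width 2 over 0 to 10.
for lower, upper, width in [(1, 10, 1), (0, 10, 2)]:
    n_intervals = interval_count(lower, upper, width)
    per_interval = n_observations / n_intervals
    print(f"width {width}: {n_intervals} intervals, "
          f"about {per_interval:.1f} observations per interval")
```

With a width of 1, each interval would hold only about two observations on average, which is why the wider intervals give a more useful summary here.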
Step 4: Determine the intervals.
Starting from 0, we can define 5 non-overlapping intervals as follows, where r denotes an observation:
0 <= r < 2
2 <= r < 4
4 <= r < 6
6 <= r < 8
8 <= r < 10
Note that the above intervals are all-encompassing and non-overlapping. Each observation will fall under one (and only one) frequency interval.
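One way to check this property is sketched below in Python: build the intervals from a list of edges and confirm that every observation satisfies exactly one condition of the form lower <= r < upper. The names `edges` and `intervals` are illustrative.

```python
observations = [1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2,
                5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8]

# Interval boundaries: 5 intervals of width 2, starting from 0.
edges = [0, 2, 4, 6, 8, 10]

# Pair each edge with the next one: (0, 2), (2, 4), ..., (8, 10).
intervals = list(zip(edges[:-1], edges[1:]))

# Each observation must satisfy lower <= r < upper for exactly one interval.
for r in observations:
    matches = [(lower, upper) for lower, upper in intervals if lower <= r < upper]
    assert len(matches) == 1, f"{r} falls into {len(matches)} intervals"

print(intervals)
```

Because each interval includes its lower limit and excludes its upper limit, a value such as 2 belongs only to 2 <= r < 4.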
Step 5: Tally and count the observations under each interval.
We will now assign each observation to one of these intervals and then count the total number of observations in each interval.
| Interval | Frequency |
|---|---|
| 0 <= r < 2 | 3 |
| 2 <= r < 4 | 5 |
| 4 <= r < 6 | 6 |
| 6 <= r < 8 | 2 |
| 8 <= r < 10 | 4 |
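Note that the interval 6 <= r < 8 has the lowest frequency (2 observations) and the interval 4 <= r < 6 has the highest frequency (6 observations). The frequency calculated in this way, that is, the actual number of observations in each interval, is referred to as the absolute frequency.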
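The tally can be reproduced with a short Python sketch. It uses `bisect_right` from the standard library to find the interval for each observation; this is one possible approach rather than the only one, and it assumes that every observation lies at or above the first edge and strictly below the last edge.

```python
from bisect import bisect_right

observations = [1.5, 2.5, 3, 2.3, 4.3, 5.6, 4.2, 6.7, 5.9, 1.2,
                5.4, 9.8, 8.5, 5.5, 2.9, 1.7, 8.8, 6.2, 9.5, 3.8]
edges = [0, 2, 4, 6, 8, 10]
labels = [f"{lo} <= r < {hi}" for lo, hi in zip(edges[:-1], edges[1:])]

# counts[i] holds the absolute frequency of the interval edges[i] <= r < edges[i + 1].
counts = [0] * (len(edges) - 1)
for r in observations:
    # bisect_right counts the edges that are <= r, so subtracting 1
    # gives the index of the interval whose lower limit is at or below r.
    index = bisect_right(edges, r) - 1
    counts[index] += 1

for label, count in zip(labels, counts):
    print(f"{label:<12} {count}")

lowest = labels[counts.index(min(counts))]
highest = labels[counts.index(max(counts))]
print(f"Lowest frequency:  {lowest}")
print(f"Highest frequency: {highest}")
```

The counts add up to 20, which confirms that every observation was assigned to exactly one interval.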