Summarization of the data is a necessary function of any statistical analysis. As a first step in this direction, the huge mass of unwieldy data is summarized in the form of tables and frequency distributions.
In order to bring the characteristics of the data into sharp focus, these tables and frequency distributions need to be summarized further. A measure of central tendency, or an average, is an essential summary measure in any statistical analysis. An average is a single value which can be taken as representative of the whole distribution.
DEFINITION OF AVERAGE
The average of a distribution has been defined in various ways. Some of the important definitions are:
"An average is an attempt to find one single figure to describe the whole of figures" - Clark and Sekkade
"Average is a value which is typical or representative of a set of data" - Murray R. Spiegel
"An average is a single value within the range of the data that is used to represent all the values in the series. Since an average is somewhere within the range of data it is sometimes called a measure of central value" - Croxton and Cowden
"A measure of central tendency is a typical value around which other figures congregate" - Simpson and Kafka
FUNCTIONS AND CHARACTERISTICS OF AN AVERAGE
To present a huge mass of data in a summarized form: It is very difficult for the human mind to grasp a large body of numerical figures. An average summarizes such data into a single figure, which makes them easier to understand and remember.
To facilitate comparison: Different sets of data can be compared by comparing their averages. For example, the levels of wages of workers in two factories can be compared through the mean (or average) wage of the workers in each of them.
To help in decision-making: Many decisions in research, planning, etc., are based on the average values of certain variables. For example, if the average monthly sales of a company are falling, the sales manager may have to take certain decisions to improve them.
Characteristics of a Good Average
A good measure of average must possess the following characteristics:
It should be rigidly defined, preferably by an algebraic formula, so that different persons obtain the same value for a given set of data.
It should be easy to compute.
It should be easy to understand.
It should be based on all the observations.
It should be capable of further algebraic treatment.
It should not be unduly affected by extreme observations.
It should not be much affected by the fluctuations of sampling.
VARIOUS MEASURES OF AVERAGE
Various measures of average can be classified into the following three categories:
Arithmetic Mean or Mean
The above measures of central tendency will be discussed in the order of their popularity. Of these, the Arithmetic Mean, Median and Mode, being the most popular, are discussed in that order.
Before the discussion of arithmetic mean, we shall introduce certain notations. It will be assumed that there are n observations whose values are denoted by X1, X2, ..., Xn respectively. The sum of these observations X1 + X2 + ... + Xn will be denoted in abbreviated form as
$$\sum_{i=1}^{n} X_i$$
where Σ (called sigma) denotes the summation sign.
The subscript of X, i.e., 'i', is a positive integer which indicates the serial number of the observation. Since there are n observations, i varies from 1 to n. This is indicated by writing these limits below and above Σ, as written earlier. When there is no ambiguity about the range of summation, this indication can be skipped and we may simply write X1 + X2 + ... + Xn = ΣXi.
Arithmetic Mean is defined as the sum of observations divided by the number of observations. It can be computed in two ways:
Simple arithmetic mean and
weighted arithmetic mean.
In the case of the simple arithmetic mean, equal importance is given to all the observations, while in the weighted arithmetic mean, the importance given to various observations is not the same.
Calculation of Simple Arithmetic Mean
(a) When Individual Observations are given.
Let there be n observations X1, X2, ..., Xn. Their arithmetic mean can be calculated either by the direct method or by the short-cut method. The arithmetic mean of these observations will be denoted by X̄.
Direct Method: Under this method, X̄ is obtained by dividing the sum of the observations by the number of observations, i.e.,
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Short-cut Method: This method is used when the magnitudes of the individual observations are large; it simplifies the calculation work. Let A be any assumed mean. We subtract A from every observation. The difference between an observation and A, i.e., Xi - A, is called the deviation of the ith observation from A and is denoted by di. Thus, we can write d1 = X1 - A, d2 = X2 - A, ..., dn = Xn - A. On adding these deviations and dividing by n, we get
$$\frac{\sum_{i=1}^{n} d_i}{n} = \frac{\sum_{i=1}^{n} X_i - nA}{n} = \bar{X} - A, \quad\text{i.e.,}\quad \bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n}$$
This result can be used for the calculation of X̄.
Remarks: Theoretically, any value can be selected as the assumed mean. However, to simplify the calculation work, the selected value should be as close to the value of X̄ as possible.
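The two methods can be compared with a minimal Python sketch. The function names and the twelve monthly figures below are hypothetical, chosen only to show that the short-cut method returns the same mean whatever assumed mean A is used; a value of A near X̄ merely keeps the deviations small.

```python
def mean_direct(xs):
    """Direct method: sum of the observations divided by their number."""
    return sum(xs) / len(xs)


def mean_shortcut(xs, assumed_mean):
    """Short-cut method: X-bar = A + (sum of d_i) / n, with d_i = X_i - A."""
    deviations = [x - assumed_mean for x in xs]
    return assumed_mean + sum(deviations) / len(xs)


# Hypothetical monthly figures, for illustration only.
output = [80, 88, 92, 84, 96, 92, 96, 100, 92, 94, 98, 86]
print(mean_direct(output))        # 91.5
print(mean_shortcut(output, 90))  # 91.5, adding only small deviations
```

With A = 90 the deviations are single- or double-digit numbers, which is the whole point of the short-cut method when the observations themselves are large.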
Example: The following figures relate to the monthly output of cloth of a factory in a given year:
(c) When data are in the form of a grouped frequency distribution
In a grouped frequency distribution, there are classes along with their respective frequencies. Let li be the lower limit and ui be the upper limit of the ith class. Further, let the number of classes be n, so that i = 1, 2, ..., n. Also, let fi be the frequency of the ith class. This distribution can be written in tabular form, as shown.
Note: Here u1 may or may not be equal to l2, i.e., the upper limit of a class may or may not be equal to the lower limit of its following class.
It may be recalled here that, in a grouped frequency distribution, we only know the number of observations in a particular class interval and not their individual magnitudes. Therefore, to calculate mean, we have to make a fundamental assumption that the observations in a class are uniformly distributed.
Under this assumption, the mid-value of a class will be equal to the mean of the observations in that class and hence can be taken as their representative. Therefore, if Xi is the mid-value of the ith class with frequency fi, the above assumption implies that there are fi observations each with magnitude Xi (i = 1 to n). Thus, the arithmetic mean of a grouped frequency distribution can be calculated by the formula
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{N}, \qquad N = \sum_{i=1}^{n} f_i$$
Remarks: The accuracy of arithmetic mean calculated for a grouped frequency distribution depends upon the validity of the fundamental assumption. This assumption is rarely met in practice. Therefore, we can only get an approximate value of the arithmetic mean of a grouped frequency distribution.
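The mid-value formula can be sketched in Python as follows. The class limits and frequencies are hypothetical; each class is replaced by its mid-value, which is exactly the uniform-distribution assumption discussed above, so the result is only an approximation to the mean of the raw observations.

```python
def grouped_mean(classes, freqs):
    """Mean of a grouped frequency distribution.

    Each class (lower, upper) is represented by its mid-value, assuming the
    observations are spread uniformly within the class.
    """
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n_total = sum(freqs)
    return sum(f * x for f, x in zip(freqs, mids)) / n_total


# Hypothetical distribution, used only for illustration.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [4, 8, 15, 9, 4]
print(grouped_mean(classes, freqs))  # 25.25
```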
Example: Calculate the arithmetic mean of the following distribution:
Solution: Here only the short-cut method will be used to calculate the arithmetic mean, but it can also be calculated by the direct method.
Example : The following table gives the distribution of weekly wages of workers in a factory. Calculate the arithmetic mean of the distribution.
Solution: It may be noted here that the given class intervals are inclusive. However, for the computation of mean, they need not be converted into exclusive class intervals, since the mid-values remain the same either way.
Step deviation method or coding method
In a grouped frequency distribution, if all the classes are of equal width, say 'h', the successive mid-values of various classes will differ from each other by this width. This fact can be utilised for reducing the work of computations.
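One common way of writing the step-deviation method: take an assumed mean A (usually one of the mid-values), define ui = (Xi - A)/h, which are then small integers, and compute X̄ = A + h × Σfiui / N. The Python sketch below reuses hypothetical mid-values and frequencies; the function name is illustrative only.

```python
def step_deviation_mean(mids, freqs, assumed_mean, h):
    """Step-deviation (coding) method for classes of equal width h.

    u_i = (X_i - A) / h, and X-bar = A + h * sum(f_i * u_i) / N.
    """
    n_total = sum(freqs)
    u = [(x - assumed_mean) / h for x in mids]
    return assumed_mean + h * sum(f * ui for f, ui in zip(freqs, u)) / n_total


# Hypothetical mid-values (class width 10) and frequencies.
mids = [5, 15, 25, 35, 45]
freqs = [4, 8, 15, 9, 4]
print(step_deviation_mean(mids, freqs, assumed_mean=25, h=10))  # 25.25
```

With A = 25 the coded values are -2, -1, 0, 1, 2, so Σfu = 1 and X̄ = 25 + 10 × 1/40 = 25.25, the same value the direct method gives.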
Weighted Arithmetic Mean
In the computation of the simple arithmetic mean, equal importance is given to all the items. But this may not be appropriate in all situations. If all the items are not of equal importance, then the simple arithmetic mean will not be a good representative of the given data. Hence, weighting of different items becomes necessary. The weights are assigned to different items depending upon their importance, i.e., more important items are assigned more weight.
For example, to calculate the mean wage of the workers of a factory, it would be wrong to compute the simple arithmetic mean if there are a few workers (say managers) with very high wages while the majority of the workers are at a low level of wages. The simple arithmetic mean, in such a situation, will give a higher value that cannot be regarded as a representative wage for the group. In order that the mean wage gives a realistic picture of the distribution, the wages of managers should be given less importance in its computation.
The mean calculated in this manner is called the weighted arithmetic mean. The computation of the weighted arithmetic mean is useful in many situations where different items are of unequal importance, e.g., the construction of index numbers, the computation of standardized death and birth rates, etc.
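With weights wi, the weighted arithmetic mean is X̄w = ΣwiXi / Σwi. A small Python sketch follows; the purchase figures are hypothetical and illustrate a typical case, the average price per unit of goods bought at different prices in different quantities, where the quantities serve as weights.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * X_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical purchase: 10 units at Rs 20 and 30 units at Rs 40.
prices = [20, 40]
quantities = [10, 30]
print(weighted_mean(prices, quantities))  # 35.0: average price per unit
print(sum(prices) / len(prices))          # 30.0: simple mean ignores quantities
```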
Merits and Demerits of Arithmetic Mean
(a) Merits
It is rigidly defined.
It is easy to calculate and simple to follow.
It is based on all the observations.
It is determined for almost every kind of data.
It is finite and not indefinite.
It is readily put to algebraic treatment.
It is least affected by fluctuations of sampling.
(b) Demerits
The arithmetic mean is highly affected by extreme values.
It cannot average the ratios and percentages properly.
It is not an appropriate average for highly skewed distributions.
It cannot be computed accurately if any item is missing.
The mean sometimes does not coincide with any of the observed values.
Exercises with Hints
The frequency distribution of weights in grams of mangoes of a given variety is given below. Calculate the arithmetic mean.
Hint : Take the mid-value of a class as the mean of its limits and find arithmetic mean by the step-deviation method.
The following table gives the monthly income (in rupees) of families in a certain locality. By stating the necessary assumptions, calculate arithmetic mean of the distribution.
Hint : This distribution is with open end classes. To calculate mean, it is to be assumed that the width of first class is same as the width of second class. On this assumption the lower limit of the first class will be 0. Similarly, it is assumed that the width of last class is equal to the width of last but one class. Therefore, the upper limit of the last class can be taken as 6,000.
Compute the arithmetic mean of the following distribution of marks in Economics of 50 students:
Hint: First convert the distribution into class intervals and then calculate X̄.
The monthly profits, in '000 rupees, of 100 shops are distributed as follows:
Find average profit per shop.
Hint: This is a less than type cumulative frequency distribution.
Typist A can type a letter in five minutes, typist B in ten minutes and typist C in fifteen minutes. What is the average number of letters typed per hour per typist?
Hint: In one hour, A will type 12 letters, B will type 6 letters and C will type 4 letters.
A taxi ride in Delhi costs Rs 5 for the first kilometre and Rs 3 for every additional kilometre travelled. The cost of each kilometre is incurred at the beginning of the kilometre, so that the rider pays for the whole kilometre. What is the average cost of travelling 2¾ kilometres?
Hint: Total cost of travelling 2¾ kilometres = Rs 5 + 3 + 3 = Rs 11.
A company gave bonus to its employees. The rates of bonus in various salary groups are :
The actual salaries of staff members are as given below :
Determine (i) Total amount of bonus paid and (ii) Average bonus paid per employee.
Hint: Find the frequencies of the classes from the given information.
Calculate arithmetic mean from the following distribution of weights of 100 students of a college. It is given that there is no student having weight below 90 lbs. and the total weight of persons in the highest class interval is 350 lbs.
Hint: Rearrange this in the form of frequency distribution by taking class intervals as 90 - 100, 100 - 110, etc.
By arranging the following information in the form of a frequency distribution, find arithmetic mean. "In a group of companies 15%, 25%, 40% and 75% of them get profits less than Rs 6 lakhs, 10 lakhs, 14 lakhs and 20 lakhs respectively and 10% get Rs 30 lakhs or more but less than 40 lakhs."
Hint: Take class intervals as 0 - 6, 6 - 10, 10 - 14, 14 - 20, etc.
Find class intervals if the arithmetic mean of the following distribution is 38.2 and the assumed mean is equal to 40.
Hint: Use the formula X̄ = A + (Σfu/N) × h to find the class width h.
From the following data, calculate the mean rate of dividend obtainable to an investor holding shares of various companies as shown :
Hint: The no. of shares of each type = no. of companies × average no. of shares.
The mean weight of 150 students in a certain class is 60 kgs. The mean weight of boys in the class is 70 kgs and that of girls is 55 kgs. Find the number of girls and boys in the class.
Hint: Take n1 as the no. of boys and 150 - n1 as the no. of girls.
The mean wage of 100 labourers working in a factory, running two shifts of 60 and 40 workers respectively, is Rs 38. The mean wage of the 60 labourers working in the morning shift is Rs 40. Find the mean wage of the 40 labourers working in the evening shift.
Hint: See example above.
The mean of 25 items was calculated by a student as 20. If an item 13 is replaced by 30, find the changed value of mean.
Hint: See example above.
The average daily price of share of a company from Monday to Friday was Rs 130. If the highest and lowest price during the week were Rs 200 and Rs 100 respectively, find average daily price when the highest and lowest price are not included.
Hint: See previous example.
The mean salary paid to 1000 employees of an establishment was found to be Rs 180.40. Later on, after disbursement of the salary, it was discovered that the salaries of two employees were wrongly recorded as Rs 297 and Rs 165 instead of Rs 197 and Rs 185. Find the correct arithmetic mean.
Hint: See previous example.
Find the missing frequencies of the following frequency distribution:
Hint: See above example.
Marks obtained by students who passed a given examination are given below:
If 100 students took the examination and their mean marks were 51, calculate the mean marks of students who failed.
Hint: See above example .
A appeared in three tests carrying 20, 50 and 30 marks respectively. He obtained 75% marks in the first and 60% marks in the second test. What should be his percentage of marks in the third test in order that his aggregate is 60%?
Hint: Let x be the percentage of marks in third test. Then the weighted average of 75, 60 and x should be 60, where weights are 20, 50 and 30 respectively.
The price of a banana is 80 paise and the price of an orange is Rs 1.20. If a person purchases two dozen bananas and one dozen oranges, show, stating reasons, that the average price per piece of fruit is 93 paise and not one rupee.
Hint: Correct average is weighted arithmetic average.
The average mark of 39 students of a class is 50. The marks obtained by the 40th student are 39 more than the average marks of all the 40 students. Find the mean marks of all the 40 students.
The following table gives the distribution of the number of kilometres travelled per salesman, of a pharmaceutical company, per day and their rates of conveyance allowance:
Calculate the average rate of conveyance allowance given to each salesman per kilometre by the company.
Hint: Obtain the total number of kilometres travelled for each rate of conveyance allowance by multiplying the mid-values of column 1 by column 2. Treat this as frequency 'f' and the third column as 'X', and find X̄.
The details of monthly income and expenditure of a group of five families are given in the following table:
Find: (i) Average income per member for the entire group of families.
(ii) Average expenditure per family.
(iii) The difference between actual and average expenditure for each family.
Hint: (i) Average income per member = Total income of the group of families / Total no. of members in the group
(ii) Average expenditure per family = Total expenditure of the group/No of families
The following table gives distribution of monthly incomes of 200 employees of a firm:
(i) Mean income of an employee per month.
(ii) Monthly contribution to welfare fund if every employee belonging to the top 80% of the earners is supposed to contribute 2% of his income to this fund.
Hint: The distribution of top 80% of the wage earners can be written as
By taking mid-values of class intervals, find Σfx, i.e., the total salary, and take 2% of this.
The number of patients visiting diabetic clinic and protein urea clinic in a hospital during April 1991, are given below :
Which of these two diseases has more incidence in April 1991? Justify your conclusion.
Hint: The more incidence of disease is given by higher average number of patients.
A company has three categories of workers A, B and C. During 1994, the numbers of workers in the respective categories were 40, 240 and 120, with monthly wages of Rs 1,000, Rs 1,300 and Rs 1,500. During the following year, the monthly wages of all the workers were increased by 15% and their numbers in the respective categories were 130, 150 and 20.
(a) Compute the average monthly wages of workers for the two years.
(b) Compute the percentage change of average wage in 1995 as compared with 1994. Is it equal to 15%? Explain.
Hint: Since the weight of the largest wage is less in 1995, the increase in average wage will be less than 15%.
(a) The average cost of producing 10 units is Rs 6 and the average cost of producing 11 units is Rs 6.5. Find the marginal cost of the 11th unit.
(b) A salesman is entitled to a bonus in a year if his average quarterly sales are at least Rs 40,000. If his average sales in the first three quarters are Rs 35,000, find the minimum level of sales in the fourth quarter that makes him eligible for the bonus.
Hint: See above example .
(a) The monthly salaries of five persons were Rs 5,000, Rs 5,500, Rs 6,000, Rs 7,000 and Rs 20,000. Compute their mean salary. Would you regard this mean as typical of the salaries? Explain.
(b) There are 100 workers in a company, out of which 70 are males and 30 females. If a male worker earns Rs 100 per day and a female worker earns Rs 70 per day, find the average wage. Would you regard this as a typical wage? Explain.
Hint: An average that is representative of most of the observations is said to be a typical average.
Median
The median of a set of data values is the middle value of the data set when it has been arranged in ascending order, that is, from the smallest value to the largest value.
The median of a distribution is that value of the variate which divides it into two equal parts. In terms of a frequency curve, the ordinate drawn at the median divides the area under the curve into two equal parts. Median is a positional average because its value depends upon the position of an item and not on its magnitude.
Determination of Median
(a) When individual observations are given
The following steps are involved in the determination of median:
(i) The given observations are arranged in either ascending or descending order of magnitude.
(ii) Given that there are n observations, the median is given by:
The size of the ((n + 1)/2)th observation, when n is odd.
The mean of the sizes of the (n/2)th and (n/2 + 1)th observations, when n is even.
Example : Find median of the following observations:
20, 15, 25, 28, 18, 16, 30.
Solution: Writing the observations in ascending order, we get 15, 16, 18, 20, 25, 28, 30.
Since n = 7, i.e., odd, the median is the size of the ((7 + 1)/2)th, i.e., 4th observation.
Hence, median, denoted by Md = 20.
Note: The same value of Md will be obtained by arranging the observations in descending order of magnitude.
Example : Find median of the data : 245, 230, 265, 236, 220, 250.
Solution: Arranging these observations in ascending order of magnitude, we get 220, 230, 236, 245, 250, 265. Here n = 6, i.e., even.
Median will be the arithmetic mean of the sizes of the (6/2)th, i.e., 3rd, and (6/2 + 1)th, i.e., 4th observations. Hence, Md = (236 + 245)/2 = 240.5.
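The two rules above can be written as a short Python function; the name `median` is only illustrative, and Python's standard `statistics.median` follows the same odd/even convention. The two data sets are the ones worked out in the examples above.

```python
def median(observations):
    """Median of individual observations (odd/even rule)."""
    xs = sorted(observations)          # step (i): ascending order
    n = len(xs)
    if n == 0:
        raise ValueError("no observations")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]    # the ((n + 1)/2)th observation, 1-based
    # mean of the (n/2)th and (n/2 + 1)th observations
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


print(median([20, 15, 25, 28, 18, 16, 30]))    # 20
print(median([245, 230, 265, 236, 220, 250]))  # 240.5
```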
Remarks: Consider the observations: 13, 16, 16, 17, 17, 18, 19, 21, 23. On the basis of the method given above, their median is 17.
According to the above definition of median, "half (i.e., 50%) of the observations should be below 17 and half of the observations should be above 17". Here we may note that only 3 observations are below 17 and 4 observations are above it, and hence the definition of median given above is somewhat ambiguous. In order to avoid this ambiguity, the median of a distribution may also be defined in the following way:
Median of a distribution is that value of the variate such that at least half of the observations are less than or equal to it and at least half of the observations are greater than or equal to it.
Based on this definition, we find that there are 5 observations which are less than or equal to 17 and there are 6 observations which are greater than or equal to 17. Since n = 9, the numbers 5 and 6 are both more than half, i.e., 4.5. Thus, median of the distribution is 17.
Further, if the number of observations is even and the two middle most observations are not equal, e.g., if the observations are 2, 2, 5, 6, 7, 8, then there are 3 observations (n/2=3) which are less than or equal to 5 and there are 4 (i.e., more than half) observations which are greater than or equal to 5.
Further, there are 4 observations which are less than or equal to 6 and there are 3 observations which are greater than or equal to 6. Hence, both 5 and 6 satisfy the conditions of the new definition of median. In such a case, any value lying in the closed interval [5, 6] can be taken as the median. By convention, we take the middle value of the interval as the median. Thus, median is (5 + 6)/2 = 5.5.
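The "at least half" definition can be checked mechanically. The Python sketch below (function names are illustrative) tests only the observed values; when it returns two values, any number between them also qualifies, and the conventional median is the midpoint of that interval. The data are the two sets discussed above.

```python
def median_candidates(observations):
    """Observed values v with at least half of the observations <= v
    and at least half of the observations >= v."""
    n = len(observations)
    candidates = []
    for v in sorted(set(observations)):
        at_most_v = sum(1 for x in observations if x <= v)
        at_least_v = sum(1 for x in observations if x >= v)
        if at_most_v >= n / 2 and at_least_v >= n / 2:
            candidates.append(v)
    return candidates


def conventional_median(observations):
    """Middle value of the interval spanned by the qualifying values."""
    c = median_candidates(observations)
    return (c[0] + c[-1]) / 2


print(median_candidates([13, 16, 16, 17, 17, 18, 19, 21, 23]))  # [17]
print(median_candidates([2, 2, 5, 6, 7, 8]))                    # [5, 6]
print(conventional_median([2, 2, 5, 6, 7, 8]))                  # 5.5
```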
(b) When ungrouped frequency distribution is given
In this case, the data are already arranged in the order of magnitude. Here, cumulative frequency is computed and the median is determined in a manner similar to that of individual observations.
Example : Locate median of the following frequency distribution:
Here N = 95, which is odd. Thus, the median is the size of the ((95 + 1)/2)th, i.e., 48th observation. From the table, the 48th observation is 12. Therefore, Md = 12.
Alternative Method: N/2 = 95/2 = 47.5. Looking at the frequency distribution, we note that there are 48 observations which are less than or equal to 12 and there are 72 (i.e., 95 - 23) observations which are greater than or equal to 12. Since both numbers exceed 47.5, the median is 12.
Example : Locate median of the following frequency distribution :
Here N = 252, i.e., even.
N/2= 252/2 = 126 and N/2+1 = 127
Median is the mean of the size of 126th and 127th observation. From the table we note that 126th observation is 4 and 127th observation is 5.
Md = (4 + 5)/2 = 4.5
Alternative Method: Looking at the frequency distribution, we note that there are 126 observations which are less than or equal to 4 and there are 252 - 75 = 177 observations which are greater than or equal to 4. Similarly, the value 5 also satisfies this criterion. Therefore, median = (4 + 5)/2 = 4.5.
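For an ungrouped frequency distribution, the position rule can be applied by walking along the cumulative frequencies. The Python sketch below uses an illustrative function name and a hypothetical distribution (values 1 to 5 with frequencies 3, 5, 8, 6, 2), since the tables of the examples are not reproduced here.

```python
def median_from_frequencies(values, freqs):
    """Median of an ungrouped frequency distribution.

    values must be in ascending order; freqs[i] is the frequency of values[i].
    """
    n_total = sum(freqs)

    def value_at(position):
        # Walk the cumulative frequencies to find the position-th observation.
        cumulative = 0
        for v, f in zip(values, freqs):
            cumulative += f
            if cumulative >= position:
                return v
        raise ValueError("position exceeds the total frequency")

    if n_total % 2 == 1:
        return value_at((n_total + 1) // 2)
    return (value_at(n_total // 2) + value_at(n_total // 2 + 1)) / 2


# Hypothetical distribution: N = 24, so the 12th and 13th observations are used.
print(median_from_frequencies([1, 2, 3, 4, 5], [3, 5, 8, 6, 2]))  # 3.0
```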
(c) When grouped frequency distribution is given (Interpolation formula)
The determination of median, in this case, will be explained with the help of the following example.
Example : Suppose we wish to find the median of the following frequency distribution.
Solution: The median of a distribution is that value of the variate which divides the distribution into two equal parts. In case of a grouped frequency distribution, this implies that the ordinate drawn at the median divides the area under the histogram into two equal parts. Writing the given data in a tabular form, we have:
For the location of median, we make a histogram with heights of different rectangles equal to frequency density of the corresponding class. Such a histogram is shown below:
Since the ordinate at median divides the total area under the histogram into two equal parts, therefore we have to find a point (like Md as shown in the figure) on X - axis such that an ordinate (AMd) drawn at it divides the total area under the histogram into two equal parts.
We may note here that area under each rectangle is equal to the frequency of the corresponding class.
since area = length × breadth = frequency density × width of class = (f/h) × h = f.
Thus, the total area under the histogram is equal to total frequency N. In the given example N = 70, therefore N/2= 35. We note that area of first three rectangles is 5 + 12 + 14 = 31 and the area of first four rectangles is 5 + 12 + 14 + 18 = 49. Thus, median lies in the fourth class interval which is also termed as median class. Let the point, in median class, at which median lies be denoted by Md.
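The position of this point should be such that the ordinate AMd (in the above histogram) divides the area of the median rectangle so that only 35 - 31 = 4 units of area (i.e., 4 observations) of that rectangle lie to its left. The median class is 30 - 40, so its frequency density is 18/10, and from the histogram the position of Md should be such that
$$(M_d - 30)\times\frac{18}{10} = 35 - 31 = 4 \qquad (1)$$
Thus, Md = 30 + 40/18 = 32.2.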
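Writing the above equation in general notation, we have
$$(M_d - L_m)\times\frac{f_m}{h} = \frac{N}{2} - C \quad\Rightarrow\quad M_d = L_m + \frac{\frac{N}{2} - C}{f_m}\times h \qquad (2)$$
where Lm is the lower limit, h is the width and fm is the frequency of the median class, and C is the cumulative frequency of the classes preceding the median class. Equation (2) gives the required formula for the computation of median.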
1. Since the variable in a grouped frequency distribution is assumed to be continuous, we always take the exact value of N/2, including figures after the decimal point, when N is odd.
2. The above formula is also applicable when classes are of unequal width.
3. Median can be computed even if there are open end classes because here we need to know only the frequencies of classes preceding or following the median class.
Determination of Median When 'greater than' type cumulative frequencies are given
By looking at the histogram, we note that we have to find a point Md such that the area to the right of the ordinate at Md is 35. The area of the last two rectangles is 13 + 8 = 21. Therefore, we have to get 35 - 21 = 14 units of area from the median rectangle to the right of the ordinate. Let Um be the upper limit of the median class. Then the formula for median in this case can be written as
$$M_d = U_m - \frac{\frac{N}{2} - C}{f_m}\times h$$
Note that C denotes the 'greater than type' cumulative frequency of classes following the median class. Applying this formula to the above example, we get
$$M_d = 40 - \frac{35 - 21}{18}\times 10 = 32.2$$
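The 'greater than' version works from the top class downwards. In the Python sketch below (illustrative function name; class limits again assumed to be 0-10, ..., 50-60 around the example's frequencies), it gives the same value as the 'less than' formula, as it should.

```python
def grouped_median_from_top(classes, freqs):
    """Median using 'greater than' cumulative frequencies:
    Md = Um - (N/2 - C) / fm * h, where C is the total frequency
    of the classes above the median class."""
    n_total = sum(freqs)
    half = n_total / 2
    cumulative = 0  # C, accumulated from the highest class downwards
    for (lower, upper), f in zip(reversed(classes), reversed(freqs)):
        if f > 0 and cumulative + f >= half:
            return upper - (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 12, 14, 18, 13, 8]
print(round(grouped_median_from_top(classes, freqs), 2))  # 32.22
```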
Example: Calculate median of the following data :
Since N/2 = 100/2 = 50, the median class is 7 - 8. Further, Lm = 7, h = 1, fm = 22 and C = 38.
Thus, Md = 7 + (50 - 38)/22 × 1 = 7.55 inches.
Example: The following table gives the distribution of marks obtained by 500 students in an examination. Obtain the median of the given data.
Solution: Since the class intervals are inclusive, it is necessary to convert them into class boundaries.
Since N/2 = 250, the median class is 49.5 - 59.5 and, therefore, Lm = 49.5, h = 10, fm = 162, C = 192.
Thus, Md = 49.5 + (250 - 192)/162 × 10 = 53.08 marks.
Example: The weekly wages of 1,000 workers of a factory are shown in the following table. Calculate median.
Solution: The above is a 'less than' type frequency distribution. It will first be converted into class intervals.
Since N/2= 500, the median class is 625 - 675. On substituting various values in the formula for median, we get
Example : Find the median of the following data:
Solution: Note that it is 'greater than' type frequency distribution
Since N/2=230/2= 115, the median class is 40 - 50.
Example : The following table gives the daily profits (in Rs) of 195 shops of a town. Calculate mean and median.
Example : Find median of the following distribution:
Solution: Since the mid-values are equally spaced, the difference between their two successive values will be the width of each class interval. This width is 1,000. On subtracting and adding half of this, i.e., 500 to each of the mid-values, we get the lower and the upper limits of the respective class intervals. After this, the calculation of median can be done in the usual way.
Determination of Missing Frequencies
If the frequencies of some classes are missing but the median of the distribution is known, these frequencies can be determined by the use of the median formula.
Example : The following table gives the distribution of daily wages of 900 workers. However, the frequencies of the classes 40 - 50 and 60 - 70 are missing. If the median of the distribution is Rs 59.25, find the missing frequencies.
Solution: Let f1 and f2 be the frequencies of the classes 40 - 50 and 60 - 70 respectively.
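Since the median is given as 59.25, the median class is 50 - 60. Therefore, with N/2 = 900/2 = 450, we can write
$$59.25 = 50 + \frac{450 - C}{f_m}\times 10$$
where fm is the frequency of the class 50 - 60 and C, the cumulative frequency of the classes preceding it, includes the unknown f1. Together with the condition that all the frequencies, including f1 and f2, add up to 900, this gives two equations in f1 and f2.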
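Assuming, as in this example, that one missing class precedes the median class and the other follows it, the two equations can be solved directly. The Python sketch below uses hypothetical parameter names and made-up inputs (the example's own table is not reproduced here), so its output is not the answer to the example; it only shows the algebra.

```python
def missing_frequencies(total, known_sum, c_before, f_median, lower, width, median):
    """Solve for two missing frequencies from the median formula.

    f1 belongs to a class preceding the median class, f2 to a class after it.
    total     : N, the total frequency
    known_sum : sum of all known frequencies (including f_median)
    c_before  : cumulative frequency of the known classes preceding the f1 class
    """
    # median = lower + (N/2 - (c_before + f1)) / f_median * width
    f1 = total / 2 - c_before - (median - lower) * f_median / width
    # All frequencies add up to N.
    f2 = total - known_sum - f1
    return f1, f2


# Hypothetical inputs, for illustration only.
f1, f2 = missing_frequencies(total=900, known_sum=600, c_before=120,
                             f_median=200, lower=50, width=10, median=59.25)
print(f1, f2)  # 145.0 155.0
```

Substituting back, C = 120 + 145 = 265 and 50 + (450 - 265)/200 × 10 = 59.25, which confirms the solution for these made-up numbers.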
Graphical location of Median
So far we have calculated median by the use of a formula. Alternatively, it can be determined graphically, as illustrated in the following example.
Example : The following table shows the daily sales of 230 footpath sellers of Chandni Chowk:
Locate the median of the above data using
(i) only the less than type ogive, and
(ii) both, the less than and the greater than type ogives.
Solution: To draw ogives, we need to have a cumulative frequency distribution.
Using the less than type ogive
The value N/2= 115 is marked on the vertical axis and a horizontal line is drawn from this point to meet the ogive at point S. Drop a perpendicular from S. The point at which this meets X- axis is the median.
(ii) Using both types of ogives
A perpendicular is dropped from the point of intersection of the two ogives. The point at which it intersects the X-axis gives median. It is obvious from figures that median = 2080.
Properties of Median
1. It is a positional average.
2. It can be shown that the sum of absolute deviations is minimum when taken from median. This property implies that median is centrally located.
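A quick numerical check of property 2, using the seven observations whose median was found above to be 20. This is a demonstration on one data set, not a proof.

```python
def total_abs_deviation(observations, a):
    """Sum of |X_i - a| over all observations."""
    return sum(abs(x - a) for x in observations)


data = [20, 15, 25, 28, 18, 16, 30]  # median = 20
for a in [15, 18, 20, 22, 25]:
    print(a, total_abs_deviation(data, a))
# the smallest total, 34, occurs at a = 20
```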
Merits and Demerits of Median
(a) Merits
It is very easy to calculate and is readily understood.
Median is not affected by the extreme values. It is independent of the range of series.
Median can be measured graphically.
Median serves as the most appropriate average to deal with qualitative data.
Median is always a certain and specific value.
Median is often used to convey the typical observation. It is primarily affected by the number of observations rather than their size.
(b) Demerits
Median is not representative of a series in which the values are widely spread apart from each other.
Median is erratic if the number of items is small.
Median is affected much more by fluctuations of sampling than the A.M.
Median is incapable of further algebraic treatment. Unlike the mean, it does not allow us to find the total of the terms (as is possible with the A.M.), nor can the median of several groups be obtained from the medians of the individual groups when they are combined.
In a continuous series, the median has to be interpolated. Its true value can be found only if the frequencies are uniformly spread over the whole class interval in which the median lies.
If the number of items in a series is even, we can only estimate the median, since the A.M. of the two middle terms is taken as the median.
(c) Uses
Median is an appropriate measure of central tendency when the characteristics are not measurable but the different items are capable of being ranked.
Median is used to convey the idea of a typical observation of the given data.
Median is the most suitable measure of central tendency when the frequency distribution is skewed. For example, income distribution of the people is generally positively skewed and median is the most suitable measure of average in this case.
Median is often computed when quick estimates of average are desired.
When the given data have class intervals with open ends, median is preferred as a measure of central tendency, since the mean cannot be calculated in this case without making assumptions about the limits of the open-end classes.
OTHER PARTITION OR POSITIONAL MEASURES
Median of a distribution divides it into two equal parts. It is also possible to divide it into more than two equal parts. The values that divide a distribution into more than two equal parts are commonly known as partition values or fractiles. Some important partition values are discussed in the following sections.
The values of a variable that divide a distribution into four equal parts are called quartiles. Since three values are needed to divide a distribution into four parts, there are three quartiles, viz. Q1, Q2 and Q3, known as the first, second and the third quartile respectively. For a discrete distribution, the first quartile (Q1) is defined as that value of the variate such that at least 25% of the observations are less than or equal to it and at least 75% of the observations are greater than or equal to it.
For a continuous or grouped frequency distribution, Q1 is that value of the variate such that the area under the histogram to the left of the ordinate at Q1 is 25% and the area to its right is 75%. The formula for the computation of Q1 can be written by making suitable changes in the formula of median.
After locating the first quartile class, the formula for Q1 can be written as follows (N being the total frequency):
$$Q_1 = L_{Q_1} + \frac{N/4 - C}{f_{Q_1}} \times h$$
Here, LQ1 is lower limit of the first quartile class, h is its width, fQ1 is its frequency and C is cumulative frequency of classes preceding the first quartile class.
By definition, the second quartile is median of the distribution. The third quartile (Q3) of a distribution can also be defined in a similar manner.
For a discrete distribution, Q3 is that value of the variate such that at least 75% of the observations are less than or equal to it and at least 25% of the observations are greater than or equal to it.
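For a grouped frequency distribution, Q3 is that value of the variate such that the area under the histogram to the left of the ordinate at Q3 is 75% and the area to its right is 25%. The formula for computation of Q3 can be written as
$$Q_3 = L_{Q_3} + \frac{3N/4 - C}{f_{Q_3}} \times h$$
where LQ3 and fQ3 are the lower limit and frequency of the third quartile class and C is the cumulative frequency of the classes preceding it.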
Deciles divide a distribution into 10 equal parts and there are, in all, 9 deciles denoted as D1, D2, ...... D9 respectively.
For a discrete distribution, the ith decile Di is that value of the variate such that at least (10i)% of the observations are less than or equal to it and at least (100 - 10i)% of the observations are greater than or equal to it (i = 1, 2, ...... 9).
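For a continuous or grouped frequency distribution, Di is that value of the variate such that the area under the histogram to the left of the ordinate at Di is (10i)% and the area to its right is (100 - 10i)%. The formula for the ith decile can be written as
$$D_i = L_{D_i} + \frac{iN/10 - C}{f_{D_i}} \times h, \qquad i = 1, 2, \ldots, 9$$
where LDi and fDi are the lower limit and frequency of the ith decile class and C is the cumulative frequency of the classes preceding it.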
Percentiles divide a distribution into 100 equal parts and there are, in all, 99 percentiles denoted as P1, P2, ...... P25, ...... P40, ...... P60, ...... P99 respectively.
For a discrete distribution, the kth percentile Pk is that value of the variate such that at least k% of the observations are less than or equal to it and at least (100 - k)% of the observations are greater than or equal to it.
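For a grouped frequency distribution, Pk is that value of the variate such that the area under the histogram to the left of the ordinate at Pk is k% and the area to its right is (100 - k)%. The formula for the kth percentile can be written as
$$P_k = L_{P_k} + \frac{kN/100 - C}{f_{P_k}} \times h, \qquad k = 1, 2, \ldots, 99$$
where LPk and fPk are the lower limit and frequency of the kth percentile class and C is the cumulative frequency of the classes preceding it.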
(i) We may note here that P25 = Q1, P50 = D5 = Q2 = Md, P75 = Q3, P10 = D1, P20 = D2, etc.
(ii) In continuation of the above, the partition values are known as quintiles (octiles) if a distribution is divided into 5 (8) equal parts.
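(iii) The formulae for various partition values of a grouped frequency distribution, given so far, are based on 'less than' type cumulative frequencies. The corresponding formulae based on 'greater than' type cumulative frequencies can be written in a similar manner, as given below:
$$Q_1 = U_{Q_1} - \frac{3N/4 - C}{f_{Q_1}} \times h, \qquad Q_3 = U_{Q_3} - \frac{N/4 - C}{f_{Q_3}} \times h$$
$$D_i = U_{D_i} - \frac{(10 - i)N/10 - C}{f_{D_i}} \times h, \qquad P_k = U_{P_k} - \frac{(100 - k)N/100 - C}{f_{P_k}} \times h$$
Here UQ1, UQ3, UDi and UPk are the upper limits of the corresponding classes and C denotes the 'greater than' type cumulative frequency of the classes following the corresponding class.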
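The 'less than' and 'greater than' forms of the partition-value formulas should give the same value, because both interpolate linearly within the same class. The Python sketch below checks this on a hypothetical distribution of N = 60 items, with classes given as (lower, upper) pairs. It computes Pk = L + (kN/100 - C)/f × h and Pk = U - ((100 - k)N/100 - C')/f × h, where C' is the number of observations in the classes above the located class; quartiles and deciles follow from Q1 = P25, Q3 = P75 and Di = P10i. The function names are illustrative only.

```python
from itertools import accumulate

def _locate(freqs, target):
    """Index of the first class whose 'less than' cumulative frequency reaches target."""
    cum = list(accumulate(freqs))
    for i, c in enumerate(cum):
        if c >= target:
            return i, cum
    raise ValueError("target exceeds the total frequency")

def percentile_less_than(classes, freqs, k):
    """P_k = L + (kN/100 - C) / f * h, C = cumulative frequency of preceding classes."""
    if not 0 < k < 100:
        raise ValueError("k must lie strictly between 0 and 100")
    n = sum(freqs)
    target = k * n / 100
    i, cum = _locate(freqs, target)
    lower, upper = classes[i]
    preceding = cum[i - 1] if i > 0 else 0
    return lower + (target - preceding) / freqs[i] * (upper - lower)

def percentile_greater_than(classes, freqs, k):
    """P_k = U - ((100 - k)N/100 - C) / f * h, C = observations in the classes above."""
    if not 0 < k < 100:
        raise ValueError("k must lie strictly between 0 and 100")
    n = sum(freqs)
    i, cum = _locate(freqs, k * n / 100)
    lower, upper = classes[i]
    above = n - cum[i]
    return upper - ((100 - k) * n / 100 - above) / freqs[i] * (upper - lower)

# Hypothetical grouped distribution, N = 60
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 15, 25, 10, 5]

# Q1 = P25, Md = P50, Q3 = P75, D4 = P40
for name, k in [("Q1", 25), ("Md", 50), ("Q3", 75), ("D4", 40), ("P90", 90)]:
    lt = percentile_less_than(classes, freqs, k)
    gt = percentile_greater_than(classes, freqs, k)
    print(name, round(lt, 2), round(gt, 2))
```

For this data both columns agree, for instance Q1 ≈ 16.67 and Md = 24.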
Example: Locate Median, Q1, Q3, D4, D7, P15, P60 and P90 from the following data:
Solution: First we calculate the cumulative frequencies, as in the following table:
Determination of Median: Here N/2 = 100. From the cumulative frequency column, we note that there are 102 (greater than 50% of the total) observations that are less than or equal to 78 and 133 (also greater than 50% of the total) observations that are greater than or equal to 78. Therefore, Md = Rs 78.
Determination of Q1 and Q3: First we determine N/4 which is equal to 50. From the cumulative frequency column, we note that there are 67 (which is greater than 25% of the total) observations that are less than or equal to 77 and there are 165 (which is greater than 75% of the total) observations that are greater than or equal to 77. Therefore, Q1 = Rs 77. Similarly, Q3 = Rs 80.
Determination of D4 and D7: From the cumulative frequency column, we note that there are 102 (greater than 40% of the total) observations that are less than or equal to 78 and there are 133 (greater than 60% of the total) observations that are greater than or equal to 78. Therefore, D4 = Rs 78. Similarly, D7 = Rs 80.
Determination of P15, P60 and P90: From the cumulative frequency column, we note that there are 35 (greater than 15% of the total) observations that are less than or equal to 76 and there are 185 (greater than 85% of the total) observations that are greater than or equal to 76. Therefore, P15 = Rs 76. Similarly, P60 = Rs 79 and P90 = Rs 82.
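The discrete definition used in this example can be applied mechanically. With the values in ascending order, the smallest value whose 'less than or equal to' cumulative frequency reaches k% of N also has at least (100 - k)% of the observations at or above it. The Python sketch below uses hypothetical values and frequencies, and the function names are illustrative; each result is checked directly against the definition.

```python
from itertools import accumulate

def discrete_partition_value(values, freqs, k):
    """Smallest value x (values in ascending order) with at least k% of the
    observations less than or equal to x. Because fewer than k% lie below x,
    at least (100 - k)% of the observations are then greater than or equal to x."""
    n = sum(freqs)
    for x, c in zip(values, accumulate(freqs)):
        if 100 * c >= k * n:        # integer comparison, no rounding issues
            return x
    raise ValueError("k must not exceed 100")

def satisfies_definition(values, freqs, x, k):
    """Check the definition directly for a candidate value x."""
    n = sum(freqs)
    at_most = sum(f for v, f in zip(values, freqs) if v <= x)
    at_least = sum(f for v, f in zip(values, freqs) if v >= x)
    return 100 * at_most >= k * n and 100 * at_least >= (100 - k) * n

X = [1, 2, 3, 4, 5]        # hypothetical values
f = [4, 10, 16, 12, 8]     # hypothetical frequencies, N = 50
for label, k in [("Q1", 25), ("Md", 50), ("D4", 40), ("Q3", 75), ("P90", 90)]:
    x = discrete_partition_value(X, f, k)
    print(label, x, satisfies_definition(X, f, x, k))
```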
Example : Calculate median, quartiles, 3rd and 6th deciles and 40th and 70th percentiles, from the following data
Also determine (i) the percentage of workers getting weekly wages between Rs 125 and Rs 260 and (ii) the percentage of workers getting wages greater than Rs 340.
Solution: First we make a cumulative frequency distribution table:
(i) Calculation of median: Here N = 500, so that N/2 = 250. Thus, the median class is 250 - 300 and hence Lm = 250, fm = 125, h = 50 and C = 150. Substituting these values in the formula for median, we get
$$M_d = 250 + \frac{250 - 150}{125} \times 50 = 250 + 40 = \text{Rs } 290$$
Hint: The given percentage of walkers and cyclists can be taken as frequencies. For calculation of mean, the necessary assumption is that the width of the first class is equal to the width of the following class, i.e., 1/4. On this assumption, the lower limit of the first class can be taken as 0. Similarly, on the assumption that width of the last class is equal to the width of last but one class, the upper limit of last class can be taken as 6. No assumption is needed for the calculation of median.
In a factory employing 3,000 persons, 5 percent earn less than Rs 3 per hour, 580 earn from Rs 3.01 to Rs 4.50 per hour, 30 percent earn from Rs 4.51 to Rs 6.00 per hour, 500 earn from Rs 6.01 to Rs 7.50 per hour, 20 percent earn from Rs 7.51 to Rs 9.00 per hour and the rest earn Rs 9.01 or more per hour. What is the median wage? Hint: Write down the above information in the form of a frequency distribution. The class intervals given above are of inclusive type. These should be converted into exclusive type for the calculation of median.
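Following the hint, a Python sketch of the conversion: each inclusive limit is moved by half the gap between consecutive classes (Rs 0.005 here) to obtain exclusive boundaries, and the median is then interpolated in the usual way. Reading "less than Rs 3" as "up to Rs 3.00" is an assumption; it does not affect the result, because the median does not fall in that class. The helper names are hypothetical.

```python
N = 3000
# (inclusive lower limit, inclusive upper limit, frequency); None marks an open end.
# Assumption: "less than Rs 3" is read as "up to Rs 3.00", meeting the next class at 3.01.
groups = [
    (None, 3.00, N * 5 // 100),   # 5 percent
    (3.01, 4.50, 580),
    (4.51, 6.00, N * 30 // 100),  # 30 percent
    (6.01, 7.50, 500),
    (7.51, 9.00, N * 20 // 100),  # 20 percent
]
groups.append((9.01, None, N - sum(f for _, _, f in groups)))  # the rest

HALF_GAP = 0.005  # half the gap between an upper limit (e.g. 4.50) and the next lower limit (4.51)

def to_exclusive(lower, upper):
    """Convert inclusive limits to exclusive class boundaries."""
    return (None if lower is None else lower - HALF_GAP,
            None if upper is None else upper + HALF_GAP)

def grouped_median(groups, n):
    """Md = L + (N/2 - C) / f * h on the exclusive boundaries of the median class."""
    cumulative = 0
    for lower, upper, f in groups:
        if cumulative + f >= n / 2:
            L, U = to_exclusive(lower, upper)
            # Fails if the median falls in an open-ended class (L or U is None).
            return L + (n / 2 - cumulative) / f * (U - L)
        cumulative += f
    raise ValueError("frequencies do not add up to n")

print([f for _, _, f in groups])             # [150, 580, 900, 500, 600, 270]
print(round(grouped_median(groups, N), 2))   # 5.79
```

The median class is 4.505 - 6.005 with C = 730 and f = 900, giving a median wage of about Rs 5.79 per hour.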
The distribution of 2,000 houses of a new locality according to their distance from a milk booth is given in the following table :
Calculate the median distance of a house from milk booth.
In the second phase of the construction of the locality, 500 additional houses were constructed, out of which the distances of 200, 150 and 150 houses from the milk booth were in the intervals 450-500, 550-600 and 650-700 meters respectively. Calculate the median distance, taking all the 2,500 houses into account.
Hint: Add 200, 150 and 150 to the respective frequencies of the class intervals 450 - 500, 550 - 600 and 650 - 700.
The monthly salary distribution of 250 families in a certain locality of Agra is given below.
Draw a ‘less than’ ogive for the data given above and hence find out:
(i) The limits of the income of the middle 50% of the families.
(ii) If income tax is to be levied on families whose income exceeds Rs 1800 p.m., calculate the percentage of families which will be paying income tax.
Hint: See example above.
The following table gives the frequency distribution of marks of 800 candidates in an examination :
Draw 'less than' and 'more than' type ogives for the above data and answer the following from the graph:
(i) If the minimum marks required for passing are 35, what percentage of candidates pass the examination?
(ii) If it is decided to allow 80% of the candidates to pass, what should be the minimum marks for passing?
(iii) Find the median of the distribution. Hint: See example above.
Following are the marks obtained by a batch of 10 students in a certain class test in statistics (X) and accountancy (Y).
In which subject is the level of knowledge of the students higher? Hint: Compare the medians of the two series.
The mean and median marks of the students of a class are 50% and 60% respectively. Is it correct to say that the majority of the students have secured more than 50% marks? Explain. Hint: It is given that at least 50% of the students have got 60% or more marks.
The monthly wages of 7 workers of a factory are : Rs 1,000, Rs 1,500, Rs 1,700, Rs 1,800, Rs 1,900, Rs 2,000 and Rs 3,000. Compute mean and median. Which measure is more appropriate? Which measure would you use to describe the situation if you were (i) a trade union leader, (ii) an employer? Hint: (i) median, (ii) mean.
A boy saves Re. 1 on the first day, Rs 2 on the second day, ...... Rs 31 on the 31st day of a particular month. Compute the mean and median of his savings per day. If his father contributes Rs 10 and Rs 100 on the 32nd and 33rd day respectively, compute mean and median of his savings per day. Comment upon the results. Hint: Mean is too much affected by extreme observations.
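The arithmetic of the last two exercises can be checked with Python's standard `statistics` module (`mean` and `median`). The figures are those given in the exercises; the father's contributions simply extend the list of daily savings.

```python
import statistics

# Monthly wages of the 7 workers
wages = [1000, 1500, 1700, 1800, 1900, 2000, 3000]
print(round(statistics.mean(wages), 2), statistics.median(wages))   # 1842.86 1800

# Savings of Rs 1, 2, ..., 31 on days 1 to 31
savings = list(range(1, 32))
print(statistics.mean(savings), statistics.median(savings))         # mean 16, median 16

# Father's contributions of Rs 10 and Rs 100 on days 32 and 33
savings_with_gifts = savings + [10, 100]
print(round(statistics.mean(savings_with_gifts), 2),
      statistics.median(savings_with_gifts))                         # 18.36 16
```

The two extra amounts raise the mean from 16 to about 18.36 but leave the median at 16, which is the point of the hint.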
Mode is that value of the variate which occurs maximum number of times in a distribution and around which other items are densely distributed. In the words of Croxton and Cowden, “The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded the most typical of a series of values.” Further, according to A.M. Tuttle, “Mode is the value which has the greatest frequency density in its immediate neighborhood.”
If the frequency distribution is regular, then mode is determined by the value corresponding to maximum frequency. There may be a situation where concentration of observations around a value having maximum frequency is less than the concentration of observations around some other value. In such a situation, mode cannot be determined by the use of maximum frequency criterion. Further, there may be concentration of observations around more than one value of the variable and, accordingly, the distribution is said to be bimodal or multi-modal depending upon whether it is around two or more than two values.
The concept of mode, as a measure of central tendency, is preferable to mean and median when it is desired to know the most typical value, e.g., the most common size of shoes, the most common size of a ready-made garment, the most common size of income, the most common size of pocket expenditure of a college student, the most common size of a family in a locality, the most common duration of cure of viral-fever, the most popular candidate in an election, etc.
Determination of Mode
(a) When data are either in the form of individual observations or in the form of ungrouped frequency distribution
Given individual observations, these are first transformed into an ungrouped frequency distribution. The mode of an ungrouped frequency distribution can be determined in two ways, as given below:
By inspection or
By method of Grouping
(i) By inspection: When a frequency distribution is fairly regular, then mode is often determined by inspection. It is that value of the variate for which frequency is maximum. By a fairly regular frequency distribution we mean that as the values of the variable increase the corresponding frequencies of these values first increase in a gradual manner and reach a peak at certain value and, finally, start declining gradually in, approximately, the same manner as in case of increase.
Example: Compute mode of the following data:
Solution: Writing this in the form of a frequency distribution, we get
If the frequency of each possible value of the variable is same, there is no mode.
If there are two values having maximum frequency, the distribution is said to be bimodal.
Example : Compute mode of the following distribution
Solution: The given distribution is fairly regular. Therefore, the mode can be determined just by inspection. Since for X = 25 the frequency is maximum, mode = 25.
By method of Grouping: This method is used when the frequency distribution is not regular. Let us consider the following example to illustrate this method.
Example : Determine the mode of the following distribution.
Solution: This distribution is not regular because there is sudden increase in frequency from 20 to 100. Therefore, mode cannot be located by inspection and hence the method of grouping is used. Various steps involved in this method are as follows:
Prepare a table consisting of 6 columns in addition to a column for various values of X.
In the first column, write the frequencies against the various values of X as given in the question.
In the second column, write the sums of frequencies taken in twos, starting from the top.
In the third column, write the sums of frequencies taken in twos, starting from the second frequency.
In the fourth column, write the sums of frequencies taken in threes, starting from the top.
In the fifth column, write the sums of frequencies taken in threes, starting from the second frequency.
In the sixth column, write the sums of frequencies taken in threes, starting from the third frequency.
The highest frequency total in each of the six columns is identified and analyzed to determine mode. We apply this method for determining mode of the above example.
Since the values 14 and 15 are both repeated the maximum number of times in the analysis table, the mode is ill-defined. In this case, the mode can be approximately located by the following empirical formula, which will be discussed later in this chapter:
Mode = 3 Median - 2 Mean
Calculation of Median and Mean
Remarks: If the most repeated values in the above analysis table were not adjacent, the distribution would have been bimodal, i.e., having two modes.
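A Python sketch of the method of grouping described above. It assumes that groups which would run past the last value are not formed, and that when two groups in a column tie for the highest total, both are marked; textbooks differ on such details, so treat these as assumptions. The data are hypothetical: the highest single frequency is at X = 5, but the grouping points to X = 4 because of the heavy concentration of frequencies around it. When more than one value receives the highest number of marks, as with 14 and 15 above, the function returns all of them and the mode is ill-defined.

```python
def mode_by_grouping(values, freqs):
    """Method of grouping for an ungrouped frequency distribution.

    Six columns are formed: the frequencies themselves, sums in twos (from the
    first and from the second frequency) and sums in threes (from the first,
    second and third frequency). In each column every value belonging to the
    group with the highest total receives one mark; the value(s) with the most
    marks are returned as the mode."""
    n = len(freqs)
    if n < 3:
        raise ValueError("need at least three values")
    marks = [0] * n
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]  # (group size, start)
    for size, start in columns:
        groups = [(i, i + size) for i in range(start, n - size + 1, size)]
        totals = [sum(freqs[a:b]) for a, b in groups]
        highest = max(totals)
        for (a, b), total in zip(groups, totals):
            if total == highest:              # ties: every tied group is marked
                for j in range(a, b):
                    marks[j] += 1
    top = max(marks)
    modes = [values[j] for j in range(n) if marks[j] == top]
    return marks, modes

X = [1, 2, 3, 4, 5, 6, 7]          # hypothetical values
f = [10, 12, 30, 28, 32, 9, 8]     # hypothetical, irregular frequencies
marks, modes = mode_by_grouping(X, f)
print(marks)   # [0, 1, 3, 5, 4, 1, 0]
print(modes)   # [4]
```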
Example : From the following data regarding weights of 60 students of a class, find modal weight:
Solution: Since the distribution is not regular, method of grouping will be used for determination of mode.
Since the value 58 occurs the maximum number of times in the analysis table, the mode of the distribution is 58 kg.
(b) When data are in the form of a grouped frequency distribution
The following steps are involved in the computation of mode from a grouped frequency distribution.
(i) Determination of modal class: It is the class in which mode of the distribution lies. If the distribution is regular, the modal class can be determined by inspection, otherwise, by method of grouping.
(ii) Exact location of mode in a modal class (interpolation formula): The exact location of mode, in a modal class, will depend upon the frequencies of the classes immediately preceding and following it. If these frequencies are equal, the mode would lie at the middle of the modal class interval.
However, the position of mode would be to the left or to the right of the middle point depending upon whether the frequency of preceding class is greater or less than the frequency of the class following it. The exact location of mode can be done by the use of interpolation formula, developed below:
Let the modal class be denoted by Lm - Um, where Lm and Um denote its lower and the upper limits respectively. Further, let fm be its frequency and h its width. Also let f1 and f2 be the respective frequencies of the immediately preceding and following classes.
We assume that all the class intervals of the distribution are of equal width. If they are not, make them so by regrouping under the assumption that frequencies in a class are uniformly distributed.
Make a histogram of the frequency distribution with height of each rectangle equal to the frequency of the corresponding class. Only three rectangles, out of the complete histogram, that are necessary for the purpose are shown in the above figure.
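The resulting interpolation formula for mode is
$$M_o = L_m + \frac{D_1}{D_1 + D_2} \times h = L_m + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$$
where D1 = fm - f1 and D2 = fm - f2.
Note: The above formulae are applicable only to a unimodal frequency distribution.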
Example : The monthly profits (in Rs) of 100 shops are distributed as follows:
Determine the 'modal value' of the distribution graphically and verify the result by calculation.
Solution: Since the distribution is regular, the modal class would be a class having the highest frequency. The modal class, of the given distribution, is 200 - 300.
Graphical Location of Mode
To locate mode we draw a histogram of the given frequency distribution. The mode is located as shown in figure. From the figure, mode = Rs 256.
Determination of Mode by interpolation formula
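Since the modal class is 200 - 300, Lm = 200, D1 = 27 - 18 = 9, D2 = 27 - 20 = 7 and h = 100. Therefore,
$$M_o = 200 + \frac{9}{9 + 7} \times 100 = 200 + 56.25 = \text{Rs } 256.25$$
which agrees with the graphically located value of Rs 256.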
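The interpolation step of this example can be checked with a short Python function. Here f1 = 18 and f2 = 20 are the frequencies implied by D1 = 27 - 18 and D2 = 27 - 20; the function name is illustrative.

```python
def mode_interpolation(lower, f_pre, f_modal, f_post, width):
    """Mo = Lm + D1 / (D1 + D2) * h, with D1 = fm - f1 and D2 = fm - f2."""
    d1 = f_modal - f_pre
    d2 = f_modal - f_post
    return lower + d1 / (d1 + d2) * width

print(mode_interpolation(200, 18, 27, 20, 100))   # 256.25
```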
Since the two classes, 120 - 130 and 130 - 140, are repeated maximum number of times in the above table, it is not possible to locate modal class even by the method of grouping. However, an approximate value of mode is given by the empirical formula:
Mode = 3 Median - 2 Mean (See § 2.9)
Looking at the cumulative frequency column, given in the question, the median class is 130 - 140. Thus, Lm = 130, C = 46, fm = 21 and h = 10. Therefore,
$$M_d = 130 + \frac{50 - 46}{21} \times 10 = 131.9 \text{ lbs}$$
Assuming that the width of the first class is equal to the width of second, we can write
Remarks: Another situation, in which we can use the empirical formula, rather than the interpolation formula, is when there is maximum frequency either in the first or in the last class.
Calculation of Mode when either D1 or D2 is negative:
The interpolation formula, for the calculation of mode, is applicable only if both D1 and D2 are positive. If either D1 or D2 is negative, we use an alternative formula that gives only an approximate value of the mode.
We recall that the position of mode in a modal class depends upon the frequencies of its preceding and following classes, denoted by f1 and f2 respectively. If f1 = f2, the mode will be at the middle point, which can be obtained by adding f2/(f1 + f2) × h to the lower limit of the modal class or, equivalently, by subtracting f1/(f1 + f2) × h from its upper limit. We may note that f1/(f1 + f2) = f2/(f1 + f2) = 1/2 when f1 = f2.
Further, if f2 > f1, the mode will lie to the right of the mid-value of the modal class and, therefore, the ratio f2/(f1 + f2) will be greater than 1/2. Similarly, if f2 < f1, the mode will lie to the left of the mid-value of the modal class and the ratio f2/(f1 + f2) will be less than 1/2. Thus, we can write an alternative formula for mode as:
$$M_o = L_m + \frac{f_2}{f_1 + f_2} \times h$$
Remarks: The above formula gives only an approximate estimate of mode vis-a-vis the interpolation formula.
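To see why the alternative formula is needed, the Python sketch below takes a hypothetical modal class 20 - 30 with f1 = 10, fm = 20 and f2 = 25, so that D2 is negative. The interpolation formula then places the mode at 40, outside the modal class, whereas Mo = Lm + f2/(f1 + f2) × h keeps it inside. When f1 = f2 the alternative formula gives the mid-point of the class. The function names are illustrative.

```python
def mode_interpolation(lower, f1, fm, f2, h):
    """Mo = Lm + D1 / (D1 + D2) * h; meaningful only when D1 and D2 are positive."""
    d1, d2 = fm - f1, fm - f2
    return lower + d1 / (d1 + d2) * h

def mode_alternative(lower, f1, f2, h):
    """Approximate mode Mo = Lm + f2 / (f1 + f2) * h, usable when D1 or D2 < 0."""
    return lower + f2 / (f1 + f2) * h

# Hypothetical modal class 20 - 30 with f1 = 10, fm = 20, f2 = 25, so D2 = -5
print(mode_interpolation(20, 10, 20, 25, 10))       # 40.0, outside the class 20 - 30
print(round(mode_alternative(20, 10, 25, 10), 2))   # 27.14, inside the class
print(mode_alternative(20, 10, 10, 10))             # 25.0, the mid-point when f1 = f2
```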
Example : Calculate mode of the following distribution.
Solution: The mid-values with equal gaps are given, therefore, the corresponding class intervals would be 0 - 10, 10 - 20, 20 - 30, etc.
Since the given frequency distribution is not regular, the modal class will be determined by the method of grouping.
Example : The rate of sales tax as a percentage of sales, paid by 400 shopkeepers of a market during an assessment year ranged from 0 to 25%. The sales tax paid by 18% of them was not greater than 5%. The median rate of sales tax was 10% and 75th percentile rate of sales tax was 15%. If only 8% of the shopkeepers paid sales tax at a rate greater than 20% but not greater than 25%, summarize the information in the form of a frequency distribution taking intervals of 5%. Also find the modal rate of sales tax.
Solution: The above information can be written in the form of the following distribution:
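The frequencies of this distribution follow from the stated percentages by differencing the 'less than' cumulative frequencies. 18% of 400 = 72 shopkeepers paid at most 5%, 200 paid at most the median rate of 10%, 300 paid at most the 75th percentile rate of 15%, and 400 - 32 = 368 paid at most 20%. Below is a Python sketch of that step and of the interpolation formula for mode. It assumes the distribution is regular, so that the class with the highest frequency is the modal class.

```python
N = 400
upper_limits = [5, 10, 15, 20, 25]   # classes 0-5, 5-10, ..., 20-25 (% of sales)
# 'less than or equal to' cumulative frequencies read off the statement
cumulative = [
    N * 18 // 100,       # 18% paid at most 5%
    N // 2,              # the median rate is 10%
    N * 75 // 100,       # the 75th percentile rate is 15%
    N - N * 8 // 100,    # 8% paid more than 20%
    N,                   # everyone paid at most 25%
]
freqs = [c - p for c, p in zip(cumulative, [0] + cumulative[:-1])]
print(freqs)   # [72, 128, 100, 68, 32]

# Assumption: the distribution is regular, so the modal class has the highest frequency.
m = freqs.index(max(freqs))
h = 5
lower = upper_limits[m] - h
f1 = freqs[m - 1] if m > 0 else 0
f2 = freqs[m + 1] if m + 1 < len(freqs) else 0
d1, d2 = freqs[m] - f1, freqs[m] - f2
mode = lower + d1 / (d1 + d2) * h
print(round(mode, 2))   # 8.33
```

Under these assumptions the modal class is 5 - 10 and the modal rate of sales tax is about 8.33%.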
Example : The following table gives the incomplete income distribution of 300 workers of a firm, where the frequencies of the classes 3000 - 4000 and 5000 - 6000 are missing. If the mode of the distribution is Rs 4428.57, find the missing frequencies.
Merits and Demerits of Mode
Merits
Mode is a very simple measure of central tendency. Sometimes, just a glance at the series is enough to locate the modal value. Because of its simplicity, it is a very popular measure of central tendency.
Compared to the mean, the mode is less affected by extreme (marginal) values in the series, since it is determined only by the value with the highest frequency.
Mode can be located graphically, with the help of histogram.
Mode is that value which occurs most frequently in the series. Accordingly, mode is the best representative value of the series.
The calculation of mode does not require knowledge of all the items and frequencies of a distribution. In simple series, it is enough if one knows the items with highest frequencies in the distribution.
Demerits
Mode is an uncertain and vague measure of the central tendency.
Unlike mean, mode is not capable of further algebraic treatment.
When the frequencies of all items are identical, it is not possible to identify a modal value.
Calculation of mode involves the cumbersome procedure of grouping the data. If the extent of grouping changes, there will be a change in the modal value.
It ignores extreme marginal frequencies. To that extent, the modal value is not representative of all the items in a series.
RELATION BETWEEN MEAN, MEDIAN AND MODE
The relationship between the above measures of central tendency will be interpreted in terms of a continuous frequency curve. If the number of observations of a frequency distribution is increased gradually, we need more classes for approximately the same range of values of the variable and, simultaneously, the width of the corresponding classes decreases. Consequently, the histogram of the frequency distribution gets transformed into a smooth frequency curve, as shown in the following figure.
For a given distribution, the mean is the value of the variable which is the point of balance or centre of gravity of the distribution. The median is the value such that half of the observations are below it and the remaining half are above it. In terms of the frequency curve, the total area under the curve is divided into two equal parts by the ordinate at the median. The mode of a distribution is a value around which there is maximum concentration of observations and is given by the point at which the peak of the curve occurs. For a symmetrical distribution, all three measures of central tendency are equal, i.e., X̄ = Md = Mo, as shown in the following figure.
Imagine a situation in which the symmetrical distribution is made asymmetrical or positively (or negatively) skewed by adding some observations of very high (or very low) magnitudes, so that the right hand (or the left hand) tail of the frequency curve gets elongated.
Consequently, the three measures will depart from each other. Since mean takes into account the magnitudes of observations, it would be highly affected. Further, since the total number of observations will also increase, the median would also be affected but to a lesser extent than mean. Finally, there would be no change in the position of mode. More specifically, we shall have Mo < Md < X̄ when skewness is positive and X̄ < Md < Mo when skewness is negative, as shown in the following figure.
Empirical Relation between Mean, Median and Mode
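Empirically, it has been observed that for a moderately skewed distribution, the difference between mean and mode is approximately three times the difference between mean and median, i.e.,
$$\bar{X} - M_o \approx 3(\bar{X} - M_d), \quad \text{or equivalently} \quad M_o \approx 3M_d - 2\bar{X}$$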
This relation can be used to estimate the value of one of the measures when the values of the other two are known.
The mean and median of a moderately skewed distribution are 42.2 and 41.9 respectively. Find mode of the distribution.
For a moderately skewed distribution, the median price of men's shoes is Rs 380 and modal price is Rs 350. Calculate mean price of shoes.
(a) Here, mode will be determined by the use of empirical formula.
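Both exercises reduce to the empirical relation Mo = 3Md - 2X̄. It is used directly for the mode and rearranged as X̄ = (3Md - Mo)/2 for the mean. A short Python check with illustrative function names:

```python
def mode_from(mean, median):
    """Empirical relation Mo = 3*Md - 2*Mean (moderately skewed distributions only)."""
    return 3 * median - 2 * mean

def mean_from(median, mode):
    """Rearranging Mo = 3*Md - 2*Mean gives Mean = (3*Md - Mo) / 2."""
    return (3 * median - mode) / 2

print(round(mode_from(mean=42.2, median=41.9), 2))   # 41.3
print(mean_from(median=380, mode=350))               # 395.0
```

The mode of the first distribution is thus about 41.3, and the mean price of shoes is Rs 395.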
Choice of a Suitable Average
The choice of a suitable average, for a given set of data, depends upon a number of considerations which can be classified into the following broad categories:
Considerations based on the suitability of the data for an average.
Considerations based on the purpose of investigation.
Considerations based on various merits of an average.
(a) Considerations based on the suitability of the data for an average:
The nature of the given data may itself indicate the type of average that could be selected. For example, the calculation of mean or median is not possible if the characteristic is neither measurable nor can be arranged in a certain order of its intensity. However, it is possible to calculate mode in such cases. Suppose that the distribution of votes polled by five candidates of a particular constituency is given as below:
Since the above characteristic, i.e., name of the candidate, is neither measurable nor can be arranged in the order of its intensity, it is not possible to calculate the mean and median. However, the mode of the distribution is D and hence, it can be taken as the representative of the above distribution.
If the characteristic is not measurable but various items of the distribution can be arranged in order of intensity of the characteristics, it is possible to locate median in addition to mode. For example, students of a class can be classified into four categories as poor, intelligent, very intelligent and most intelligent. Here the characteristic, intelligence, is not measurable. However, the data can be arranged in ascending or descending order of intelligence. It is not possible to calculate mean in this case.
If the characteristic is measurable but class intervals are open at one or both ends of the distribution, it is possible to calculate median and mode but not a satisfactory value of mean. However, an approximate value of mean can also be computed by making a certain assumption about the width of the class(es) having open ends.
If the distribution is skewed, the median may represent the data more appropriately than mean and mode.
If various class intervals are of unequal width, mean and median can be satisfactorily calculated. However, an approximate value of mode can be calculated by making class intervals of equal width under the assumption that observations in a class are uniformly distributed. The accuracy of the computed mode will depend upon the validity of this assumption.
(b) Considerations based on the purpose of investigation:
The choice of an appropriate measure of central tendency also depends upon the purpose of investigation. If the collected data are the figures of income of the people of a particular region and our purpose is to estimate the average income of the people of that region, computation of mean will be most appropriate. On the other hand, if it is desired to study the pattern of income distribution, the computation of median, quartiles or percentiles, etc., might be more appropriate. For example, the median will give a figure such that 50% of the people have income less than or equal to it.
Similarly, by calculating quartiles or percentiles, it is possible to know the percentage of people having at least a given level of income or the percentage of people having income between any two limits, etc.
If the purpose of investigation is to determine the most common or modal size of the distribution, mode is to be computed, e.g., modal family size, modal size of garments, modal size of shoes, etc. The computation of mean and median will provide no useful interpretation of the above situations.
(c) Considerations based on various merits of an average: The presence or absence of various characteristics of an average may also affect its selection in a given situation.
If the requirement is that an average should be rigidly defined, mean or median can be chosen in preference to mode because mode is not rigidly defined in all the situations.
An average should be easy to understand and easy to interpret. This characteristic is satisfied by all the three averages.
It should be easy to compute. We know that all the three averages are easy to compute. It is to be noted here that, for the location of median, the data must be arranged in order of magnitude. Similarly, for the location of mode, the data should be converted into a frequency distribution. This type of exercise is not necessary for the computation of mean.
It should be based on all the observations. This characteristic is met only by mean and not by median or mode.
It should be least affected by the fluctuations of sampling. If a number of independent random samples of same size are taken from a population, the variations among means of these samples are less than the variations among their medians or modes. These variations are often termed as sampling variations.
Therefore, preference should be given to mean when the requirement of least sampling variations is to be fulfilled. It should be noted here that if the population is highly skewed, the sampling variations in mean may be larger than the sampling variations in median.
It should not be unduly affected by the extreme observations. The mode is the most suitable average from this point of view. The median is only slightly affected, while the mean is very much affected by the presence of extreme observations.
It should be capable of further mathematical treatment. This characteristic is satisfied only by mean and, consequently, most of the statistical theories use mean as a measure of central tendency.
It should not be affected by the method of grouping of observations. Very often the data are summarized by grouping observations into class intervals. The chosen average should not be much affected by the changes in size of class intervals.
It can be shown that if the same data are grouped in various ways by taking class intervals of different size, the effect of grouping on mean and median will be very small particularly when the number of observations is very large. Mode is very sensitive to the method of grouping.
It should represent the central tendency of the data. The main purpose of computing an average is to represent the central tendency of the given distribution and, therefore, it is desirable that it should fall in the middle of the distribution. Both mean and median satisfy this requirement, but in certain cases mode may be at (or near) either end of the distribution.
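The point about fluctuations of sampling can be illustrated by simulation. The Python sketch below draws 2,000 random samples of size 25 from a hypothetical normal population (mean 50, standard deviation 10) and compares the spread of the sample means with that of the sample medians. For a normal population the medians should vary noticeably more; large-sample theory puts the ratio of the two standard deviations near √(π/2) ≈ 1.25. The settings are arbitrary, and a highly skewed population may reverse the ordering, as noted above.

```python
import random
import statistics

random.seed(2024)              # fixed seed so the run is repeatable
SAMPLES, SIZE = 2000, 25       # hypothetical simulation settings

means, medians = [], []
for _ in range(SAMPLES):
    # Assumption: a normal population with mean 50 and standard deviation 10
    sample = [random.gauss(50, 10) for _ in range(SIZE)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

sd_of_means = statistics.stdev(means)
sd_of_medians = statistics.stdev(medians)
print(f"spread of sample means:   {sd_of_means:.2f}")
print(f"spread of sample medians: {sd_of_medians:.2f}")
```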
The geometric mean of a series of n positive observations is defined as the nth root of their product.
Calculation of Geometric Mean
(a) Individual series
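If there are n observations, X1, X2, ...... Xn, such that Xi > 0 for each i, their geometric mean (GM) is defined as
$$GM = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$
or, taking logarithms,
$$\log GM = \frac{1}{n}\sum_{i=1}^{n} \log X_i$$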
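A minimal Python sketch of this definition. The GM is computed through logarithms, as in hand calculation (log GM is the A.M. of the logarithms of the observations). `math.prod` and `statistics.geometric_mean` (both Python 3.8 or later) are used only as cross-checks. The observations and the function name are hypothetical.

```python
import math
import statistics

def geometric_mean(values):
    """nth root of the product of n positive observations, computed through
    logarithms: log GM = (1/n) * sum(log Xi)."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("the geometric mean needs at least one observation, all positive")
    return math.exp(sum(math.log(x) for x in values) / len(values))

data = [2, 8, 4]                               # hypothetical observations
print(geometric_mean(data))                    # approximately 4, the cube root of 64
print(math.prod(data) ** (1 / len(data)))      # direct nth root of the product
print(statistics.geometric_mean(data))         # standard library cross-check
```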
Average Rate of Growth of Population
The average rate of growth of price, denoted by r in the above section, can also be interpreted as the average rate of growth of population. If P0 denotes the population in the beginning of the period and Pn the population after n years, using Equation (2), we can write the expression for the average rate of change of population per annum as
$$r = \left(\frac{P_n}{P_0}\right)^{1/n} - 1$$
which follows from Pn = P0(1 + r)^n.
Similarly, Equation (4), given above, can be used to find the average rate of growth of population when its rates of growth in various years are given.
Remarks: The formulae of price and population changes, considered above, can also be extended to various other situations like growth of money, capital, output, etc.
Example : The population of a country increased from 2,00,000 to 2,40,000 within a period of 10 years. Find the average rate of growth of population per year.
Solution: Let r be the average rate of growth of population per year for the period of 10 years. Let P0 be initial and P10 be the final population for this period. We are given P0 = 2,00,000 and P10 = 2,40,000.
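Since P10 = P0(1 + r)^10, we have (1 + r)^10 = 2,40,000/2,00,000 = 1.2, so that 1 + r = (1.2)^(1/10) ≈ 1.018.
Thus, r = 1.018 - 1 = 0.018.
Hence, the percentage rate of growth = 0.018 × 100 = 1.8% p.a.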
Example : The gross national product of a country was Rs 20,000 crores five years ago. If it is Rs 30,000 crores now, find the annual rate of growth of G.N.P.
Solution: Here P5 = 30,000, P0 = 20,000 and n = 5.
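Since P5 = P0(1 + r)^5, we have (1 + r)^5 = 30,000/20,000 = 1.5, so that 1 + r = (1.5)^(1/5) ≈ 1.084.
Hence r = 1.084 - 1 = 0.084.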
Thus, the percentage rate of growth of G.N.P. is 8.4% p.a
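Example : Find the average rate of increase per decade of a population that increased by 20% in the first, 30% in the second and 40% in the third decade.
Solution: Let r denote the average rate of growth of population per decade. Since the population is multiplied by 1.20, 1.30 and 1.40 in the three decades, $(1 + r)^3 = (1.20)(1.30)(1.40) = 2.184$, so that $1 + r = (2.184)^{1/3} \approx 1.297$ and r ≈ 0.297.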
Hence, the percentage rate of growth of population per decade is 29.7%.
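The three examples above can be checked with a short sketch. It assumes the compound-growth model $P_n = P_0(1 + r)^n$ used in the text; `average_growth_rate` and `average_of_rates` are hypothetical helper names chosen for illustration.

```python
def average_growth_rate(initial, final, periods):
    """Average rate r per period, solving final = initial * (1 + r) ** periods."""
    return (final / initial) ** (1 / periods) - 1

def average_of_rates(rates):
    """Average rate from successive rates r1..rn: 1 + r is the GM of the factors (1 + ri)."""
    product = 1.0
    for r in rates:
        product *= 1 + r
    return product ** (1 / len(rates)) - 1

print(f"{average_growth_rate(200_000, 240_000, 10):.1%}")   # 1.8%
print(f"{average_growth_rate(20_000, 30_000, 5):.1%}")      # 8.4%
print(f"{average_of_rates([0.20, 0.30, 0.40]):.1%}")        # 29.7%
```

Note that the last figure is the geometric mean of the growth factors minus one, not the arithmetic mean of 20%, 30% and 40% (which would be 30%).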
Suitability of Geometric Mean for Averaging Ratios
It will be shown here that the geometric mean is more appropriate than the arithmetic mean for averaging ratios. Let there be two values of each of the variables x and y, as given below:
We note that the product of the arithmetic means of the ratios x/y and y/x is not equal to unity. However, the product of their respective geometric means, i.e., 1/√6 and √6, is equal to unity. Since it is desirable that an average should not depend on the way in which a ratio is expressed, it seems reasonable to regard the geometric mean as more appropriate than the arithmetic mean for averaging ratios.
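The argument can be illustrated numerically. The sketch assumes hypothetical ratios x/y = 1/2 and 1/3 (one pair whose GM is 1/√6, matching the text; the original table of x and y values is not reproduced here) and Python 3.8+ for `statistics.fmean` and `statistics.geometric_mean`.

```python
from statistics import fmean, geometric_mean

# Hypothetical ratios x/y whose GM is 1/sqrt(6); the source table is not reproduced.
x_over_y = [1 / 2, 1 / 3]
y_over_x = [1 / r for r in x_over_y]   # the same ratios expressed the other way: 2 and 3

am_product = fmean(x_over_y) * fmean(y_over_x)
gm_product = geometric_mean(x_over_y) * geometric_mean(y_over_x)

print(round(am_product, 4))   # 1.0417 (= 5/12 * 5/2 = 25/24, not unity)
print(round(gm_product, 4))   # 1.0
```

Whatever ratios are chosen, the product of the two geometric means is always 1, while the product of the two arithmetic means exceeds 1 unless all the ratios are equal.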
Properties of Geometric Mean
As in the case of the arithmetic mean, the sum of the deviations of the logarithms of the observations from log GM is equal to zero, i.e., $\sum_{i=1}^{n} (\log X_i - \log \text{GM}) = 0$.
This property implies that the product of the ratios of GM to each observation that is less than it is equal to the product of the ratios of each observation that is greater than it to GM. For example, if the observations are 5, 25, 125 and 625, their GM = 55.9, and the property implies that
$$\frac{55.9}{5} \times \frac{55.9}{25} = \frac{125}{55.9} \times \frac{625}{55.9},$$
each side being approximately 25.
Similar to the arithmetic mean, where the sum of observations remains unaltered if each observation is replaced by their AM, the product of observations remains unaltered if each observation is replaced by their GM.
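The three properties listed above can be verified for the observations 5, 25, 125 and 625 with a small sketch (Python 3.8+ assumed for `math.prod` and `statistics.geometric_mean`).

```python
import math
from statistics import geometric_mean

data = [5, 25, 125, 625]
gm = geometric_mean(data)   # 5 ** 2.5, about 55.9017

# Property 1: deviations of the logarithms from log GM sum to zero.
log_dev_sum = sum(math.log(x) - math.log(gm) for x in data)

# Property 2: product of GM/x over x below GM equals product of x/GM over x above GM.
below = math.prod(gm / x for x in data if x < gm)
above = math.prod(x / gm for x in data if x > gm)

# Property 3: replacing every observation by GM leaves the product unchanged.
same_product = math.isclose(gm ** len(data), math.prod(data))

print(f"{below:.4f} {above:.4f} {same_product}")   # 25.0000 25.0000 True
print(abs(log_dev_sum) < 1e-9)                      # True
```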
Merits, Demerits and Uses of Geometric Mean
It is based on all the items of the data.
It is rigidly defined. This means that different investigators will obtain the same result from a given set of data.
It is a relative measure and gives less importance to large items and more to small ones, unlike the arithmetic mean.
Geometric mean is useful in ratios and percentages and in determining rates of increase or decrease.
It is capable of algebraic treatment. This means that we can find the combined geometric mean of two or more series.
It is not easily understood and therefore is not widely used.
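It is difficult to compute as it involves the knowledge of ratios, roots, logs and antilogs.
It cannot be meaningfully computed if any value in the given series is zero (the GM then becomes zero whatever the other values are) or negative (logarithms of negative numbers are not defined).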
With open-end class intervals of the data, geometric mean cannot be calculated.
Geometric mean may not correspond to any value of the given data.
It is most suitable for averaging ratios and exponential rates of change.
It is used in the construction of index numbers.
It is often used to study certain social or economic phenomena.
Exercise with Hints
A sum of money was invested for 4 years. The respective rates of interest per annum were 4%, 5%, 6% and 8%. Determine the average rate of interest p.a.
The number of bacteria in a certain culture was found to be 4 × 10⁶ at noon of one day. At noon of the next day, the number was 9 × 10⁶. If the number increased at a constant rate per hour, how many bacteria were there at the intervening midnight?
Hint: The number of bacteria at midnight is the GM of 4 × 10⁶ and 9 × 10⁶.
If the price of a commodity doubles in a period of 4 years, what is the average percentage increase per year?
A machine is assumed to depreciate by 40% in value in the first year, by 25% in second year and by 10% p.a. for the next three years, each percentage being calculated on the diminishing value. Find the percentage depreciation p.a. for the entire period.
A certain store made profits of Rs 5,000, Rs 10,000 and Rs 80,000 in 1965, 1966 and 1967 respectively. Determine the average rate of growth of its profits.
An economy grows at the rate of 2% in the first year, 2.5% in the second, 3% in the third, 4% in the fourth ...... and 10% in the tenth year. What is the average rate of growth of the economy?
The export of a commodity increased by 30% in 1988, decreased by 22% in 1989 and then increased by 45% in the following year, each change being measured relative to the previous year. Calculate the average rate of change of the exports per annum.
Show that the arithmetic mean of two positive numbers a and b is at least as large as their geometric mean.
Hint: The square of the difference of two numbers is never negative, i.e., (a − b)² ≥ 0. Rearrange this to get the inequality (a + b)² ≥ 4ab and then obtain the desired result, i.e., AM ≥ GM.
If population has doubled itself in 20 years, is it correct to say that the rate of growth has been 5% per annum?
The weighted geometric mean of the 5 numbers 10, 15, 25, 12 and 20 is 17.15. If the weights of the first four numbers are 2, 3, 5 and 2 respectively, find the weight of the fifth number. Hint: Let x be the weight of the 5th number; then
$$\log 17.15 = \frac{2\log 10 + 3\log 15 + 5\log 25 + 2\log 12 + x\log 20}{2 + 3 + 5 + 2 + x}$$
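The last exercise reduces to a linear equation in the unknown weight x once logarithms are taken. A minimal sketch, assuming the given GM of 17.15 is a value rounded to two decimals (so x comes out close to, not exactly, a whole number):

```python
import math

values = [10, 15, 25, 12, 20]
known_weights = [2, 3, 5, 2]   # weights of the first four numbers
target_gm = 17.15              # given weighted GM (rounded in the exercise)

# log G * (W + x) = S + x * log X5, where W and S come from the first four numbers.
log_g = math.log10(target_gm)
w_known = sum(known_weights)
s_known = sum(w * math.log10(v) for w, v in zip(known_weights, values))
x = (log_g * w_known - s_known) / (math.log10(values[-1]) - log_g)
print(round(x, 2))   # 2.02, i.e. a weight of 2 once the rounding of 17.15 is allowed for
```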
The harmonic mean of n observations, none of which is zero, is defined as the reciprocal of the arithmetic mean of their reciprocals.
Calculation of Harmonic Mean
(a) Individual series
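If there are n observations X1, X2, ...... Xn, none of which is zero, their harmonic mean (HM) is defined as
$$\text{HM} = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_n}} = \frac{n}{\sum_{i=1}^{n} 1/X_i}$$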
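The definition translates directly into code. In this sketch `harmonic_mean` is a hypothetical helper; the optional `weights` argument (an assumption beyond the definition above) gives the weighted form $\sum w_i / \sum (w_i / X_i)$ used in the examples that follow.

```python
def harmonic_mean(values, weights=None):
    """Simple HM = n / sum(1/Xi); weighted HM = sum(wi) / sum(wi / Xi)."""
    if weights is None:
        weights = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when an observation is zero")
    return sum(weights) / sum(w / x for w, x in zip(weights, values))

print(round(harmonic_mean([40, 60]), 2))   # 48.0
```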
Example : A train travels 50 kms at a speed of 40 kms/hour, 60 kms at a speed of 50 kms/hour and 40 kms at a speed of 60 kms/hour. Calculate the weighted harmonic mean of the speed of the train, taking distances travelled as weights. Verify that this harmonic mean represents an appropriate average of the speed of the train.
Solution: Taking distances as weights, the weighted harmonic mean of the speeds is
$$\text{HM}_w = \frac{50 + 60 + 40}{\frac{50}{40} + \frac{60}{50} + \frac{40}{60}} = \frac{150}{3.1167} \approx 48.13 \text{ kms/hour} \qquad (1)$$
Verification : Average speed = Total distance travelled / Total time taken. The numerator of Equation (1) is the total distance travelled by the train. Its denominator is the total time taken by the train to travel 150 kms, since 50/40 is the time taken to travel 50 kms at a speed of 40 kms/hour.
Similarly, 60/50 and 40/60 are the times taken to travel 60 kms and 40 kms at speeds of 50 kms/hour and 60 kms/hour respectively. Hence, the weighted harmonic mean is the most appropriate average in this case.
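The verification can also be done numerically: the weighted HM of the speeds (distances as weights) coincides with total distance divided by total time. `weighted_harmonic_mean` is a hypothetical helper for this sketch.

```python
distances = [50, 60, 40]   # kms, used as weights
speeds = [40, 50, 60]      # kms/hour

def weighted_harmonic_mean(values, weights):
    """Weighted HM = sum(w_i) / sum(w_i / x_i)."""
    return sum(weights) / sum(w / x for w, x in zip(weights, values))

hm_w = weighted_harmonic_mean(speeds, distances)

# Direct definition: total distance divided by total time (time = distance / speed per leg).
hours = [d / s for d, s in zip(distances, speeds)]
direct = sum(distances) / sum(hours)

print(round(hm_w, 2), round(direct, 2))   # 48.13 48.13
```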
Example : Ram goes from his house to his office by bicycle at a speed of 12 kms/hour and returns at a speed of 14 kms/hour. Find his average speed. Solution: Since the distances travelled at the two speeds are equal, Ram's average speed is given by the simple harmonic mean of the given speeds: $\text{HM} = \frac{2}{\frac{1}{12} + \frac{1}{14}} = \frac{336}{26} \approx 12.92$ kms/hour.
Choice between Harmonic Mean and Arithmetic Mean
The harmonic mean, like the arithmetic mean, is used to average rates such as price per unit, kms per hour, work done per hour, etc., under certain conditions. To explain the method of choosing an appropriate average, consider the following illustration.
Let the price of a commodity be Rs 3, 4 and 5 per unit in three successive years. If we take the A.M. of these prices, i.e., (3 + 4 + 5)/3 = 4, it denotes the average price when equal quantities of the commodity are purchased in each year. To verify this, let us assume that 10 units of the commodity are purchased in each year.
Total expenditure on the commodity in 3 years = 10 × 3 + 10 × 4 + 10 × 5 = Rs 120, and the total quantity purchased = 30 units.
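Therefore, average price = 120/30 = Rs 4 per unit, which is the arithmetic mean of the prices in the three years.
Further, if we take the harmonic mean of the given prices, i.e.,
$$\text{HM} = \frac{3}{\frac{1}{3} + \frac{1}{4} + \frac{1}{5}} = \frac{180}{47} \approx 3.83,$$
it denotes the average price when equal amounts of money are spent on the commodity in the three years. To verify this, let us assume that Rs 100 is spent in each year on the purchase of the commodity. The quantities purchased are 100/3, 100/4 and 100/5 units, i.e., 78.33 units in all, so the average price = 300/78.33 ≈ Rs 3.83 per unit.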
Next, we consider a situation where different quantities are purchased in the three years. Let us assume that 10, 15 and 20 units of the commodity are purchased at prices of Rs 3, 4 and 5 respectively.
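Total expenditure = 10 × 3 + 15 × 4 + 20 × 5 = Rs 190 and total quantity = 45 units, so the average price = 190/45 ≈ Rs 4.22 per unit, which is the weighted arithmetic mean of the prices taking the respective quantities as weights. Further, if Rs 150, 200 and 250 are spent on the purchase of the commodity at prices of Rs 3, 4 and 5 respectively, the quantities purchased are 150/3 = 50, 200/4 = 50 and 250/5 = 50 units, and
Average price = Total expenditure / Total quantity purchased in the respective situations = 600/150 = Rs 4 per unit.
This average price is equal to the weighted harmonic mean of the prices taking money spent as weights.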
Therefore, to decide on the type of average to be used in a given situation, the first step is to examine the rate to be averaged. Note that a rate represents a ratio, e.g., price = money/quantity, speed = distance/time, work done per hour = work done/time taken, etc.
We have seen above that the arithmetic mean is the appropriate average of prices (money/quantity) when the quantities purchased in different situations, which appear in the denominator of the rate, are given. Similarly, the harmonic mean is appropriate when the sums of money spent in different situations, which appear in the numerator of the rate, are given.
To conclude, the average of a rate defined by the ratio p/q is given by the arithmetic mean of its values in different situations if the conditions are given in terms of q, and by the harmonic mean if the conditions are given in terms of p. Further, if the conditions are the same in the different situations, use the simple AM or HM; otherwise use the weighted AM or HM.
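The rule above can be written as a small function and checked against the price illustration. This is a sketch only: `average_rate` and its `given` parameter ("q" when denominators such as quantities are known, "p" when numerators such as money spent are known) are hypothetical names introduced here.

```python
def average_rate(rates, amounts, given):
    """Average of a rate p/q (e.g. price = money/quantity).

    given="q": amounts are denominators (quantities)  -> weighted AM of the rates.
    given="p": amounts are numerators (money spent)   -> weighted HM of the rates.
    Equal amounts reduce these to the simple AM or HM.
    """
    if given == "q":
        return sum(a * r for a, r in zip(amounts, rates)) / sum(amounts)
    if given == "p":
        return sum(amounts) / sum(a / r for a, r in zip(amounts, rates))
    raise ValueError("given must be 'p' or 'q'")

prices = [3, 4, 5]
print(round(average_rate(prices, [10, 10, 10], "q"), 2))      # 4.0  (simple AM)
print(round(average_rate(prices, [100, 100, 100], "p"), 2))   # 3.83 (simple HM)
print(round(average_rate(prices, [10, 15, 20], "q"), 2))      # 4.22 (weighted AM)
print(round(average_rate(prices, [150, 200, 250], "p"), 2))   # 4.0  (weighted HM)
```

In every case the result equals total money spent divided by total quantity bought, which is why the rule works.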
Example : An individual purchases three qualities of pencils. The relevant data are given below:
Example : In a 400 metre athletics competition, a participant covers the distance as given below. Find his average speed.
Example : Peter travelled by car for four days. He drove 10 hours each day: on the first day at 45 kms/hour, on the second at 40 kms/hour, on the third at 38 kms/hour and on the fourth at 37 kms/hour. What was his average speed?
Solution: Since the rate to be averaged is speed (distance/time) and the conditions are given in terms of time, the AM is appropriate. Further, since Peter travelled for an equal number of hours on each of the four days, the simple AM is calculated.
∴ Average speed = (45 + 40 + 38 + 37)/4 = 160/4 = 40 kms/hour
Example : In a certain factory, a unit of work is completed by A in 4 minutes, by B in 5 minutes, by C in 6 minutes, by D in 10 minutes and by E in 12 minutes. What is their average rate of working? What is the average number of units of work completed per minute? At this rate, how many units of work will each of them, on average, complete in a six-hour day? Also find the total units of work completed.
Solution: Here the rate to be averaged is the time taken to complete a unit of work, i.e., time/units of work done. Since the average is to be determined with reference to a fixed time (a six-hour day), the HM of the rates gives the appropriate average.
Thus, the average rate of working = $\frac{5}{\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{10} + \frac{1}{12}} = \frac{5}{0.8} = 6.25$ minutes per unit of work.
The average number of units of work completed per minute = 1/6.25 = 0.16.
The average number of units of work completed by each person in a six-hour (360-minute) day = 0.16 × 360 = 57.6.
Total units of work completed by all the five persons = 57.6 × 5 = 288.0.
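The steps of the factory example can be reproduced in a few lines; the six-hour day is taken as 360 minutes, as in the solution.

```python
minutes_per_unit = [4, 5, 6, 10, 12]   # A, B, C, D, E
n = len(minutes_per_unit)

avg_minutes = n / sum(1 / t for t in minutes_per_unit)   # simple HM of the times
units_per_minute = 1 / avg_minutes
units_per_day = units_per_minute * 6 * 60                # six-hour day
total_units = units_per_day * n

print(round(avg_minutes, 2), round(units_per_minute, 2), round(units_per_day, 1), round(total_units, 1))
# 6.25 0.16 57.6 288.0
```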
Example : A scooterist purchased petrol at the rate of Rs 14, 15.50 and 16 per litre during three successive years. Calculate the average price of petrol (i) if he purchased 150, 160 and 170 litres of petrol in the respective years and (ii) if he spent Rs 2,200, 2,500 and 2,600 in the three years.
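Solution: The rate to be averaged is expressed as money/litre.
(i) Since the condition is given in terms of the quantities of petrol (litres) purchased in the three years, the weighted AM is appropriate:
$$\text{Average price} = \frac{150 \times 14 + 160 \times 15.50 + 170 \times 16}{150 + 160 + 170} = \frac{7300}{480} \approx \text{Rs } 15.21 \text{ per litre}$$
(ii) Since the condition is given in terms of the money spent in the three years, the weighted HM is appropriate:
$$\text{Average price} = \frac{2200 + 2500 + 2600}{\frac{2200}{14} + \frac{2500}{15.50} + \frac{2600}{16}} = \frac{7300}{480.93} \approx \text{Rs } 15.18 \text{ per litre}$$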
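Both parts of the petrol example can be checked by applying the AM/HM rule: quantities known, so weighted AM; money known, so weighted HM. A minimal sketch:

```python
prices = [14, 15.50, 16]       # Rs per litre in the three years
litres = [150, 160, 170]       # case (i): quantities given -> weighted AM
spent = [2200, 2500, 2600]     # case (ii): money given     -> weighted HM

wam = sum(q * p for q, p in zip(litres, prices)) / sum(litres)
whm = sum(spent) / sum(m / p for m, p in zip(spent, prices))
print(round(wam, 2), round(whm, 2))   # 15.21 15.18
```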
Merits and Demerits of Harmonic Mean
It is a rigidly defined average and its value is always definite.
Its value is based on all the observations in a given series.
It is capable of further algebraic treatment.
It is not much affected by sampling fluctuations.
In problems relating to time and rates, it gives better results than other averages. The harmonic mean gives the best result when the distances covered are the same but the speeds vary.
It is not easily understood and hence it is seldom applied.
It is not easy to calculate as it involves reciprocal values (the use of calculators can help to remove this difficulty).
It gives undue weight to small items and relatively little weight to large items. This restricts its use in the analysis of economic data.
In case of zero or negative values, it cannot be computed.
Relationship among AM, GM and HM
If all the observations of a variable are the same, all three measures of central tendency coincide, i.e., AM = GM = HM. Otherwise, for positive observations, AM > GM > HM.
Example : Show that for any two positive numbers a and b, AM ≥ GM ≥ HM.
Solution: The three averages of a and b are:
$$\text{AM} = \frac{a+b}{2}, \qquad \text{GM} = \sqrt{ab}, \qquad \text{HM} = \frac{2ab}{a+b}$$
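One standard way to complete the proof is sketched below. It uses only the formulas AM = (a + b)/2, GM = √(ab) and HM = 2ab/(a + b) for two positive numbers a and b: the first line shows AM ≥ GM, and the second shows that GM is the geometric mean of AM and HM, which then gives GM ≥ HM.

```latex
\text{AM} - \text{GM} = \frac{a+b}{2} - \sqrt{ab}
  = \frac{a - 2\sqrt{ab} + b}{2}
  = \frac{(\sqrt{a} - \sqrt{b})^2}{2} \ge 0,

\text{AM} \times \text{HM} = \frac{a+b}{2} \cdot \frac{2ab}{a+b} = ab = \text{GM}^2
\;\Longrightarrow\;
\frac{\text{GM}}{\text{HM}} = \frac{\text{AM}}{\text{GM}} \ge 1
\;\Longrightarrow\;
\text{AM} \ge \text{GM} \ge \text{HM}.
```

Equality holds throughout exactly when a = b, since only then is (√a − √b)² = 0.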
Exercise with Hints
A train runs 25 miles at a speed of 30 m.p.h., another 50 miles at a speed of 40 m.p.h., then due to repairs of the track, 6 miles at a speed of 10 m.p.h. What should be the speed of the train to cover additional distance of 24 miles so that the average speed of the whole run of 105 miles is 35 m.p.h?
Hint: Let x be the speed over the remaining 24 miles; then
$$\frac{105}{\frac{25}{30} + \frac{50}{40} + \frac{6}{10} + \frac{24}{x}} = 35$$
Prices per share of a company during the first five days of a month were Rs 100, 120, 150, 140 and 50.
(i) Find the average daily price per share.
(ii) Find the average price paid by an investor who purchased Rs 20,000 worth of shares on each day.
(iii) Find the average price paid by an investor who purchased 100, 110, 120, 130 and 150 shares on the respective days.
Hint: Find the simple AM in (i), the simple HM in (ii) and the weighted AM in (iii).
Typist A can type a letter in five minutes, B in ten minutes and C in fifteen minutes. What is the average number of letters typed per hour per typist?
Hint: Since the conditions are given in terms of time (per hour), the simple HM of the times taken per letter gives the average time taken to type one letter. From this, obtain the average number of letters typed in one hour by each typist.
Ram paid Rs 15 for two dozen bananas in one shop, another Rs 15 for three dozen bananas in a second shop and Rs 15 for four dozen bananas in a third shop. Find the average price per dozen paid by him.
Hint: First find the prices per dozen in the three situations; since equal amounts of money are spent, the HM is the appropriate average.
A country accumulates Rs 100 crores of capital stock at the rate of Rs 10 crores/year, another Rs 100 crores at the rate of Rs 20 crores/year and Rs 100 crores at the rate of Rs 25 crores/year. What is the average rate of accumulation?
Hint: Since Rs 100 crores, each, is accumulated at the rates of Rs 10, 20 and 25 crores/year, simple HM of these rates would be most appropriate.
A motor car covered a distance of 50 miles 4 times. The first time at 50 m.p.h., the second at 20 m.p.h., the third at 40 m.p.h. and the fourth at 25 m.p.h. Calculate the average speed. Hint: Use HM.
The interest paid on each of the three different sums of money yielding 10%, 12% and 15% simple interest p.a. is the same. What is the average yield percent on the sum invested? Hint: Use HM
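Two of the exercises above (the train run and the share prices) can be checked with the following sketch. It assumes Python 3.8+ for `statistics.fmean`; exact fractions are used for the train so that the required speed comes out as an exact ratio.

```python
from fractions import Fraction
from statistics import fmean, harmonic_mean

# Train: speed x needed on the last 24 miles so that 105 miles average 35 m.p.h.
legs = [(25, 30), (50, 40), (6, 10)]                 # (miles, m.p.h.) already run
time_allowed = Fraction(105, 35)                     # 3 hours for the whole run
time_used = sum(Fraction(d, s) for d, s in legs)     # 161/60 hours
x = 24 / (time_allowed - time_used)                  # required speed in m.p.h.
print(x, round(float(x), 2))                         # 1440/19 75.79

# Share prices over five days.
prices = [100, 120, 150, 140, 50]
shares = [100, 110, 120, 130, 150]
simple_am = fmean(prices)                            # (i) average daily price
simple_hm = harmonic_mean(prices)                    # (ii) equal money spent each day
weighted_am = sum(q * p for q, p in zip(shares, prices)) / sum(shares)   # (iii)
print(round(simple_am, 2), round(simple_hm, 2), round(weighted_am, 2))   # 112.0 95.89 109.67
```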
The quadratic mean is the square root of the arithmetic mean of the squares of the observations. If X1, X2 ...... Xn are n observations, their quadratic mean is given by
$$\text{QM} = \sqrt{\frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n}}$$
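A minimal sketch of the quadratic mean; `quadratic_mean` is a hypothetical helper name and the sample values are chosen only for illustration.

```python
import math

def quadratic_mean(values):
    """Root mean square: sqrt((X1^2 + ... + Xn^2) / n)."""
    return math.sqrt(sum(x * x for x in values) / len(values))

print(quadratic_mean([1, 7]))    # 5.0
print(quadratic_mean([-3, 3]))   # 3.0, although the arithmetic mean of these values is 0
```

Because the observations are squared, positive and negative values do not cancel, which is why this average is used for magnitudes such as deviations.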
The moving average is a special type of average used to eliminate periodic fluctuations from time series data.
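As a sketch of how such an average removes periodic fluctuations, a simple (unweighted) moving average of period k replaces each run of k consecutive observations by their arithmetic mean. The period k = 3 and the sample series are assumptions for illustration; when k equals the length of the cycle, the cycle disappears from the averages. `moving_averages` is a hypothetical helper.

```python
def moving_averages(series, period):
    """Simple moving averages: mean of each run of `period` consecutive observations.

    Returns len(series) - period + 1 values; for an odd period each average is
    placed against the middle observation of its window.
    """
    if not 1 <= period <= len(series):
        raise ValueError("period must be between 1 and the length of the series")
    return [sum(series[i:i + period]) / period for i in range(len(series) - period + 1)]

# Hypothetical series with a cycle of length 3: the 3-period averages are flat.
data = [10, 20, 30, 10, 20, 30, 10]
print(moving_averages(data, 3))   # [20.0, 20.0, 20.0, 20.0, 20.0]
```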
A progressive average is a cumulative average, computed by taking all the figures available up to each succeeding year. The averages for the different periods are obtained as follows:
$$\frac{X_1}{1}, \quad \frac{X_1 + X_2}{2}, \quad \frac{X_1 + X_2 + X_3}{3}, \quad \ldots$$
This average is often used in the early years of a business.
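The progressive (cumulative) averages X1/1, (X1 + X2)/2, (X1 + X2 + X3)/3, ... can be produced with running totals. The yearly figures below are hypothetical, and `progressive_averages` is an illustrative helper name.

```python
from itertools import accumulate

def progressive_averages(values):
    """Cumulative averages: X1/1, (X1+X2)/2, (X1+X2+X3)/3, ..."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]

# Hypothetical yearly figures for the first four years of a business.
print(progressive_averages([10, 20, 30, 40]))   # [10.0, 15.0, 20.0, 25.0]
```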
Thus the mean, median and mode are essential tools in any statistical analysis. Measures of central tendency help to summarize data and present it in a simple form.