Business Statistics Notes

STATISTICS

“A knowledge of statistics is like a knowledge of foreign languages or of algebra; it may prove of use at any time under any circumstances.” -A.L. Bowley

A.L. Bowley has defined statistics as: (i) Statistics is the science of counting; (ii) Statistics may rightly be called the science of averages; and (iii) Statistics is the science of the measurement of the social organism, regarded as a whole in all its manifestations.

Boddington defined it as: Statistics is the science of estimates and probabilities.

Seligman stated that statistics is the science that deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry.

Spiegel defines statistics by highlighting its role in decision-making, particularly under uncertainty.

CHARACTERISTICS OF STATISTICS

·         Statistics are the aggregates of facts.

·         Statistics are affected by a number of factors.

·         Statistics must be reasonably accurate.

·         Statistics must be collected in a systematic manner.

·         Statistics are collected for a pre-determined purpose.

 

TYPES OF DATA AND DATA SOURCES

Statistical data are the basic raw material of statistics.

Any object, subject, phenomenon, or activity that generates data through this process is termed a variable. In other words, a variable is one that shows a degree of variability when successive measurements are recorded.

In statistics, data are classified into two broad categories: quantitative data and qualitative data.

Quantitative data are those that can be quantified in definite units of measurement.  These refer to characteristics whose successive measurements yield quantifiable observations. Depending on the nature of the variable observed for measurement, quantitative data can be further categorized as continuous and discrete data. Obviously, a variable may be a continuous variable or a discrete variable.

Continuous Data   A continuous variable is one that can assume any value between any two points on a line segment, thus representing an interval of values. The data recorded on such characteristics are called continuous data. It may be noted that a continuous variable assumes the finest unit of measurement - finest in the sense that it enables measurement to the maximum degree of precision.

Discrete data  A discrete variable is one whose outcomes are measured in fixed numbers. Such data are essentially count data.

Qualitative data  A  characteristic is qualitative in nature when its observations are defined and noted in terms of the presence or absence of a certain attribute in discrete numbers. These data are further classified as nominal and rank data.

Nominal data are the outcome of classification into two or more categories of  items or units comprising a sample or a population according to some quality  characteristic. Classification of students according to sex (as males and females), of workers according to skill (as skilled, semi-skilled, and unskilled), and of employees according to the level of education (as matriculates, undergraduates, and post-graduates), all result into nominal data. Given any  such basis of classification, it is always possible to assign each item to a particular class and make a summation of items belonging to each class. The count data so obtained are called nominal data.

Rank data, on the other hand, are the result of assigning ranks to specify order in terms of the integers 1,2,3, ..., n. Ranks may be assigned according to the level of performance in a test.

Data sources could be seen as of two types, viz., secondary and primary. The two can be defined as under:

(i) Secondary data: They already exist in some form: published or unpublished - in an identifiable secondary source. They are, generally, available from published source(s), though not necessarily in the form actually required.

(ii) Primary data: Those data which do not already exist in any form, and thus  have to be collected for the first time from the primary source(s). By their very  nature, these data require fresh and first-time collection covering the whole population or a sample drawn from it.

TYPES OF STATISTICS

Descriptive statistics deals with collecting, summarizing, and simplifying data, which are otherwise quite unwieldy and voluminous. It seeks to achieve this in a manner that meaningful conclusions can be readily drawn from the data. Descriptive statistics may thus be seen as comprising methods of bringing out and highlighting the latent characteristics present in a set of numerical data.

Inferential statistics, also known as inductive statistics, goes beyond describing a given problem situation by means of collecting, summarizing, and meaningfully presenting the related data. Instead, it consists of methods that are used for drawing inferences, or making broad generalizations, about a totality of observations on the basis of knowledge about a part of that totality.

Inferential statistics helps to evaluate the risks involved in reaching inferences or generalizations about an unknown population on the basis of sample information.

Limitations of statistics

(i)                 Sources of data not given

(ii)               Defective data

(iii)             Unrepresentative sample

(iv)             Inadequate sample

(v)               Unfair comparisons

(vi)             Unwanted conclusions

(vii)           Confusion of correlation and causation

CENTRAL TENDENCY

ARITHMETIC MEAN

Adding all the observations and dividing the sum by the number of observations results in the arithmetic mean.

For grouped data, arithmetic mean may be calculated by applying any of the following methods:

(i)                 Direct method, (ii) Short-cut method , (iii) Step-deviation method.

It may be noted that the mid-point of each class is taken as a good approximation of the true mean of the class.
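As a sketch of the direct method, the class mid-points can be weighted by their frequencies; the class intervals and frequencies below are assumed for illustration:

```python
# Direct method for the arithmetic mean of grouped data (assumed data).
# Each class is represented by its mid-point m; mean = sum(f*m) / sum(f).
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]  # class intervals
freqs = [5, 8, 12, 5]                              # class frequencies

midpoints = [(lo + hi) / 2 for lo, hi in classes]
mean = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)
print(round(mean, 2))  # 20.67
```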

CHARACTERISTICS OF THE ARITHMETIC MEAN

1.      The sum of the deviations of the individual items from the arithmetic mean is always zero.

2.      The sum of the squared deviations of the individual items from the arithmetic mean is always minimum.

3.      As the arithmetic mean is based on all the items in a series, a change in the value of any item will lead to a change in the value of the arithmetic mean.

4.      In the case of highly skewed distribution, the arithmetic mean may get distorted on account of a few items with extreme values.
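Properties 1 and 2 above can be checked numerically on a small assumed sample:

```python
# Numerical check of two characteristics of the arithmetic mean (assumed data).
data = [4, 7, 9, 12, 18]
mean = sum(data) / len(data)  # 10.0

# Property 1: deviations from the mean sum to zero.
print(sum(x - mean for x in data))  # 0.0

# Property 2: the sum of squared deviations is smaller about the mean
# than about any other value, e.g. the median (9 here).
ss_mean = sum((x - mean) ** 2 for x in data)
ss_median = sum((x - 9) ** 2 for x in data)
print(ss_mean < ss_median)  # True
```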

 

MEDIAN

 

Median is defined as the value of the middle item (or the mean of the values of the two middle items) when the data are arranged in an ascending or descending order of magnitude. Thus, in an ungrouped frequency distribution if the n values are arranged in ascending or descending order of magnitude, the median is the middle value if n is odd. When n is even, the median is the mean of the two middle values.
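A minimal sketch of this rule for ungrouped data:

```python
# Median of ungrouped data: middle value (n odd) or mean of the
# two middle values (n even), after sorting.
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```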

 

To understand these, we should first know that the median belongs to a general class of statistical descriptions called fractiles. A fractile is a value below which a given fraction of a set of data lies. In the case of the median, this fraction is one-half (1/2).

 

Other fractiles include quartiles (where the series is divided into 4 parts), deciles (where the series is divided into 10 parts) and percentiles (where the series is divided into 100 parts).

 

CHARACTERISTICS OF THE MEDIAN

1.      Unlike the arithmetic mean, the median can be computed from open-ended distributions. This is because it is located in the median class-interval, which would not be an open-ended class.

2. The median can also be determined graphically whereas the arithmetic mean cannot be ascertained in this manner.

3. As it is not influenced by the extreme values, it is preferred in case of a distribution having extreme values.

4. In case of the qualitative data where the items are not counted or measured but are scored or ranked, it is the most appropriate measure of central tendency.

 

MODE

The mode is another measure of central tendency. It is the value at the point around which the items are most heavily concentrated.

An approximate value of the mode can also be obtained from the empirical relationship:

Mode = 3 Median - 2 Mean

This relation holds only for moderately skewed distributions and gives only approximate results, so its frequent use should be avoided. However, when the mode is ill defined or the series is bimodal, it may be used.
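A rough sketch of this empirical relation on assumed data; note how approximate the result can be on a small sample:

```python
# Empirical estimate of the mode via Mode = 3*Median - 2*Mean (assumed data).
data = [10, 12, 12, 13, 18]
mean = sum(data) / len(data)        # 13.0
med = sorted(data)[len(data) // 2]  # 12, the middle of 5 values
mode_estimate = 3 * med - 2 * mean
print(mode_estimate)  # 10.0 (the actual mode is 12; the relation is only approximate)
```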

 

RELATIONSHIPS OF THE MEAN, MEDIAN AND MODE

 

(i)                 When a distribution is symmetrical, the mean, median and mode are the same.

(ii)               In case a distribution is skewed to the right, then mean > median > mode. Generally, income distribution is skewed to the right, where a large number of families have relatively low income and a small number of families have extremely high income.

(iii)             When a distribution is skewed to the left, then mode > median > mean. This is because here the mean is pulled down below the median by extremely low values.

(iv)             Given the mean and median of a unimodal distribution, we can determine whether it is skewed to the right or left. When mean > median, it is skewed to the right; when median > mean, it is skewed to the left. It may be noted that the median always lies between the mean and the mode.

BEST MEASURE OF CENTRAL TENDENCY

The arithmetic mean is the sum of the values divided by the total number of observations in the series.

The median is the value of the middle observation that divides the series into two equal parts.

Mode is the value around which the observations tend to concentrate.

GEOMETRIC MEAN

The geometric mean is more important than the harmonic mean. The geometric mean is defined as the nth root of the product of n observations of a distribution.

Similarly, if there are three observations, then we have to calculate the cube root of the product of these three observations; and so on.

When the number of items is large, it becomes extremely difficult to multiply the numbers and to calculate the root. To simplify calculations, logarithms are used.
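A sketch of the logarithmic route, assuming a small data set:

```python
import math

# Geometric mean via logarithms: GM = antilog(arithmetic mean of the logs).
# Equivalent to the nth root of the product, but easier to compute when
# the numbers are many or large (assumed data).
data = [4, 8, 16]
log_mean = sum(math.log(x) for x in data) / len(data)
gm = math.exp(log_mean)
print(round(gm, 4))  # 8.0, i.e. the cube root of 4*8*16 = 512
```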

The geometric mean is most suitable in the following three cases:

1. Averaging rates of change.

2. The compound interest formula.

3. Discounting, capitalization

 

This process of ascertaining the present value of future income by using the interest rate is known as discounting.
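A minimal discounting sketch; the amount, rate and period below are assumed figures:

```python
# Present value of a future amount: PV = FV / (1 + r)^n (assumed figures).
future_value = 1000.0  # amount due in n years
r = 0.08               # annual interest rate
n = 3                  # number of years
pv = future_value / (1 + r) ** n
print(round(pv, 2))  # 793.83
```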

ADVANTAGES OF G. M. 

1. Geometric mean is based on each and every observation in the data set.

2. It is rigidly defined.

3. It is more suitable while averaging ratios and percentages as also in calculating growth rates. 

4. As compared to the arithmetic mean, it gives more weight to small values and less weight to large values. As a result of this characteristic of the geometric mean, it is generally less than the arithmetic mean. At times it may be equal to the arithmetic mean.

5. It is capable of algebraic manipulation. If the geometric means of two or more series are known along with their respective frequencies, then a combined geometric mean can be calculated by using logarithms.

LIMITATIONS OF G.M.

1. As compared to the arithmetic mean, geometric mean is difficult to understand.

2. Both computation of the geometric mean and its interpretation are rather difficult.

3. When there is a negative item in a series or one or more observations have zero value, then the geometric mean cannot be calculated.

In view of the limitations mentioned above, the geometric mean is not frequently used.

 

HARMONIC MEAN

The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations. Symbolically, HM = n / (1/x1 + 1/x2 + ... + 1/xn).

The calculation of harmonic mean becomes very tedious when a distribution has a large number of observations.

The main advantage of the harmonic mean is that it is based on all observations in a distribution and is amenable to further algebraic treatment. When we desire to give greater weight to smaller observations and less weight to the larger observations, then the use of harmonic mean will be more suitable
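Averaging speeds over equal distances is a standard case where smaller observations should carry greater weight; a sketch with assumed figures:

```python
# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals.
def harmonic_mean(values):
    return len(values) / sum(1 / x for x in values)

# A trip covers two equal distances at 40 km/h and 60 km/h (assumed figures);
# the correct average speed is the harmonic mean, not the arithmetic mean (50).
print(harmonic_mean([40, 60]))  # 48.0
```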

 

Limitations of the harmonic mean.

First, it is difficult to understand as well as difficult to compute.

Second, it cannot be calculated if any of the observations is zero or negative.

Third, it is only a summary figure, which may not be an actual observation in the distribution.

It is worth noting that the harmonic mean is always lower than the geometric mean, which is lower than the arithmetic mean. This is because the harmonic mean assigns lesser importance to higher values. Since the harmonic mean is based on reciprocals, it becomes clear that as reciprocals of higher values are lower than those of lower values, it is a lower average than the arithmetic mean as well as the geometric mean.

QUADRATIC MEAN

Geometric mean is the antilogarithm of the arithmetic mean of the logarithms, and the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. Likewise, the quadratic mean (Q) is the square root of the arithmetic mean of the squares.

the quadratic mean can be used while averaging deviations when the standard deviation is to be calculated.

Q > x̄ > G > H, provided that all the individual observations in a series are positive and all of them are not the same.
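This ordering of the four means can be verified numerically on assumed positive, non-identical values:

```python
import math

# Check Q > AM > GM > HM on distinct positive values (assumed data).
data = [2, 4, 8]
n = len(data)
am = sum(data) / n                       # arithmetic mean
gm = math.prod(data) ** (1 / n)          # geometric mean
hm = n / sum(1 / x for x in data)        # harmonic mean
q = math.sqrt(sum(x * x for x in data) / n)  # quadratic mean
print(q > am > gm > hm)  # True
```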

 

DISPERSION AND SKEWNESS

The dispersion or variability provides us with one more step in increasing our understanding of the pattern of the data. Further, a high degree of uniformity (i.e. a low degree of dispersion) is a desirable quality.

1. "Dispersion is the measure of the variation of the items." -A.L. Bowley

2. "The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data."  -Spiegel

 3. Dispersion or spread is the degree of the scatter or variation of the variable about a central value."      -Brooks & Dick

4. "The measurement of the scatterness of the mass of figures in a series about an average is called measure of variation or dispersion." -Simpson & Kafka

Since measures of dispersion give an average of the differences of various items from an average, they are also called averages of the second order. An average is more meaningful when it is examined in the light of dispersion.

 

SIGNIFICANCE AND PROPERTIES OF MEASURING VARIATION

1. Measures of variation point out as to how far an average is representative of the mass. When dispersion is small, the average is a typical value in the sense that it closely represents the individual value and it is reliable in the sense that it is a good estimate of the average in the corresponding universe. On the other hand, when dispersion is large, the average is not so typical, and unless the sample is very large, the average may be quite unreliable.

2. Another purpose of measuring dispersion is to determine the nature and cause of variation in order to control the variation itself. In matters of health, variations in body temperature, pulse beat and blood pressure are the basic guides to diagnosis, and prescribed treatment is designed to control their variation. In industrial production, efficient operation requires control of quality variation, the causes of which are sought through inspection and quality control programmes. In the social sciences, a special problem requiring the measurement of variability is the measurement of "inequality" in the distribution of income or wealth.

3. Measures of dispersion enable a comparison to be made of two or more series with regard to their variability. The study of variation may also be looked upon as a means of determining uniformity or consistency. A high degree of variation would mean little uniformity or consistency, whereas a low degree of variation would mean great uniformity or consistency.

4. Many powerful analytical tools in statistics, such as correlation analysis, the testing of hypotheses, the analysis of variance, statistical quality control, and regression analysis, are based on measures of variation of one kind or another.

MEASURES OF DISPERSION

There are five measures of dispersion: Range, Inter-quartile range or Quartile Deviation, Mean deviation, Standard Deviation, and Lorenz curve. Among them, the first four are mathematical methods and the last one is the graphical method.

RANGE

The simplest measure of dispersion is the range, which is the difference between the maximum value and the minimum value of data.

When the sample size is very small, the range is considered quite adequate measure of the variability. Thus, it is widely used in quality control where a continuous check on the variability of raw materials or finished products is needed. The range is also a suitable measure in weather forecast.
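A minimal sketch, assuming a small sample:

```python
# Range = maximum value - minimum value (assumed data).
data = [12, 7, 19, 15, 9]
print(max(data) - min(data))  # 12
```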

Limitations of range, which are as follows:

1. It is based only on two items and does not cover all the items in a distribution.

2. It is subject to wide fluctuations from sample to sample based on the same population.

3. It fails to give any idea about the pattern of the distribution.

4. Finally, in the case of open-ended distributions, it is not possible to compute the range.

 

 

QUARTILE DEVIATION

The interquartile range, or the quartile deviation, is a better measure of variation in a distribution than the range. It uses the middle 50 percent of the distribution, leaving out the 25 percent at each end. In other words, the interquartile range denotes the difference between the third quartile and the first quartile.

Symbolically, interquartile range = Q3 - Q1

Semi-interquartile range or quartile deviation = (Q3 - Q1)/2

When quartile deviation is small, it means that there is a small deviation in the central 50 percent items.

It may be noted that in a symmetrical distribution, the two quartiles, that is, Q3 and Q1, are equidistant from the median.

Symbolically,  M - Q1 = Q3 - M

It may be noted that interquartile range or the quartile deviation is an absolute measure of dispersion.
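A sketch for ungrouped data, using the common position formulas Q1 at (n+1)/4 and Q3 at 3(n+1)/4 on assumed values:

```python
# Quartile deviation for ungrouped data (assumed data).
data = sorted([20, 28, 40, 12, 30, 15, 50])
n = len(data)  # 7, so Q1 is the 2nd value and Q3 the 6th after sorting
q1 = data[(n + 1) // 4 - 1]      # 15
q3 = data[3 * (n + 1) // 4 - 1]  # 40
print((q3 - q1) / 2)  # quartile deviation = 12.5
```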

MERITS OF QUARTILE DEVIATION

1. As compared to range, it is considered a superior measure of dispersion.

2. In the case of open-ended distribution, it is quite suitable.

3. Since it is not influenced by the extreme values in a distribution, it is particularly suitable in highly skewed or erratic distributions.

MEAN DEVIATION

The mean deviation is also known as the average deviation. As the name implies, it is the average of absolute amounts by which the individual items deviate from the mean. Since the positive deviations from the mean are equal to the negative deviations, while computing the mean deviation, we ignore positive and negative signs.
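A minimal sketch on assumed data:

```python
# Mean deviation about the mean: average of absolute deviations (assumed data).
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)  # 6.0
md = sum(abs(x - mean) for x in data) / len(data)
print(md)  # 2.4
```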

 

 

MERITS OF MEAN DEVIATION

1. A major advantage of mean deviation is that it is simple to understand and easy to calculate. 

2. It takes into consideration each and every item in the distribution. As a result, a change in the value of any item will have its effect on the magnitude of mean deviation.

3. The values of extreme items have less effect on the value of the mean deviation.

4. As deviations are taken from a central value, it is possible to have meaningful comparisons of the formation of different distributions.

LIMITATIONS OF MEAN DEVIATION

1.      It is not capable of further algebraic treatment.

2.      At times it may fail to give accurate results. The mean deviation gives best results when deviations are taken from the median instead of from the mean. But in a series, which has wide variations in the items, median is not a satisfactory measure.

3.      Strictly on mathematical considerations, the method is wrong as it ignores the algebraic signs when the deviations are taken from the mean.

In view of these limitations, it is seldom used in business studies. A better measure known as the standard deviation is more frequently used.

STANDARD DEVIATION

The standard deviation is similar to the mean deviation in that here too the deviations are measured from the mean. At the same time, the standard deviation is preferred to the mean deviation or the quartile deviation or the range because it has desirable mathematical properties.

Mean of the squared deviations is known as the variance.

When the actual mean turns out to be a fraction, calculating deviations from it becomes too cumbersome; in such cases, a short-cut method using an assumed mean is preferred.
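The definitions above - variance as the mean of the squared deviations, standard deviation as its square root - can be sketched on assumed data:

```python
import math

# Population variance and standard deviation (assumed data).
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)  # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)
sd = math.sqrt(variance)
print(variance, sd)  # 4.0 2.0
```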

USES OF THE STANDARD DEVIATION

The standard deviation is a frequently used measure of dispersion. It enables us to determine as to how far individual items in a distribution deviate from its mean. In a symmetrical, bell-shaped curve:

(i) About 68 percent of the values in the population fall within ±1 standard deviation from the mean.

(ii) About 95 percent of the values fall within ±2 standard deviations from the mean.

(iii) About 99.7 percent of the values fall within ±3 standard deviations from the mean.

The standard deviation is an absolute measure of dispersion as it measures variation in the same units as the original data. As such, it is not a suitable measure for comparing two or more distributions expressed in different units.

STANDARDISED VARIABLE, STANDARD SCORES

The variable Z = (x - x̄)/s or (x - μ)/σ, which measures the deviation from the mean in units of the standard deviation, is called a standardised variable. Since both the numerator and the denominator are in the same units, a standardised variable is independent of the units used.

If deviations from the mean are given in units of the standard deviation, they are said to be expressed in standard units or standard scores.

Through this concept of standardised variable, proper comparisons can be made between individual observations belonging to two different distributions whose compositions differ.
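For example, assuming two tests with different means and standard deviations:

```python
# Standard scores make observations from different distributions comparable.
# Assumed figures: a score of 75 in a test with mean 70 and sd 5, and a
# score of 80 in another test with mean 72 and sd 8.
z1 = (75 - 70) / 5  # 1.0
z2 = (80 - 72) / 8  # 1.0
print(z1, z2)  # equal relative standing despite different raw scores
```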

LORENZ CURVE

This measure of dispersion is graphical. It is known as the Lorenz curve named after Dr. Max Lorenz. It is generally used to show the extent of concentration of income and wealth. The steps involved in plotting the Lorenz curve are:

1. Convert a frequency distribution into a cumulative frequency table.

2. Calculate percentage for each item taking the total equal to 100.

3. Choose a suitable scale and plot the cumulative percentages of the persons and income. Use the horizontal X-axis to depict percentages of persons and the vertical Y-axis to depict percentages of income.

4. Show the line of equal distribution, which will join 0 of the X-axis with 100 of the Y-axis.

5. The curve obtained in (3) above can now be compared with the straight line of equal distribution obtained in (4) above. If the Lorenz curve is close to the line of equal distribution, then it implies that the dispersion is much less. If, on the contrary, the Lorenz curve is farther away from the line of equal distribution, it implies that the dispersion is considerable.
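Steps 1 and 2 above can be sketched as follows; the group figures are assumed for illustration:

```python
# Cumulative percentages for a Lorenz curve (assumed data): number of
# persons and total income in four income groups.
persons = [10, 20, 30, 40]
income = [5, 10, 25, 60]

def cum_pct(values):
    """Running totals of `values`, expressed as percentages of the grand total."""
    total = sum(values)
    out, running = [], 0
    for v in values:
        running += v
        out.append(100 * running / total)
    return out

print(cum_pct(persons))  # [10.0, 30.0, 60.0, 100.0]
print(cum_pct(income))   # [5.0, 15.0, 40.0, 100.0]
```

Plotting the income percentages against the person percentages, and comparing the curve with the 0-100 line of equal distribution, completes steps 3 to 5.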

The Lorenz curve is a simple graphical device to show the disparities of distribution in any phenomenon. It is used in business and economics to represent inequalities in income, wealth, production, savings, and so on.

SKEWNESS

It may be repeated here that frequency distributions differ in three ways: Average value, Variability or dispersion, and Shape.

Generally, there are two comparable characteristics called skewness and kurtosis that help us to understand a distribution. Two distributions may have the same mean and standard deviation but may differ widely in their overall appearance.

Some important definitions of skewness are as follows:

1. "When a series is not symmetrical it is said to be asymmetrical or skewed."       -Croxton & Cowden

2. "Skewness refers to the asymmetry or lack of symmetry in the shape of a frequency distribution."     -Morris Hamburg

3. "Measures of skewness tell us the direction and the extent of skewness. In symmetrical distribution the mean, median and mode are identical. The more the mean moves away from the mode, the larger the asymmetry or skewness."       -Simpson & Kafka

4. "A distribution is said to be 'skewed' when the mean and the median fall at different points in the distribution, and the balance (or centre of gravity) is shifted to one side or the other - to left or right."   -Garrett

Symmetrical Distribution. In a symmetrical distribution the values of mean, median and mode coincide. The spread of the frequencies is the same on both sides of the centre point of the curve.

Asymmetrical Distribution. A distribution which is not symmetrical is called a skewed distribution; such a distribution may be either positively skewed or negatively skewed.

Positively Skewed Distribution. In a positively skewed distribution the value of the mean is the highest and that of the mode the lowest; the median lies in between the two.

Negatively Skewed Distribution. In a negatively skewed distribution the value of the mode is the highest and that of the mean the lowest; the median lies in between the two. In a positively skewed distribution the frequencies are spread out over a greater range of values on the high-value end of the curve (the right-hand side) than on the low-value end. In a negatively skewed distribution the position is reversed, i.e. the excess tail is on the left-hand side. It should be noted that in moderately skewed distributions the interval between the mean and the median is approximately one-third of the interval between the mean and the mode. It is this relationship which provides a means of measuring the degree of skewness.

In order to ascertain whether a distribution is skewed or not the following tests may be applied. Skewness is present if:

1. The values of mean, median and mode do not coincide.

2. When the data are plotted on a graph they do not give the normal bell-shaped form, i.e. when cut along a vertical line through the centre the two halves are not equal.

3. The sum of the positive deviations from the median is not equal to the sum of the negative deviations.

4. Quartiles are not equidistant from the median.

5. Frequencies are not equally distributed at points of equal deviation from the mode.

 

MEASURES OF SKEWNESS

There are four measures of skewness, each divided into absolute and relative measures. The relative measure is known as the coefficient of skewness and is more frequently used than the absolute measure. Further, when a comparison between two or more distributions is involved, it is the relative measure of skewness that is used. The measures of skewness are:

(i)                 Karl Pearson's measure,

(ii)               Bowley’s measure,

(iii)             Kelly’s measure, and

(iv)             Moment’s measure.

 

The formula for measuring skewness as given by Karl Pearson is as follows:

Skewness = Mean - Mode

Since, in a moderately skewed distribution, 3 Mean - 3 Median = Mean - Mode, we get:

Mode = Mean - 3 Mean + 3 Median

or Mode = 3 Median - 2 Mean

The direction of skewness is determined by ascertaining whether the mean is greater than or less than the mode. To obtain a relative measure that can be compared across distributions, the difference is divided by the standard deviation:

Coefficient of skewness = (Mean - Mode) / Standard Deviation

The value of the coefficient of skewness is zero when the distribution is symmetrical. Normally, this coefficient lies between -1 and +1. If the mean is greater than the mode, the coefficient of skewness will be positive, otherwise negative.
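A sketch of Pearson's coefficient of skewness with assumed summary figures:

```python
# Karl Pearson's coefficient of skewness (assumed summary figures).
mean, mode, sd = 45.0, 36.0, 10.0
coeff = (mean - mode) / sd
print(coeff)  # 0.9 -> positively skewed, since mean > mode
```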

 

 

 

