Competitive Exams: Dispersion

What are the various measures of dispersion? Explain each of them.

Answer: In many ways, measures of central tendency are less useful in statistical analysis than measures of dispersion of values around the central tendency. The dispersion of values within variables is especially important in social and political research because:

Dispersion or “variation” in observations is what we seek to explain.

Researchers want to know WHY some cases lie above average and others below average for a given variable:

TURNOUT in voting: Why do some states show higher rates than others?

CRIMES in cities: Why are there differences in crime rates?

CIVIL STRIFE among countries: What accounts for differing amounts?

Much of statistical explanation aims at explaining DIFFERENCES in observations, also known as VARIATION or, to use the more technical term, VARIANCE.

If everything were the same, we would have no need of statistics. But people's heights, ages, etc. do vary. We often need to measure the extent to which scores in a dataset differ from each other. Such a measure is called the dispersion of a distribution. Some measures of dispersion are:

  1. Range: The range is the simplest measure of dispersion. The range can be thought of in two ways.

    1. As a quantity: The difference between the highest and lowest scores in a distribution. “The range of scores on the exam was 32.”

    2. As an interval: The lowest and highest scores may be reported as the range. “The range was 62 to 94,” which would be written (62, 94).

The Range of a Distribution

Find the range in the following set of data: Number of brothers and sisters {2, 3, 1, 1, 0, 5, 3, 1, 2, 7, 4, 0, 2, 1, 2, 1, 6, 3, 2, 0, 0, 7, 4, 2, 1, 1, 2, 1, 3, 5, 12, 4, 2, 0, 5, 3, 0, 2, 2, 1, 1, 8, 2, 1, 2}

An outlier is an extreme score, i.e., an infrequently occurring score at either tail of the distribution. Range is determined by the furthest outliers at either end of the distribution. Range is of limited use as a measure of dispersion, because it reflects information about extreme values but not necessarily about “typical” values. Only when the range is “narrow” (meaning that there are no outliers) does it tell us about typical values in the data.
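As a rough sketch, the range of the brothers-and-sisters data can be computed both as a quantity and as an interval, and the effect of a single extreme score can be checked by leaving it out. The variable names below are illustrative assumptions, not part of the original exercise.

```python
# Number of brothers and sisters, copied from the exercise above.
siblings = [2, 3, 1, 1, 0, 5, 3, 1, 2, 7, 4, 0, 2, 1, 2, 1, 6, 3, 2, 0, 0, 7,
            4, 2, 1, 1, 2, 1, 3, 5, 12, 4, 2, 0, 5, 3, 0, 2, 2, 1, 1, 8, 2, 1, 2]

lowest, highest = min(siblings), max(siblings)

# Range as a quantity: highest score minus lowest score.
range_as_quantity = highest - lowest

# Range as an interval: (lowest score, highest score).
range_as_interval = (lowest, highest)

# Drop only the single largest score to see how much it drives the range.
trimmed = sorted(siblings)[:-1]
trimmed_range = max(trimmed) - min(trimmed)

print("range as quantity:", range_as_quantity)
print("range as interval:", range_as_interval)
print("range without the largest score:", trimmed_range)
```

One respondent with 12 siblings accounts for a third of the range here, which illustrates why the range says little about typical values.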

Percentile range

Most students are familiar with the grading scale in which “C” is assigned to average scores, “B” to above-average scores, and so forth. When grading exams “on a curve,” instructors look to see how a particular score compares to the other scores. The letter grade given to an exam score is determined not by its relationship to just the high and low scores, but by its relative position among all the scores.

Percentile describes the relative location of points anywhere along the range of a distribution. A score that is at a certain percentile falls even with or above that percent of scores. The median score of a distribution is at the 50th percentile: It is the score at which 50% of other scores are below (or equal) and 50% are above. Commonly used percentile measures are named in terms of how they divide distributions. Quartiles divide scores into fourths, so that a score falling in the first quartile lies within the lowest 25% of scores, while a score in the fourth quartile is higher than at least 75% of the scores.
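The idea that a score at a given percentile "falls even with or above that percent of scores" can be sketched as a small function. The exam scores and the name `percentile_rank` are hypothetical, chosen only for illustration; other definitions of percentile rank exist and give slightly different values.

```python
def percentile_rank(scores, score):
    """Percent of scores that are less than or equal to `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical exam scores (assumed data, ranging from 62 to 94).
exam_scores = [62, 70, 74, 78, 81, 85, 88, 90, 92, 94]

for score in (62, 81, 94):
    print(score, "is at percentile", percentile_rank(exam_scores, score))
```

Under this definition the highest score is always at the 100th percentile, and a score is graded by its position among all the scores rather than by its distance from the extremes.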

Dividing a distribution into fourths in this way illustrates quartile scores. Two other percentile scores commonly used to describe the dispersion in a distribution are decile and quintile scores, which divide cases into equal-sized subsets of tenths (10%) and fifths (20%), respectively. In theory, percentile scores divide a distribution into 100 equal-sized groups. In practice this may not be possible because the number of cases may be under 100.

A box plot is an effective visual representation of both central tendency and dispersion. It simultaneously shows the 25th, 50th (median), and 75th percentile scores, along with the minimum and maximum scores. The “box” of the box plot shows the middle or “most typical” 50% of the values, while the “whiskers” of the box plot show the more extreme values. The length of the whiskers indicates visually how extreme the outliers are. When a box plot is drawn for a distribution that has been separated into quartiles, the boundaries of the “box” line up with the quartile scores on the corresponding histogram. The box plot displays the median score and shows the range of the distribution as well.
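The five values a box plot displays (minimum, 25th percentile, median, 75th percentile, maximum) can be computed directly. The sketch below applies Python's standard `statistics.quantiles` (Python 3.8+) to the brothers-and-sisters data from the range exercise. The choice of `method="inclusive"` is an assumption; other quantile conventions can give slightly different cut points.

```python
import statistics

# Number of brothers and sisters, from the range exercise.
siblings = [2, 3, 1, 1, 0, 5, 3, 1, 2, 7, 4, 0, 2, 1, 2, 1, 6, 3, 2, 0, 0, 7,
            4, 2, 1, 1, 2, 1, 3, 5, 12, 4, 2, 0, 5, 3, 0, 2, 2, 1, 1, 8, 2, 1, 2]

# n=4 returns the three cut points that split the data into quartiles.
q1, median, q3 = statistics.quantiles(siblings, n=4, method="inclusive")

five_number_summary = (min(siblings), q1, median, q3, max(siblings))
box_width = q3 - q1  # spread of the middle 50% of the values

# n=10 returns the nine cut points that split the data into deciles.
deciles = statistics.quantiles(siblings, n=10, method="inclusive")

print("five-number summary:", five_number_summary)
print("width of the box:", box_width)
print("decile cut points:", deciles)
```

The box spans only 1 to 3 siblings, while the upper whisker reaches 12, so the plot would show a compact middle and a long upper tail.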

By far the most commonly used measures of dispersion in the social sciences are variance and standard deviation.

Variance and standard deviation

Variance is the average squared difference of scores from the mean score of a distribution. Standard deviation is the square root of the variance. In calculating the variance of data points, we square the difference between each point and the mean because if we summed the differences directly, the result would always be zero.

For example, suppose three friends work on campus and earn $5.50, $7.50, and $8 per hour, respectively. The mean of these values is ($5.50 + $7.50 + $8)/3 = $7 per hour. If we summed the differences of each wage from the mean, we would get (5.50 − 7) + (7.50 − 7) + (8 − 7) = −1.50 + 0.50 + 1 = 0. Instead, we square the terms to obtain a sum of squared differences equal to 2.25 + 0.25 + 1 = 3.50. Averaging over the three scores gives a variance of 3.50/3 ≈ 1.17. This figure is a measure of dispersion in the set of scores.

The sum of squared differences is at its minimum when it is taken about the mean. In other words, if we used any number other than the mean as the value from which each score is subtracted, the resulting sum of squared differences would be greater. (You can try it yourself: see if any number other than 7 can be plugged into the preceding calculation and yield a sum of squared differences less than 3.50.)

The standard deviation is simply the square root of the variance, here √1.17 ≈ 1.08. In some sense, taking the square root of the variance “undoes” the squaring of the differences that we did when we calculated the variance. Variance and standard deviation of a population are designated by σ² and σ, respectively. Variance and standard deviation of a sample are designated by s² and s, respectively.
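The wage example can be checked step by step. In this sketch the sum of squared differences about the mean is 3.50; dividing it by the number of scores gives the variance under the "average squared difference" definition. The sample variance shown for comparison divides by n − 1 instead, which is the usual convention for s². The helper name `sum_of_squares` is an assumption made for illustration.

```python
import math
import statistics

wages = [5.50, 7.50, 8.00]          # hourly wages from the example
mean_wage = statistics.mean(wages)  # 7.0

def sum_of_squares(values, center):
    """Sum of squared differences of each value from `center`."""
    return sum((v - center) ** 2 for v in values)

# Raw differences from the mean cancel out.
deviations = [w - mean_wage for w in wages]
sum_of_deviations = sum(deviations)

# Squared differences do not cancel.
ss_about_mean = sum_of_squares(wages, mean_wage)

population_variance = ss_about_mean / len(wages)    # average squared difference
sample_variance = ss_about_mean / (len(wages) - 1)  # n - 1 convention for s^2
population_sd = math.sqrt(population_variance)

# Any center other than the mean gives a larger sum of squares.
other_centers = [6.0, 6.5, 7.5, 8.0]
ss_about_others = {c: sum_of_squares(wages, c) for c in other_centers}

print("sum of deviations:", sum_of_deviations)
print("sum of squares about the mean:", ss_about_mean)
print("population variance:", population_variance)
print("sample variance:", sample_variance)
print("population standard deviation:", population_sd)
print("sum of squares about other centers:", ss_about_others)
```

The hand-computed values agree with `statistics.pvariance` and `statistics.variance` from the standard library, which implement the population and sample formulas respectively.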

Standard Deviation

The standard deviation (σ or s) and variance (σ² or s²) are more complete measures of dispersion, which take into account every score in a distribution. The other measures of dispersion we have discussed are based on considerably less information. However, because variance relies on the squared differences of scores from the mean, a single outlier has greater impact on the size of the variance than does a single score near the mean. Some statisticians view this property as a shortcoming of variance as a measure of dispersion, especially when there is reason to doubt the reliability of some of the extreme scores. For example, a researcher might believe that a person who reports watching television an average of 24 hours per day may have misunderstood the question. Just one such extreme score might result in an appreciably larger standard deviation, especially if the sample is small. Fortunately, since all scores are used in the calculation of variance, the many non-extreme scores (those closer to the mean) will tend to offset the misleading impact of any extreme scores.

The standard deviation and variance are the most commonly used measures of dispersion in the social sciences because:

Both take into account the precise difference between each score and the mean. Consequently, these measures are based on a maximum amount of information.

The standard deviation is the baseline for defining the concept of standardized score or “z-score.”

Variance in a set of scores on some dependent variable is a baseline for measuring the correlation between two or more variables (the degree to which they are related).
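To make the outlier discussion and the z-score concrete, the sketch below compares the standard deviation of a small set of television-viewing answers with and without one implausible answer of 24 hours per day, then expresses that answer as a z-score (its distance from the mean in standard deviation units). The survey values and the helper `z_score` are hypothetical, invented purely for illustration.

```python
import statistics

# Hypothetical survey answers: average hours of television watched per day.
tv_hours = [2, 3, 1, 4, 2, 3, 2, 5]
tv_hours_with_outlier = tv_hours + [24]  # one implausible answer added

# Population standard deviation with and without the extreme score.
sd_without = statistics.pstdev(tv_hours)
sd_with = statistics.pstdev(tv_hours_with_outlier)

def z_score(x, values):
    """Distance of x from the mean of `values`, in standard deviation units."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return (x - mu) / sigma

outlier_z = z_score(24, tv_hours_with_outlier)

print("standard deviation without the outlier:", sd_without)
print("standard deviation with the outlier:", sd_with)
print("z-score of the 24-hour answer:", outlier_z)
```

In a sample this small, the single extreme answer multiplies the standard deviation several times over, which is the sensitivity the passage describes.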