RISKVUE ARCHIVE | FEATURE STORIES
Lying With Statistics
Statistics are islands of certainty in a sea of unknowns. Islands of certainty, that is, unless they are biased, which is often the case. Statistics are commonly used to support a biased position or an outright fabrication for two reasons. The first reason arises from the fact that few people understand statistics well enough to question them. The second and more sinister reason is that lying with statistics requires no actual lying. If the most favorable data is highlighted and the most unfavorable data is suppressed, statistics can be manipulated to illustrate just about any point of view, allowing the manipulator’s hands to remain unsullied.
Below, we identify some of the ways in which statistics are used to deceive or mislead, and provide suggestions on how to avoid falling victim to misleading statistics.
Understand Your Averages
One common way that statistics can be misleading is when they refer to so-called “averages.” When you hear the word “average,” your first question should be “What kind?” because an average can come in three flavors: a mean,1 median2 or mode.3
Sometimes the mean, the median and the mode are so close to one another that the distinction does not really matter to a layman. This occurs when the data is characterized by what is known as a normal distribution.4 Many human attributes, such as the height of men or of women, approximately follow a normal distribution. If you read that the average height of American men is 5' 10", it makes little difference whether the average refers to the mean, median or mode because they will all be very close to one another. However, not every set of data will be “normal.”
For example, the limits of liability for umbrella excess liability carried by different organizations might vary widely. In a particular sample you might find that most limits range from 1 million to 20 million or so, but that a few organizations carry limits in excess of 100 million. The very large limits carried by that small number of companies skew the results, so that the mean limit carried is much higher than the median limit. Therefore, the “average” limit of liability will depend on what type of average is being used.
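To make the gap between the averages concrete, here is a minimal sketch using Python's standard statistics module. The ten limits below are invented purely for illustration (they are not survey data), with values in millions.

```python
from statistics import mean, median, mode

# Hypothetical umbrella limits in millions, invented for illustration only.
# Most organizations carry 1 to 20; two carry very large limits.
limits = [1, 1, 2, 5, 5, 5, 10, 20, 100, 150]

mean_limit = mean(limits)      # pulled upward by the two large limits
median_limit = median(limits)  # midpoint of the sorted values
mode_limit = mode(limits)      # most frequently occurring value

print(f"mean:   {mean_limit}")
print(f"median: {median_limit}")
print(f"mode:   {mode_limit}")
```

In this made-up sample the mean is roughly six times the median, even though eight of the ten organizations carry limits of 20 or less.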
Graphs And Pictures
Exhibit A
Below are a pair of charts that show the same information. However, the scale of Chart 2 has been reduced to make it appear as if a dramatic rise is taking place. If Chart 1 were not lying right beside Chart 2 to act as a point of comparison, a risk manager might be easily convinced that a major sea change was underway.
Another method of statistical prevarication is to use deceptive visual graphics. Because many business concepts and ideas are technical and complex, a common and useful way of conveying such information is through the use of graphs, charts and pictures. Although illustrations can accurately reflect the facts under consideration, such visual information is easily massaged, edited or distorted to manipulate how the information will be interpreted.
Take, for example, line charts, which are commonly used to represent trends. The charts in Exhibit A are based on the same information (an organization’s cost of risk over time), yet they visually convey drastically different trends. By maximizing or minimizing the magnitude of values shown in the chart or by changing the scale of the axis, different scenarios can be depicted, even though the underlying numbers are the same.
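The effect of the axis scale can be put into numbers. The sketch below is an illustration only: the cost-of-risk figures and axis ranges are hypothetical, and visual_rise is a helper name made up for this example. It measures what fraction of the chart's height the same change occupies under two different vertical scales.

```python
def visual_rise(values, axis_min, axis_max):
    """Fraction of the chart's height spanned by the change
    from the first plotted value to the last."""
    return (values[-1] - values[0]) / (axis_max - axis_min)

# Hypothetical cost-of-risk figures over five periods (illustration only).
cost_of_risk = [5.0, 5.1, 5.2, 5.3, 5.4]

# Same data, two vertical scales.
full_axis = visual_rise(cost_of_risk, 0.0, 10.0)   # axis runs from 0 to 10
tight_axis = visual_rise(cost_of_risk, 5.0, 5.5)   # axis runs from 5 to 5.5

print(f"full axis:  line climbs {full_axis:.0%} of the chart height")
print(f"tight axis: line climbs {tight_axis:.0%} of the chart height")
```

The numbers being plotted never change; only the axis range does, yet the line goes from nearly flat to climbing most of the chart.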
Ways To Avoid Being Duped
Determine the significance. Often, data samples are not sufficient to permit a reliable conclusion. Therefore, be suspicious of any data that does not identify the number of cases sampled or does not provide the probable error.5
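As a rough illustration of why the number of cases matters, the sketch below estimates the probable error of a sample mean. It assumes the sample mean is approximately normally distributed, in which case the probable error is about 0.6745 times the standard error. The sample values are invented, and probable_error is a hypothetical helper name.

```python
from math import sqrt
from statistics import stdev

def probable_error(sample):
    """Probable error of the sample mean, assuming approximate normality:
    0.6745 * (sample standard deviation / sqrt(number of cases))."""
    return 0.6745 * stdev(sample) / sqrt(len(sample))

# Invented data: a small sample, and a larger one with the same spread.
small_sample = [4, 6, 5, 7, 3]
large_sample = small_sample * 20  # 100 cases

pe_small = probable_error(small_sample)
pe_large = probable_error(large_sample)

print(f"5 cases:   probable error = {pe_small:.3f}")
print(f"100 cases: probable error = {pe_large:.3f}")
```

With only five cases the range around the stated mean is several times wider than with one hundred, which is why a statistic quoted without its sample size deserves suspicion.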
Skeptical thinking: check the numbers. Sometimes statistics can just be plain wrong. When dealing with strings of zeros and significant figures, mistakes are bound to happen. Take, for instance, the report the federal Glass Ceiling Commission issued in 1995, which states that companies with especially good records in promoting women and minorities have a stock market performance 2.4 times higher than the Standard & Poor’s index. When a reporter from Forbes checked the study that the report cited, he found that the companies outperformed the S&P index by 2.4 percent, not 2.4 times, and only over a certain period.6
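A quick calculation shows how far apart the two readings are. The index return below is hypothetical, and the sketch assumes "2.4 percent" means 2.4 percentage points above the index; neither figure comes from the report or the Forbes article.

```python
# Hypothetical index return, in percent, chosen only for illustration.
index_return = 10.0

# Reading 1: performance "2.4 times" the index.
times_reading = 2.4 * index_return

# Reading 2: outperforming the index by 2.4 percentage points (assumed meaning).
points_reading = index_return + 2.4

print(f"2.4 times the index:          {times_reading:.1f}%")
print(f"2.4 points above the index:   {points_reading:.1f}%")
```

Under these assumptions, one misplaced word nearly doubles the claimed return.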
Ask for a point of comparison. When a statistic is used to describe some unfamiliar thing or event, most people have nothing to compare it against and often accept that statistic as truth. The way to avoid falling into this trap is to ask the question: “Compared to what?” The answer to this simple question will provide context for the significance of the statistic, and may uncover a flawed argument or an attempt to mislead.
For example, in September 1997, the Associated Press reported a new study that linked low levels of radioactivity to cancer deaths among nuclear workers. Almost a third (29%) of all deaths among the studied employees were linked to cancer. Shocking. Scandalous. Something must be done! Not out of the ordinary really, explains the Washington, DC-based Statistical Assessment Service (STATS). STATS observes that 35% of all deaths among people between 44 and 65 years of age are attributable to cancer; therefore, the workers died from cancer at no greater a proportion than anyone else.7
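The “Compared to what?” question amounts to a simple ratio. The sketch below uses only the two percentages quoted above; the variable names are illustrative.

```python
# Share of deaths linked to cancer, as quoted in the text.
workers_share = 0.29    # studied nuclear workers
baseline_share = 0.35   # comparison group cited by STATS

# Ratio below 1.0 means the workers' share is lower than the baseline.
ratio = workers_share / baseline_share

print(f"workers' share relative to baseline: {ratio:.2f}")
```

A ratio below 1.0 indicates that the headline figure, alarming in isolation, is actually lower than the point of comparison.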
Conclusion
Risk managers should keep in mind that all statistics presented will invariably carry some bias. However, some persons will skew statistics far from their original meaning.
This skewing process is most concisely described in the 1954 classic, How To Lie With Statistics.8 Author Darrell Huff observes, “What comes full of virtue from the statistician’s desk may find itself twisted, exaggerated, oversimplified, and distorted-through-selection by salesman, public relations expert, journalist, or advertising copywriter.” 
Notes
1 Mean: A statistical average indicating central tendency, calculated by adding all of the values together and dividing by the number of values.
2 Median: A statistical average indicating central tendency, identified by a midpoint with one half of the items above and one half below the midpoint.
3 Mode: A statistical measure of central tendency that identifies the most frequently occurring value, that is, the value of the variable giving the greatest height on a graph of the frequency distribution.
4 Normal distribution: A theoretical frequency distribution for a set of variable data represented by a bell-shaped curve symmetrical about the mean.
5 Probable error: Expresses a range of values, above and below the stated value, within which the actual value is expected to lie with a stated probability (classically, one half).
6 “Lies, Damned Lies and Politically Motivated Statistics” by Dan Seligman. Forbes, July 28, 1997.
7 “The 1997 Dubious Data Awards.” STATS Spotlight, http://www.stats.org/.
8 How To Lie With Statistics by Darrell Huff. WW Norton: 1954.
riskVue | The webzine for risk management professionals
July 2001