Saturday, March 20, 2021

How to Lie with Statistics – Part 1: The Sample with the Built-in Bias

In times of confusion I often apply one general principle: go back to basics! In other words, even when things appear complex, garbled, or outright confusing, you should still be able to filter out the key pieces of information, so that almost anyone with some curiosity can judge the quality of the results. All you need are some simple tools. A hammer drives nails far better than a piece of rock does. So it is with the intent of passing along some useful tools that I write the following summary of the classic book How to Lie with Statistics by Darrell Huff. This is a fantastic little book that gives insight into the tricks many people use (particularly those who have an agenda) to make the data fit their conclusions. Beware of such people.

I will write a short summary of each chapter and post it here periodically. Here is Chapter 1.

The Sample with the Built-in Bias

When we want to get a sense of a general trend in some population, we normally select a sample from that population. That sample must be representative of the whole. When it is not, any metric we calculate from it will be an aberration, an illusion, or simply an incorrect figure. A key metric, more commonly called a statistic, is the “average”: the “average” person, the “average” temperature, the “average” employee, etc. Be on the alert when you come across such a phrase, because the proverbial “average” may be rife with all sorts of bias, some of it seen and some of it unseen.

As I mentioned, a sample is supposed to be representative of the population. When it is not and we then extrapolate from the sample metric, we will draw all sorts of wrong conclusions. By extrapolation I mean taking some metric and applying it beyond the boundaries that the metric supports. For example, let us say a prestigious business school publishes the “average” salary of graduates who returned a survey. The likely outcome is that the published average will be a very handsome salary. But consider two things about this survey: people who report their own salaries tend to exaggerate, and those who choose not to respond may stay silent precisely because their salaries are low and they do not want to be perceived as failures. In our example, if these two factors are not disclosed or accounted for, you might as well disregard the “average” salary as bogus.
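To make the business school example concrete, here is a small Python sketch of how the two factors can inflate a survey average. Everything in it is an assumption made up for illustration and nothing comes from the book: the salary distribution, the response rates (60% for above-median earners, 20% for below-median earners), the 10% exaggeration, and the name simulate_survey.

```python
import random
import statistics


def simulate_survey(n_graduates=10_000, seed=0):
    """Return (true mean salary, mean salary as reported in the survey)."""
    rng = random.Random(seed)

    # Hypothetical population of graduate salaries (right-skewed).
    salaries = [rng.lognormvariate(11.0, 0.5) for _ in range(n_graduates)]
    median = statistics.median(salaries)

    reported = []
    for salary in salaries:
        # Assumption: low earners are less likely to send the survey back.
        p_respond = 0.6 if salary >= median else 0.2
        if rng.random() < p_respond:
            # Assumption: those who do respond exaggerate by 10%.
            reported.append(salary * 1.10)

    return statistics.mean(salaries), statistics.mean(reported)


true_mean, survey_mean = simulate_survey()
print(f"true average salary:      {true_mean:,.0f}")
print(f"published survey average: {survey_mean:,.0f}")
```

Under these assumptions the published figure comes out higher than the true average, even though nobody made an arithmetic mistake. The bias is built into who answers and what they say.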

Excessive precision in a metric is also a red flag: for example, when you read or hear that the “average” family has “2.05” children, or that someone takes “7.85” showers a week on average. Such a number is so precise that it is frankly meaningless, because it suggests an accuracy the underlying measurement cannot support. There are many things in life that are nearly impossible to measure with such precision.
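One rough way to see why the extra decimals deserve suspicion is to put a margin of error next to the average. The Python sketch below does that for 20 made-up family sizes chosen so that the average is 2.05. The data, the name mean_with_margin, and the use of a normal approximation (1.96 standard errors, assuming a simple random sample) are my own assumptions for illustration, not something from the book.

```python
import math
import statistics


def mean_with_margin(sample, z=1.96):
    """Return (mean, approximate 95% margin of error) for a sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, z * standard_error


# Hypothetical survey: number of children in each of 20 families.
children = [0, 1, 2, 2, 3, 1, 2, 4, 2, 3, 2, 1, 0, 2, 3, 5, 2, 2, 1, 3]

mean, margin = mean_with_margin(children)
print(f"average children per family: {mean:.2f} +/- {margin:.2f}")
```

With this sample the margin of error is about half a child, so the second decimal in “2.05” tells you nothing. And that is before accounting for any bias in how the families were chosen.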
