Rebecca Goldin, Ph.D.
Last week, The National Journal launched a scathing attack on the way casualties in Iraq were estimated by authors Gilbert Burnham, Riyadh Lafta, Shannon Doocy, and Les Roberts in a report published in The Lancet in 2006. Many conservative pundits had already put their wits to the test trying to discredit the study, and President Bush himself claimed that “the methodology is pretty well discredited.”
The National Journal team did its homework, interviewing many experts (rather than conservative pundits) and organizing the potential flaws of the study under different headings. However suspicious some circumstances surrounding the Lancet study might be (such as the anti-war position held by the scientists conducting it), only two criticisms cited by the Journal raise any alarms.
One is “main street bias,” the idea that the Lancet study authors over-sampled regions near main streets, which were in turn more likely to be home to victims of car-bombs or other violence. The other is fraud – not by those who wrote the Lancet article, but by those in the field, doing the interviews under minimal supervision.
Less impressive, however, was the Journal’s claim that not enough interviews were conducted. As we described when we took apart The Wall Street Journal’s criticism of the study, the small sample size is accounted for in the large confidence interval. Without other bias, a smaller sample has a larger chance of being unrepresentative of the population than a larger one. This is why the study’s 95 percent confidence interval for excess deaths, the range within which we can be 95 percent confident the true figure lies, runs from 426,369 to 793,663.
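To see how sample size drives interval width, here is a minimal sketch using the standard normal approximation for a proportion. The numbers are hypothetical, not the study’s actual data; the point is only that the half-width of a 95 percent confidence interval shrinks like one over the square root of the sample size, so a small sample yields a wide interval rather than a wrong one.

```python
import math

def ci_half_width(p_hat, n, z=1.96):
    """Approximate 95% confidence-interval half-width for an
    estimated proportion p_hat from a sample of size n."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.05  # hypothetical estimated rate, for illustration only
for n in (500, 2000, 8000):
    hw = ci_half_width(p_hat, n)
    print(f"n={n:5d}  95% CI: {p_hat - hw:.4f} to {p_hat + hw:.4f}")
```

Quadrupling the sample size halves the interval’s width, which is why a modest survey produces a wide, but still legitimate, range of estimates.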
The Journal made a convincing argument that the data may well have been tweaked, in part based on the theory that faked data fall into patterns that true data rarely fit; for example, invented people reported as killed may be more likely to be 30 or 40 than 32 or 43. It doesn’t seem unusual if any individual is 30, but it’s awfully strange if all of the deaths consist of 30-year-olds. Apparently, those conducting the Lancet study did not put enough checks in place to ensure that interviewers didn’t pad the books. The data suggest that invention may have played a role, based on which death certificates the interviewers reported having seen, and which they did not.
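As an illustration of the kind of pattern critics have in mind, the sketch below (using made-up age lists, not the study’s data) measures “digit heaping”: the share of reported ages ending in 0 or 5. If final digits were roughly uniform, that share would sit near 20 percent; a share near 100 percent is the signature of invented round numbers.

```python
from collections import Counter

def heaping_share(ages):
    """Fraction of ages ending in 0 or 5.
    Expect roughly 0.2 if final digits are uniform."""
    last_digits = Counter(a % 10 for a in ages)
    return (last_digits[0] + last_digits[5]) / len(ages)

# Hypothetical data, purely for illustration:
plausible  = [23, 31, 47, 52, 38, 29, 64, 41, 56, 33]
suspicious = [30, 40, 35, 30, 45, 50, 40, 30, 35, 60]
print(heaping_share(plausible))   # prints 0.0
print(heaping_share(suspicious))  # prints 1.0
```

A real forensic check would use a formal test (such as chi-square on the digit counts) over a large sample, but the underlying idea is just this comparison.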
That said, we should be careful in reading too much into any particular statistical anomaly. By the nature of statistics, strange things can happen when dealing with large numbers. To illustrate the point, suppose that 100 people are flipping coins, each one doing ten flips and recording the answer. Suppose someone gets ten heads – how easy it is to make an accusation that she’s flipping a biased coin! After all, there’s a 1 in 1024 chance of getting ten heads in a row (one over two to the tenth power). However, if 100 people are flipping, the chance of someone getting either all heads or all tails is about 18 percent. Not so rare after all.
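The arithmetic behind this illustration can be checked directly. A single flipper has a 2-in-1024 chance of an all-heads or all-tails run, and with 100 independent flippers the chance that at least one sees such a run is one minus the chance that nobody does:

```python
# Exact probabilities for the coin-flipping illustration.
p_all_heads = 0.5 ** 10                    # ten heads in a row: 1/1024
p_extreme = 2 * p_all_heads                # all heads OR all tails: 2/1024
p_someone = 1 - (1 - p_extreme) ** 100     # at least one of 100 flippers
print(p_all_heads)   # 1/1024, about 0.001
print(p_someone)     # about 0.18
```

So an outcome that looks damning for one individual is close to a coin flip of a different kind across a large group.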
To apply the metaphor: accusations of data heaping cannot be made in isolation, just as the person who got ten heads cannot automatically be accused of flipping a biased coin. If those looking for fault in the Lancet study considered only a few possible ways in which the data might fail to look random, then an unusual distribution is far more damning than if they considered many, many ways and found just one.
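This “many ways to look” effect can be quantified. The sketch below (illustrative numbers, not an analysis of the Lancet data) shows that if each of k independent checks has a 5 percent chance of flagging pure noise as an anomaly, the chance that at least one check fires grows quickly with k:

```python
# False-alarm probability grows with the number of independent checks,
# assuming each check flags pure noise 5% of the time.
alpha = 0.05
for k in (1, 5, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} checks: P(at least one false alarm) = {p_any:.2f}")
```

With twenty checks, the chance of at least one spurious “anomaly” is roughly 64 percent, which is why a single odd-looking pattern found after an open-ended search proves far less than the same pattern found by a pre-specified test.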