Even people trained in statistics and probability theory failed to intuit how much more variable a small sample could be than the general population—and that the smaller the sample, the lower the likelihood that it would mirror the broader population. They assumed that the sample would correct itself until it mirrored the population from which it was drawn. In very large samples, the law of large numbers did indeed guarantee this result. If you flipped a coin a thousand times, you were more likely to end up with heads or tails roughly half the time than if you flipped it ten times. For some reason human beings did not see it that way. “People’s intuitions about random sampling appear to satisfy the law of small numbers, which asserts that the law of large numbers applies to small numbers as well,” Danny and Amos wrote.
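The coin-flip claim is easy to check numerically. The sketch below (an editorial illustration in Python, not anything from Danny and Amos's paper; the 40-to-60-percent band is an arbitrary choice) counts how often the observed share of heads lands near one-half at different sample sizes:

```python
import random

def share_near_half(n_flips, n_trials=10_000):
    """Simulate n_trials runs of n_flips fair-coin tosses and return the
    fraction of runs whose share of heads falls between 40% and 60%."""
    within = 0
    for _ in range(n_trials):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        if 0.4 <= heads / n_flips <= 0.6:
            within += 1
    return within / n_trials

for n in (10, 100, 1000):
    print(f"{n:>4} flips: {share_near_half(n):.3f} of runs land within 40-60% heads")
```

With ten flips, roughly a third of runs fall outside the band; with a thousand, essentially none do. The small sample is simply noisier.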
This failure of human intuition had all sorts of implications for how people moved through the world, rendered judgments, and made decisions, but Danny and Amos’s paper—eventually published in the Psychological Bulletin—dwelled on its consequences for social science. Social science experiments usually involved taking some small sample from a large population and testing some theory on it. Say a psychologist thought that he had discovered a connection: Children who preferred to sleep alone on camping trips were somewhat less likely to participate in social activities than were children who preferred eight-person tents. The psychologist had tested a group of twenty kids, and they confirmed his hypothesis. Not every child who wanted to sleep alone was asocial, and not every child who longed for an eight-person tent was highly sociable—but the pattern existed. The psychologist, being a conscientious scientist, selected a second sample of kids, to see if he could replicate this finding. But because he had misjudged how large the sample needed to be if it was to stand a good chance of reflecting the entire population, he was at the mercy of luck.* Given the inherent variability of the small sample, the kids in his second sample might be unrepresentative, not at all like most children. And yet he treated them as if they had the power to confirm or refute his hypothesis.
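How much luck, exactly, is easy to quantify. Here is a minimal sketch (again an editorial illustration, not from the paper; the half-standard-deviation effect and the group sizes are assumed numbers) of how often a real but modest difference points the wrong way in a fresh small sample:

```python
import random
import statistics

def reversal_rate(effect=0.5, n=10, trials=10_000):
    """Assume a genuine difference: tent-sharing kids average `effect`
    standard deviations higher in sociability than solo sleepers.
    With n kids per group, return how often a fresh sample shows the
    difference pointing the wrong way."""
    reversed_sign = 0
    for _ in range(trials):
        solo = [random.gauss(0.0, 1.0) for _ in range(n)]
        tent = [random.gauss(effect, 1.0) for _ in range(n)]
        if statistics.mean(tent) - statistics.mean(solo) <= 0:
            reversed_sign += 1
    return reversed_sign / trials

print(f"Reversed in {reversal_rate(n=10):.1%} of 10-per-group replications")
print(f"Reversed in {reversal_rate(n=100):.2%} of 100-per-group replications")
```

Even with a genuine effect, the ten-per-group replication reverses sign roughly one time in eight; at a hundred kids per group it almost never does. That is the gap between the sample sizes psychologists used and the ones their conclusions required.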
The belief in the law of small numbers: Here was the intellectual error that Danny and Amos suspected that a lot of psychologists made, because Danny had made it. And Danny had a far better feel for statistics than most psychologists, or even most statisticians. The entire project, in other words, was rooted in Danny’s doubts about his own work, and his willingness, which was almost an eagerness, to find error in that work. In their joint hands, Danny’s tendency to look for his own mistakes became the most fantastic material. For it wasn’t just Danny who made those mistakes: Everyone did. It wasn’t just a personal problem; it was a glitch in human nature. At least that was their suspicion.
The test they administered to psychologists confirmed that suspicion. When seeking to determine if the bag they held contained mostly red chips, psychologists were inclined to draw, from very few chips, broad conclusions. In their search for scientific truth, they were relying far more than they knew on chance. What’s more, because they had so much faith in the power of small samples, they tended to rationalize whatever they found in them.
The test Amos and Danny had created asked the psychologists how they would advise a student who was testing a psychological theory—say, that people with long noses are more likely to lie. What should the student do if his theory tests as true on one sample of humanity but as false on another? The question Danny and Amos put to the professional psychologists was multiple-choice. Three of the choices involved telling the student either to increase his sample size or, at the very least, to be more circumspect about his theory. Overwhelmingly, the psychologists had plunked for the fourth option, which read: “He should try to find an explanation for the differences between the two groups.”
That is, he should seek to rationalize why in one group people with long noses are more likely to lie, while in the other they are not. The psychologists had so much faith in small samples that they assumed that whatever had been learned from either group must be generally true, even if one lesson seemed to contradict the other. The experimental psychologist “rarely attributes a deviation of results from expectations to sampling variability because he finds a causal ‘explanation’ for any discrepancy,” wrote Danny and Amos. “Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.”
To which Amos, by himself, appended: “Edwards . . . has argued that people fail to extract sufficient information or certainty from probabilistic data; he called this failure conservatism. Our respondents can hardly be described as conservative. Rather, in accord with the representation hypothesis, they tend to extract more certainty from the data than the data, in fact, contain.” (“Ward Edwards was established,” said Danny. “And we were taking pot shots—Amos was sticking his tongue out at him.”)
By the time they were finished with the paper, in early 1970, they had lost any clear sense of their individual contributions. It was nearly impossible to say, of any given passage, whether more of some idea had come from Danny or from Amos. Far more easily determined, at least for Danny, was responsibility for the paper’s confident, almost brazen, tone. Danny had always been a nervous scholar. “If I had written it alone, in addition to being tentative and having a hundred references, I would probably have confessed that I am only a recently reformed idiot,” he said. “I could have done the paper all by myself. Except that if I had done it alone people would not have paid it attention. It had a star quality. And I attributed that quality to Amos.”
He thought that their paper was funny and provocative and interesting and arrogant in a way he could never be on his own, but in truth he didn’t think it was any more than that—and he didn’t think Amos did, either. Then they gave the paper to a person they assumed would be a skeptical audience, a psychology professor at the University of Michigan named Dave Krantz. Krantz was a serious mathematician, and also one of Amos’s coauthors on the impenetrable multivolume Foundations of Measurement. “I thought it was a stroke of genius,” recalled Krantz. “I still think it is one of the most important papers that has ever been written. It was counter to all the work that was being done—which was governed by the idea that you were going to explain human judgment by correcting for some more or less minor error to the Bayesian model. It was exactly contrary to the ideas that I had. Statistics was the way you should think about probabilistic situations, but statistics was not the way people did it. Their subjects were all sophisticated in statistics—and even they got it wrong! Every question in the paper that the audience got wrong I felt the temptation to get wrong.”
That verdict—that Danny and Amos’s paper wasn’t just fun but important—would eventually be echoed outside of psychology. “Over and over again economists say, ‘If the evidence of the world tells you it is true, then people figure out what’s true,’” says Matthew Rabin, a professor of economics at Harvard University. “That people are, in effect, very good statisticians. And if they aren’t—well, they don’t survive. And so if you are going down the list of things that are important in the world, the fact that people don’t believe in statistics is pretty important.”
Danny, being Danny, was slow to accept the compliment. (“When Dave Krantz said, ‘It’s a breakthrough,’ I thought he was out of his mind.”) Still, he and Amos were onto something far bigger than an argument about how to use statistics. The power of the pull of a small amount of evidence was such that even those who knew they should resist it succumbed. People’s “intuitive expectations are governed by a consistent misperception of the world,” Danny and Amos had written in their final paragraph. The misperception was rooted in the human mind. If the mind, when it was making probabilistic judgments about an uncertain world, was not an intuitive statistician, what was it? If it wasn’t doing what the leading social scientists thought it did, and economic theory assumed that it did, what, exactly, was it doing?
* * *