The Undoing Project: A Friendship that Changed the World

But then UCLA sent back the analyzed data, and the story became unsettling. (Goldberg described the results as “generally terrifying.”) In the first place, the simple model that the researchers had created as their starting point for understanding how doctors rendered their diagnoses proved to be extremely good at predicting the doctors’ diagnoses. The doctors might want to believe that their thought processes were subtle and complicated, but a simple model captured these perfectly well. That did not mean that their thinking was necessarily simple, only that it could be captured by a simple model. More surprisingly, the doctors’ diagnoses were all over the map: The experts didn’t agree with each other. Even more surprisingly, when presented with duplicates of the same ulcer, every doctor had contradicted himself and rendered more than one diagnosis: These doctors apparently could not even agree with themselves. “These findings suggest that diagnostic agreement in clinical medicine may not be much greater than that found in clinical psychology—some food for thought during your next visit to the family doctor,” wrote Goldberg. If the doctors disagreed among themselves, they of course couldn’t all be right—and they weren’t.

The researchers then repeated the experiment with clinical psychologists and psychiatrists, who gave them the list of factors they considered when deciding whether it was safe to release a patient from a psychiatric hospital. Once again, the experts were all over the map. Even more bizarrely, those with the least training (graduate students) were just as accurate as the fully trained ones (paid pros) in their predictions about what any given psychiatric patient would get up to if you let him out the door. Experience appeared to be of little value in judging, say, whether a person was at risk of committing suicide. Or, as Goldberg put it, “Accuracy on this task was not associated with the amount of professional experience of the judge.”

Still, Goldberg was slow to blame the doctors. Toward the end of his paper, he suggested that the problem might be that doctors and psychiatrists seldom had a fair chance to judge the accuracy of their thinking and, if necessary, change it. What was lacking was “immediate feedback.” And so, with an Oregon Research Institute colleague named Leonard Rorer, he tried to provide it. Goldberg and Rorer gave two groups of psychologists thousands of hypothetical cases to diagnose. One group received immediate feedback on its diagnoses; the other did not—the purpose was to see if the ones who got feedback improved.

The results were not encouraging. “It now appears that our initial formulation of the problem of learning clinical inference was far too simple—that a good deal more than outcome feedback is necessary for judges to learn a task as difficult as this one,” wrote Goldberg. At which point one of Goldberg’s fellow Oregon researchers—Goldberg doesn’t recall which one—made a radical suggestion. “Someone said, ‘One of these models you built [to predict what the doctors were doing] might actually be better than the doctor,’” recalled Goldberg. “I thought, Oh, Christ, you idiot, how could that possibly be true?” How could their simple model be better at, say, diagnosing cancer than a doctor? The model had been created, in effect, by the doctors. The doctors had given the researchers all the information in it.

The Oregon researchers went and tested the hypothesis anyway. It turned out to be true. If you wanted to know whether you had cancer or not, you were better off using the algorithm that the researchers had created than you were asking the radiologist to study the X-ray. The simple algorithm had outperformed not merely the group of doctors; it had outperformed even the single best doctor. You could beat the doctor by replacing him with an equation created by people who knew nothing about medicine and had simply asked a few questions of doctors.

When Goldberg sat down to write a follow-up paper, which he called “Man versus Model of Man,” he was clearly less optimistic than he had formerly been, both about experts and the approach taken by the Oregon Research Institute to understanding their minds. “My article . . . was an account of our experimental failures—failures to demonstrate the complexities of human judgments,” he wrote of his earlier piece: the one he’d published in American Psychologist. “Since the previous anecdotal literature was filled with speculations about the complex interactions to be expected when professionals process clinical information, we had naively expected to find that the simple linear combination of cues would not be highly predictive of individual’s judgments, and consequently that we would soon be in the business of devising highly complex mathematical expressions to represent individual judgment strategy. Alas, it was not to be.” It was as if the doctors had a theory of how much weight to assign to any given trait of any given ulcer. The model captured their theory of how to best diagnose an ulcer. But in practice they did not abide by their own ideas of how to best diagnose an ulcer. As a result, they were beaten by their own model.

The implications were vast. “If these findings can be generalized to other sorts of judgmental problems,” Goldberg wrote, “it would appear that only rarely—if at all—will the utilities favor the continued employment of man over a model of man.” But how could that be? Why would the judgment of an expert—a medical doctor, no less—be inferior to a model crafted from that very expert’s own knowledge? At that point, Goldberg more or less threw up his hands and said, Well, even experts are human. “The clinician is not a machine,” he wrote. “While he possesses his full share of human learning and hypothesis-generating skills, he lacks the machine’s reliability. He ‘has his days’: Boredom, fatigue, illness, situational and interpersonal distractions all plague him, with the result that his repeated judgments of the exact same stimulus configuration are not identical. . . . If we could remove some of this human unreliability by eliminating this random error in his judgments, we should thereby increase the validity of the resulting predictions . . .”
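Goldberg's explanation can be made concrete with a small simulation. The sketch below is illustrative only and does not reproduce the Oregon researchers' procedure or data. It assumes a hypothetical judge who weights three cues sensibly but adds random day-to-day noise to every judgment. It then fits a linear model to that judge's own judgments and compares how well the judge and the model track the true outcome. All weights, noise levels, and function names are assumptions chosen for the example.

```python
import random


def dot(weights, cues):
    """Weighted sum of cues."""
    return sum(w * c for w, c in zip(weights, cues))


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def fit_linear(rows, targets):
    """Least-squares weights (no intercept) via the normal equations,
    solved with Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n_cases=2000, seed=0):
    """Return (judge validity, model-of-judge validity) as correlations."""
    rng = random.Random(seed)
    true_weights = [0.5, 0.3, 0.2]     # assumed: how cues relate to the outcome
    judge_weights = [0.45, 0.35, 0.2]  # assumed: the judge's implicit policy
    cases = [[rng.gauss(0, 1) for _ in true_weights] for _ in range(n_cases)]
    # The outcome is only partly predictable from the cues.
    outcome = [dot(true_weights, c) + rng.gauss(0, 0.5) for c in cases]
    # The judge applies a reasonable policy inconsistently ("has his days").
    judgment = [dot(judge_weights, c) + rng.gauss(0, 0.6) for c in cases]
    # The "model of man": a linear fit to the judge's own judgments.
    model_weights = fit_linear(cases, judgment)
    model_prediction = [dot(model_weights, c) for c in cases]
    return pearson(judgment, outcome), pearson(model_prediction, outcome)


if __name__ == "__main__":
    judge_r, model_r = simulate()
    print(f"judge vs outcome: {judge_r:.2f}")
    print(f"model vs outcome: {model_r:.2f}")
```

In this toy setup the fitted model keeps the judge's weighting policy but drops the random error, so its predictions correlate more strongly with the outcome than the judge's own judgments do.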

Right after Goldberg published those words, late in the summer of 1970, Amos Tversky showed up in Eugene, Oregon. He was on his way to spend a year at Stanford and wanted to visit his old friend Paul Slovic, with whom he’d studied at Michigan. Slovic, a former college basketball player, recalls shooting baskets with Amos in his driveway. Amos, who had not played college basketball, didn’t really shoot so much as heave the ball at the rim—his jump shot looked more like calisthenics than hoops. “A three-quarters speed, spinless shot put which started at mid-chest and wafted toward the basket,” in the words of his son Oren. And yet Amos had somehow become a basketball enthusiast. “Some people like to walk while they talk. Amos liked to shoot baskets,” said Slovic, adding delicately that “he didn’t look like someone who had spent a lot of time shooting baskets.” Heaving the ball at the rim, Amos told Slovic that he and Danny had been kicking around some ideas about the inner workings of the human mind and hoped to further explore how people made intuitive judgments. “He said they wanted a place where they could just sit and talk to each other all day long without the distraction of a university,” said Slovic. They had some thoughts about why even experts might make big, systematic errors. And it wasn’t just because they were having a bad day. “And I was just kind of stunned by how exciting the ideas were,” said Slovic.



* * *

Amos had agreed to spend the 1970–71 academic year at Stanford University, and so he and Danny, who remained in Israel, were apart. They used the year to collect data. The data consisted entirely of answers to curious questions that they had devised. Their questions were first posed to high school students in Israel—Danny sent out twenty or so Hebrew University graduate students in taxis to scour the entire country for unsuspecting Israeli children. (“We were running out of kids in Jerusalem.”) The graduate students gave each kid two to four of what must have seemed to them totally bizarre questions, and a couple of minutes to answer each of them. “We had multiple questionnaires,” said Danny, “because no one child could do the whole thing.”

Consider the following question:

All families of six children in a city were surveyed. In 72 families the exact order of births of boys and girls was G B G B B G.

What is your estimate of the number of families surveyed in which the exact order of births was B G B B B B?
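For readers who want to check the arithmetic behind the question, here is a minimal sketch. It rests on an assumption the survey question does not state: that each birth is independently a boy or a girl with probability one half. Under that assumption every exact order of six births is equally likely, however "random" or "orderly" it looks. The variable and function names are chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(order, p_boy=Fraction(1, 2)):
    """Probability of one exact birth order, assuming independent births."""
    prob = Fraction(1)
    for child in order:
        prob *= p_boy if child == "B" else 1 - p_boy
    return prob


observed_order = "GBGBBG"
target_order = "BGBBBB"
observed_count = 72  # families reported with the observed order

# Number of families implied by the observed count under the assumption.
implied_total = observed_count / sequence_probability(observed_order)

# Expected number of families with the target order.
expected_target = implied_total * sequence_probability(target_order)

# Sanity check: the 64 possible orders together have probability 1.
total_probability = sum(
    sequence_probability("".join(order)) for order in product("BG", repeat=6)
)

if __name__ == "__main__":
    print("probability of each exact order:", sequence_probability(target_order))
    print("implied families surveyed:", implied_total)
    print("expected families with", target_order + ":", expected_target)
```

Under the equal-probability assumption, the expected count for B G B B B B is the same as the count given for G B G B B G.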
