The young men applying to become officers had been given a weirdly artificial task: to move themselves from one side of a wall to the other without touching the wall, using only a long log that was not permitted to touch either the wall or the ground. “We noted who took charge, who tried to lead and was rebuffed, how cooperative each soldier was in contributing to the group effort,” Danny wrote. “We saw who seemed to be stubborn, submissive, arrogant, patient, hot-tempered, persistent, or a quitter. We saw competitive spite when someone whose idea had been rejected by the group sabotaged its efforts. And we saw reactions to crisis. . . . Under the stress of the event, we felt, each man’s true nature was revealed. The impression we had of each candidate’s character was as direct and compelling as the color of the sky.”
He had had no trouble identifying which men would make good officers and which would not. “We were quite willing to declare, ‘This one will never make it,’ ‘That fellow is rather mediocre,’ or ‘He will be a star.’” The problem came when he’d tested his predictions against the outcomes—how the various candidates had actually performed in officer training. His predictions were worthless. And yet, because it was the army and he had a job to do, he kept on making them; and because he was Danny, he noted that he still felt confident about them. The situation reminded him of the famous Müller-Lyer optical illusion.
Figure 2. Müller-Lyer optical illusion.
Presented with two lines of equal length, the eye is tricked into seeing one as being longer than the other. Even after you prove to people, with a ruler, that the lines are identical, the illusion persists: They’ll insist that one line still looks longer than the other. If perception had the power to overwhelm reality in such a simple case, how much power might it have in a more complicated one?
Danny’s commanding officers believed that each branch of the Israel Defense Forces had its own personality. There was a “fighter pilot” type, and an “armored unit” type, and an “infantry soldier” type, and so on. They wanted Danny to determine for which branch any particular recruit was best suited. Danny set out to create a personality test that would effectively sort the entire population of Israel into the correct buckets. He began by listing the handful of traits he thought most obviously correlated with a man’s fitness for combat service: masculine pride, punctuality, sociability, sense of duty, capacity for independent thought. “The list of traits was not derived from anything,” he later said. “I just thought it up. A professional would take years to do it, using pre-tests, trying out multiple versions, etcetera, but I didn’t know it was difficult to do.”
The hard part, Danny thought, was getting an accurate measure of any of these traits from an ordinary job interview. The subtle difficulties that arise when people evaluate other people had been described back in 1915 by an American psychologist named Edward Thorndike. Thorndike asked U.S. Army officers to rate their men according to some physical trait (“physique,” for example) and then assess some less tangible quality (“intelligence,” “leadership,” and so forth). He discovered that the feeling created by making the first ranking bled into the second: If an officer thought a soldier physically impressive, he also found him impressive in other ways. Switch the order of assessment, and the same problem occurred: If a person was first judged to be generally great, he was then judged to be stronger than he actually was. “Obviously a halo of general merit is extended to influence the rating for the special ability, or vice versa,” Thorndike concluded; he went on to say that he had “become convinced that even a very capable foreman, employer, teacher, or department head is unable to view an individual as a compound of separate qualities and to assign a magnitude to each of these in independence of the others.” Thus was born what is still called “the halo effect.”
Danny knew of the halo effect. And he could see that the Israeli army interviewers had been its victims: They had been spending twenty minutes with each new recruit and from the encounter offering a general impression of the recruit’s character. General impressions had been proven to be misleading, and so Danny wanted to avoid them. For that matter, he wanted to avoid having to rely on human judgment. Exactly why he mistrusted human judgment he was unsure. In retrospect, he suspected he must have read a recent book by Paul Meehl—the same Meehl who wondered what, if anything, unified the field of psychology. Meehl’s book, called Clinical versus Statistical Prediction, had shown that psychoanalysts who tried to predict what would become of their neurotic patients fared poorly compared to simple algorithms. Published in 1954—just a year before Danny overhauled the Israeli army’s assessment of the country’s youth—it had angered psychoanalysts, who believed that their clinical judgments and predictions had great value. It also raised a more general question: If these putative experts could be misled about the value of their predictions, who would not be misled? “All I know is that I must have read Meehl because of what I did,” said Danny.
What he did was teach the army interviewers—young women, mainly—how to put a list of questions to each recruit to minimize the halo effect. He told them to pose very specific questions, designed to determine not how a person thought of himself but how the person had actually behaved. The questions were not just fact-seeking but designed to disguise the facts being sought. And at the end of each section, before moving on to the next, the interviewer was to assign a rating from 1 to 5 that corresponded with choices ranging from “never displays this kind of behavior” to “always displays this kind of behavior.” So, for example, when evaluating a recruit’s sociability, they’d give a 5 to a person who “forms close social relationships and identifies completely with the whole group” and a 1 to a person who was “completely isolated.” Even Danny could see that there were all kinds of problems with his methods, but he didn’t have the time to worry too much about them. For instance, he briefly agonized over how to define a 3—was it someone who was extremely sociable on occasion, or someone who was moderately sociable all the time? Both, he basically decided. The big thing was that the judge was to keep her private opinions to herself. The question was not “What do I think of him?” but “What has he done?” The judgment of who went where in the Israeli army was to be made by Danny’s algorithm. “The interviewers hated it,” he recalled. “I had a mutiny on my hands. I still remember one of them saying, ‘You’re turning us into robots.’ They had a sense that they could tell [a person’s character]. And I was robbing them of that. And they really didn’t like it.”