In the late 1990s, a professor of cognitive science at the Massachusetts Institute of Technology named Joshua Tenenbaum began a large-scale examination of the casual ways that people make everyday predictions. There are dozens of questions each of us confronts on a daily basis that can be answered only with some amount of forecasting. When we estimate how long a meeting will last, for instance, or envision two driving routes and guess which one will have less traffic, or predict whether our families will have more fun at the beach or at Disneyland, we’re making forecasts that assign likelihoods to various outcomes. We may not realize it, but we’re thinking probabilistically. How, Tenenbaum wondered, do our brains do that?
Tenenbaum’s specialty was computational cognition—in particular, the similarities in how computers and humans process information. A computer is an inherently deterministic machine. It can predict if your family will prefer the beach or Disneyland only if you give it a specific formula for comparing the merits of beach fun versus amusement parks. Humans, on the other hand, can make such decisions even if we’ve never visited the seaside or Magic Kingdom before. Our brains can infer from past experiences that, because the kids always complain when we go camping and love watching cartoons, everyone will probably have more fun with Mickey and Goofy.
“How do our minds get so much from so little?” Tenenbaum wrote in a paper published in the journal Science in 2011. “Any parent knows, and scientists have confirmed, that typical 2-year-olds can learn how to use a new word such as ‘horse’ or ‘hairbrush’ from seeing just a few examples.” To a two-year-old, horses and hairbrushes have a great deal in common. The words sound similar. In pictures, they both have long bodies with a series of straight lines—in one case legs, in the other bristles—protruding outward. They come in a range of colors. And yet, though a child might have seen only one picture of a horse and used only one hairbrush, she can quickly learn the difference between those words.
A computer, on the other hand, needs explicit instructions to learn when to use “horse” versus “hairbrush.” It needs software that specifies that four legs increases the odds of horsiness, while one hundred bristles increases the probability of a hairbrush. A child can make such calculations before she can form sentences. “Viewed as a computation on sensory input data, this is a remarkable feat,” Tenenbaum wrote. “How does a child grasp the boundaries of these subsets from seeing just one or a few examples of each?”
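To make that contrast concrete, here is a minimal sketch of the sort of hand-written rule a machine would need. Every feature, weight, and threshold in it is invented for illustration; none of it comes from Tenenbaum’s work or any real system.

```python
# A toy, hand-specified classifier: the kind of explicit instruction a computer
# needs in order to tell a horse from a hairbrush. All features and weights are invented.
def label_object(num_legs, num_bristles, length_in_cm):
    horse_score = 0.0
    brush_score = 0.0
    if num_legs == 4:
        horse_score += 2.0       # four legs raise the odds of horsiness
    if num_bristles >= 100:
        brush_score += 2.0       # a hundred bristles point toward a hairbrush
    if length_in_cm > 100:
        horse_score += 1.0       # horses are big
    else:
        brush_score += 1.0       # hairbrushes are not
    return "horse" if horse_score > brush_score else "hairbrush"

print(label_object(num_legs=4, num_bristles=0, length_in_cm=220))   # horse
print(label_object(num_legs=0, num_bristles=150, length_in_cm=20))  # hairbrush
```

Every distinction has to be spelled out in advance, which is exactly the step a two-year-old skips.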
In other words, why are we so good at forecasting certain kinds of things—and thus, making decisions—when we have so little exposure to all the possible odds?
In an attempt to answer this question, Tenenbaum and a colleague, Thomas Griffiths, devised an experiment. They scoured the Internet for data on different kinds of predictable events, such as how much money a movie will make at the box office, or how long the average person lives, or how long a cake needs to bake. They were interested in these events because if you were to graph multiple examples of each one, a distinct pattern would emerge. Box office totals, for instance, typically conform to a basic rule: There are a few blockbusters each year that make a huge amount of money, and lots of other films that never break even.
Within mathematics, this is known as a “power law distribution,” and when the revenues of all the movies released in a given year are graphed together, it looks like this:
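That long-tailed shape can be written down compactly. As a rough sketch, with the exponent left as a generic γ rather than a value fitted to real box office figures, a power-law density over possible totals is

$$ p(t_{\text{total}}) \;\propto\; t_{\text{total}}^{-\gamma}, \qquad \gamma > 1, $$

which packs most films in near zero and leaves a thin tail of blockbusters stretching far to the right.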
Graphing other kinds of events results in different patterns. Take life spans. A person’s odds of dying in a specific year spike slightly at birth—because some infants perish soon after they arrive—but if a baby survives its first few years, it is likely to live decades longer. Then, starting at about age forty, our odds of dying in any given year begin to climb, and by fifty they are rising steadily; the ages at which people die cluster around a peak of about eighty-two.
Life spans adhere to a normal, or Gaussian, distribution curve. That pattern looks like this:
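As a sketch, with μ standing for a typical life span and σ for the spread around it (both left as symbols rather than real mortality figures), the Gaussian density is

$$ p(t_{\text{total}}) \;\propto\; \exp\!\left(-\frac{(t_{\text{total}} - \mu)^2}{2\sigma^2}\right), $$

a bell that bunches most ages at death symmetrically around μ and thins out in both directions.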
Most people intuitively understand that they need to apply different kinds of reasoning to predicting different kinds of events. We know that box office totals and life spans require different types of estimates, even if we don’t know anything about medical statistics or entertainment industry trends. Tenenbaum and Griffiths were curious to find out how people learn to make such estimates. So they found events with distinct patterns, from box office grosses to life spans, as well as the average length of poems, the careers of congressmen (which adhere to an Erlang distribution), and the length of time a cake needs to bake (which follows no strong pattern).
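The Erlang distribution mentioned in passing belongs to the gamma family; for shape parameters above one it rises to a single peak and then trails off, which suits quantities, such as terms in office, that rarely end immediately but also rarely run on forever. In general form, with shape k and scale β left unspecified here (the values the researchers actually fit are not reproduced in this text), it is

$$ p(t_{\text{total}}) \;\propto\; t_{\text{total}}^{\,k-1}\, e^{-t_{\text{total}}/\beta}, \qquad k = 1, 2, 3, \ldots $$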
Then they asked hundreds of students to predict the future based on one piece of data:
You read about a movie that has made $60 million to date. How much will it make in total?
You meet someone who is thirty-nine years old. How long will he or she live?
A cake has been baking for fourteen minutes. How much longer does it need to stay in the oven?
You meet a U.S. congressman who has served for fifteen years. How long will he serve in total?
The students weren’t given any additional information. They weren’t told anything about power law distributions or Erlang curves. Rather, they were simply asked to make a prediction based on one piece of data and no guidance about what kinds of probabilities to apply.
Despite those handicaps, the students’ predictions were startlingly accurate. They knew that a movie that’s earned $60 million is a blockbuster, and is likely to take in another $30 million in ticket sales. They intuited that if you meet someone in their thirties, they’ll probably live another fifty years. They guessed that if you meet a congressman who has been in power for fifteen years, he’ll probably serve another seven or so, because incumbency brings advantages, but even powerful lawmakers can be undone by political trends.
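Those particular numbers fall out of a short calculation of the kind Griffiths and Tenenbaum’s analysis formalizes. Suppose the $60 million figure was equally likely to have been checked at any point in the film’s run, so a final total explains the observation t with likelihood 1/t_total, and combine that with a power-law prior over totals; the exponent below is an illustrative choice, not the study’s fitted value:

$$ p(t_{\text{total}} \mid t) \;\propto\; \frac{p(t_{\text{total}})}{t_{\text{total}}} \;\propto\; t_{\text{total}}^{-(\gamma+1)} \ \ \text{for } t_{\text{total}} \ge t, \qquad \text{posterior median} = t \cdot 2^{1/\gamma}. $$

With γ ≈ 1.7, the median works out to roughly 1.5 times whatever has been observed so far, or about $90 million for a film that has already earned $60 million.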
If asked, few of the participants were able to describe the mental logic they used to make their forecasts. They just gave answers that felt right. Yet their predictions were typically within 10 percent of what the data said was the correct answer. In fact, when Tenenbaum and Griffiths graphed all of the students’ predictions for each question, the resulting distribution curves almost perfectly matched the real patterns the professors had found in the data they had collected online.
Just as important, each student intuitively understood that different kinds of predictions required different kinds of reasoning. They understood, without necessarily knowing why, that life spans fit into a normal distribution curve whereas box office grosses tend to conform to a power law.
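To see how different priors produce those different habits of reasoning, here is a minimal numerical sketch of that style of Bayesian forecast. It assumes, as above, that the single observation was equally likely to have come from anywhere inside the unknown total, and the prior shapes and parameters are illustrative guesses, not the distributions Tenenbaum and Griffiths actually fit to their data.

```python
import numpy as np

def predict_total(t_observed, prior_pdf, grid_max, n=200_000):
    """Posterior-median forecast of an unknown total from a single observation.

    Assumes the observation was equally likely to fall anywhere between 0 and
    the total, so a candidate total explains it with likelihood 1/total.
    """
    totals = np.linspace(t_observed, grid_max, n)   # candidate totals, at least what was observed
    dx = totals[1] - totals[0]
    posterior = prior_pdf(totals) / totals          # prior times likelihood
    posterior /= posterior.sum() * dx               # normalize into a proper density
    cdf = np.cumsum(posterior) * dx
    return totals[np.searchsorted(cdf, 0.5)]        # the median splits the posterior in half

# Power-law prior for box office grosses (illustrative exponent): the forecast
# is a fixed multiple of whatever has been observed so far.
power_law = lambda totals: totals ** -1.7
print(predict_total(60.0, power_law, grid_max=10_000))   # about 90, roughly 1.5x the observation

# Gaussian prior for life spans (illustrative mean and spread): the forecast is
# pulled toward the typical value rather than scaling with the observation.
gaussian = lambda totals: np.exp(-0.5 * ((totals - 80.0) / 12.0) ** 2)
print(predict_total(39.0, gaussian, grid_max=120.0))     # a bit below 80, near the typical value in the prior
```

The particular numbers matter less than the contrast: a long-tailed prior turns the forecast into “expect a multiple of what you have seen,” while a bell-shaped prior turns it into “expect the typical value,” which is the distinction the students were drawing without being able to name it.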