Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

The Columbia and Microsoft study offers a clear example of rigorous data science and computers teaching us things our gut alone could never find. This is also one case where the size of the dataset matters. Sometimes there is insufficient experience for our unaided gut to draw upon. It is unlikely that you, or your close friends or family members, have seen enough cases of pancreatic cancer to tease out the difference between indigestion followed by abdominal pain and indigestion alone. Indeed, it is inevitable, as the Bing dataset gets bigger, that the researchers will pick up many more subtle patterns in the timing of symptoms, for this and other illnesses, that even doctors might miss.

Moreover, while our gut may usually give us a good general sense of how the world works, it is frequently not precise. We need data to sharpen the picture. Consider, for example, the effects of weather on mood. You would probably guess that people are more likely to feel gloomy on a 10-degree day than on a 70-degree day. Indeed, this is correct. But you might not guess how big an impact this temperature difference can make. I looked for correlations between an area’s Google searches for depression and a wide range of factors, including economic conditions, education levels, and church attendance. Winter climate swamped all the rest. In winter months, warm climates, such as that of Honolulu, Hawaii, have 40 percent fewer depression searches than cold climates, such as that of Chicago, Illinois. Just how significant is this effect? An optimistic read of the effectiveness of antidepressants would find that the most effective drugs decrease the incidence of depression by only about 20 percent. To judge from the Google numbers, a Chicago-to-Honolulu move would be at least twice as effective as medication for your winter blues.*

Sometimes our gut, when not guided by careful computer analysis, can be dead wrong. We can get blinded by our own experiences and prejudices. Indeed, even though my grandmother is able to utilize her decades of experience to give better relationship advice than the rest of my family, she still has some dubious views on what makes a relationship last. For example, she has frequently emphasized to me the importance of having common friends. She believes that this was a key factor in her marriage’s success: she spent most warm evenings with her husband, my grandfather, in their small backyard in Queens, New York, sitting on lawn chairs and gossiping with their tight group of neighbors.

However, at the risk of throwing my own grandmother under the bus, data science suggests that Grandma’s theory is wrong. A team of computer scientists recently analyzed the biggest dataset ever assembled on human relationships—Facebook. They looked at a large number of couples who were, at some point, “in a relationship.” Some of these couples stayed “in a relationship.” Others switched their status to “single.” Having a common core group of friends, the researchers found, is a strong predictor that a relationship will not last. Perhaps hanging out every night with your partner and the same small group of people is not such a good thing; separate social circles may help make relationships stronger.

As you can see, our intuition alone, when we stay away from the computers and go with our gut, can sometimes amaze. But it can also make big mistakes. Grandma may have fallen into one cognitive trap: we tend to exaggerate the relevance of our own experience. In the parlance of data scientists, we weight our data, and we give far too much weight to one particular data point: ourselves.

Grandma was so focused on her evening schmoozes with Grandpa and their friends that she did not think enough about other couples. She forgot to fully consider her brother-in-law and his wife, who chitchatted most nights with a small, consistent group of friends but fought frequently and divorced. She forgot to fully consider my parents, her daughter and son-in-law. My parents go their separate ways many nights—my dad to a jazz club or ball game with his friends, my mom to a restaurant or the theater with her friends; yet they remain happily married.

When relying on our gut, we can also be thrown off by the basic human fascination with the dramatic. We tend to overestimate the prevalence of anything that makes for a memorable story. For example, when asked in a survey, people consistently rank tornadoes as a more common cause of death than asthma. In fact, asthma causes about seventy times more deaths. Deaths by asthma don’t stand out—and don’t make the news. Deaths by tornadoes do.

We are often wrong, in other words, about how the world works when we rely just on what we hear or personally experience. While the methodology of good data science is often intuitive, the results are frequently counterintuitive. Data science takes a natural and intuitive human process—spotting patterns and making sense of them—and injects it with steroids, potentially showing us that the world works in a completely different way from how we thought it did. That’s what happened when I studied the predictors of basketball success.


When I was a little boy, I had one dream and one dream only: I wanted to grow up to be an economist and data scientist. No. I’m just kidding. I wanted desperately to be a professional basketball player, to follow in the footsteps of my hero, Patrick Ewing, all-star center for the New York Knicks.

I sometimes suspect that inside every data scientist is a kid trying to figure out why his childhood dreams didn’t come true. So it is not surprising that I recently investigated what it takes to make the NBA. The results of the investigation were surprising. In fact, they demonstrate once again how good data science can change your view of the world, and how counterintuitive the numbers can be.

The particular question I looked at is this: are you more likely to make it in the NBA if you grow up poor or middle-class?

Most people would guess the former. Conventional wisdom says that growing up in difficult circumstances, perhaps in the projects with a single, teenage mom, helps foster the drive necessary to reach the top levels of this intensely competitive sport.

This view was expressed by William Ellerbee, a high school basketball coach in Philadelphia, in an interview with Sports Illustrated. “Suburban kids tend to play for the fun of it,” Ellerbee said. “Inner-city kids look at basketball as a matter of life or death.” I, alas, was raised by married parents in the New Jersey suburbs. LeBron James, the best player of my generation, was born poor to a sixteen-year-old single mother in Akron, Ohio.

Indeed, an internet survey I conducted suggested that the majority of Americans think the same thing Coach Ellerbee and I thought: that most NBA players grow up in poverty.

Is this conventional wisdom correct?

Seth Stephens-Davidowitz's books