Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

The data shows that the hours between 2 and 4 A.M. are prime time for big questions: What is the meaning of consciousness? Does free will exist? Is there life on other planets? The popularity of these questions late at night may be a result, in part, of cannabis use. Search rates for “how to roll a joint” peak between 1 and 2 A.M.

And in their large dataset, Dahl and DellaVigna could look at how crime changed by the hour on those movie weekends. They found that the drop in crime when popular violent movies were shown, relative to other weekends, began in the early evening. Crime was lower, in other words, before the violent scenes even started, when theatergoers may have just been walking in.

Can you guess why? Think, first, about who is likely to choose to attend a violent movie. It’s young men—particularly young, aggressive men.

Think, next, about where crimes tend to be committed. Rarely in a movie theater. There have been exceptions, most notably a 2012 premeditated shooting in a Colorado theater. But, by and large, men go to theaters unarmed and sit, silently.

Offer young, aggressive men the chance to see Hannibal, and they will go to the movies. Offer young, aggressive men Runaway Bride as their option, and they will take a pass and instead go out, perhaps to a bar, club, or a pool hall, where the incidence of violent crime is higher.

Violent movies keep potentially violent people off the streets.

Puzzle solved. Right? Not quite yet. There was one more strange thing in the data. The effects started right when the movies started showing; however, they did not stop after the movie ended and the theater closed. On evenings when violent movies were showing, crime was lower well into the night, from midnight to 6 A.M.

Even if crime was lower while the young men were in the movie theater, shouldn’t it rise after they left and were no longer preoccupied? They had just watched a violent movie, which experiments say makes people more angry and aggressive.

Can you think of any explanations for why crime still dropped after the movie ended? After much thought, the authors, who were crime experts, had another “Aha” moment. They knew that alcohol is a major contributor to crime. The authors had sat in enough movie theaters to know that virtually no theaters in the United States serve liquor. Indeed, the authors found that alcohol-related crimes plummeted in late-night hours after violent movies.

Of course, Dahl and DellaVigna’s results were limited. They could not, for instance, test the months-out, lasting effects to see how long the drop in crime might last. And it’s still possible that consistent exposure to violent movies ultimately leads to more violence. However, their study does put the immediate impact of violent movies, which has been the main theme in these experiments, into perspective. Perhaps a violent movie does influence some people and make them unusually angry and aggressive. However, do you know what undeniably influences people in a violent direction? Hanging out with other potentially violent men and drinking.

This makes sense now. But it didn’t make sense before Dahl and DellaVigna began analyzing piles of data.

One more important point that becomes clear when we zoom in: the world is complicated. Actions we take today can have distant effects, most of them unintended. Ideas spread—sometimes slowly; other times exponentially, like viruses. People respond in unpredictable ways to incentives.

These connections and relationships, these surges and swells, cannot be traced with tiny surveys or traditional data methods. The world, quite simply, is too complex and too rich for little data.





OUR DOPPELGANGERS




In June 2009, David “Big Papi” Ortiz looked like he was done. During the previous half decade, Boston had fallen in love with their Dominican-born slugger with the friendly smile and gapped teeth.

He had made five consecutive All-Star games, won an MVP Award, and helped end Boston’s eighty-six-year championship drought. But in the 2008 season, at the age of thirty-two, his numbers fell off. His batting average had dropped 68 points, his on-base percentage 76 points, his slugging percentage 114 points. And at the start of the 2009 season, Ortiz’s numbers were dropping further.

Here’s how Bill Simmons, a sportswriter and passionate Boston Red Sox fan, described what was happening in the early months of the 2009 season: “It’s clear that David Ortiz no longer excels at baseball. . . . Beefy sluggers are like porn stars, wrestlers, NBA centers and trophy wives: When it goes, it goes.” Great sports fans trust their eyes, and Simmons’s eyes told him Ortiz was finished. In fact, Simmons predicted he would be benched or released shortly.

Was Ortiz really finished? If you’re the Boston general manager, in 2009, do you cut him? More generally, how can we predict how a baseball player will perform in the future? Even more generally, how can we use Big Data to predict what people will do in the future?

A theory that will get you far in data science is this: look at what sabermetricians (those who have used data to study baseball) have done and expect it to spread out to other areas of data science. Baseball was among the first fields with comprehensive datasets on just about everything, and an army of smart people willing to devote their lives to making sense of that data. Now, just about every field is there or getting there. Baseball comes first; every other field follows. Sabermetrics eats the world.

The simplest way to predict a baseball player’s future is to assume he will continue performing as he currently is. If a player has struggled for the past 1.5 years, you might guess that he will struggle for the next 1.5 years.

By this methodology, Boston should have cut David Ortiz.

However, there might be more relevant information. In the 1980s, Bill James, whom most consider the founder of sabermetrics, emphasized the importance of age. Baseball players, James found, peaked early, at around the age of twenty-seven. Teams tended to ignore just how much players decline as they age. They overpaid for aging players.

By this more advanced methodology, Boston should definitely have cut David Ortiz.
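For readers who like to see the logic spelled out, the two methodologies so far can be sketched in a few lines of Python. This is only an illustration: the peak age of twenty-seven comes from the text, but the function names, the size of the yearly decline, and the sample player are hypothetical assumptions, not figures from James or from Ortiz's record.

```python
PEAK_AGE = 27             # from the text: players peak at around twenty-seven
DECLINE_PER_YEAR = 0.010  # hypothetical: slugging points gained or lost per year


def persistence_forecast(recent_stat: float) -> float:
    """Simplest method: assume the player keeps performing as he currently is."""
    return recent_stat


def age_adjusted_forecast(
    recent_stat: float,
    age: int,
    peak_age: int = PEAK_AGE,
    yearly_change: float = DECLINE_PER_YEAR,
) -> float:
    """Age-aware method: nudge the forecast up before the peak, down from the peak on.

    The linear, one-size-fits-all aging curve is an assumption made for
    illustration only.
    """
    if age < peak_age:
        return recent_stat + yearly_change
    return recent_stat - yearly_change


# A hypothetical thirty-three-year-old slugger with a .450 slugging percentage.
naive = persistence_forecast(0.450)
adjusted = age_adjusted_forecast(0.450, age=33)
print(f"persistence: {naive:.3f}, age-adjusted: {adjusted:.3f}")
```

Under these assumed numbers, the age-adjusted forecast is always lower than the persistence forecast for any player past his peak, which is why the second methodology argues even more strongly for cutting an aging hitter.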

But this age adjustment might miss something. Not all players follow the same path through life. Some players might peak at twenty-three, others at thirty-two. Short players may age differently from tall players, fat players from skinny players. Baseball statisticians found that there were types of players, each following a different aging path. This story was even worse for Ortiz: “beefy sluggers” indeed do, on average, peak early and collapse shortly past thirty.
