Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Are these worries legitimate? It depends. There is strong evidence that pregnant women are at an increased risk of listeria from unpasteurized cheese. Links have been established between drinking too much alcohol and negative outcomes for the child. In some parts of the world, it is believed that drinking cold water can give your baby pneumonia; I don’t know of any medical support for this.

The huge differences in questions posed around the world are most likely caused by the overwhelming flood of information coming from disparate sources in each country: legitimate scientific studies, so-so scientific studies, old wives’ tales, and neighborhood chatter. It is difficult for women to know what to focus on—or what to Google.

We can see another clear difference when we look at the top searches for “how to ___ during pregnancy?” In the United States, Australia, and Canada, the top search is “how to prevent stretch marks during pregnancy.” But in Ghana, India, and Nigeria, preventing stretch marks is not even in the top five. These countries tend to be more concerned with how to have sex or how to sleep.

There is undoubtedly more to learn from zooming in on aspects of health and culture in different corners of the world. But my preliminary analysis suggests that Big Data will tell us that humans are even less powerful than we realized when it comes to transcending our biology. Yet we come up with remarkably different interpretations of what it all means.

HOW WE FILL OUR MINUTES AND HOURS

“The adventures of a young man whose principal interests are rape, ultra-violence, and Beethoven.”

That was how Stanley Kubrick’s controversial A Clockwork Orange was advertised. In the movie, the fictional young protagonist, Alex DeLarge, committed shocking acts of violence with chilling detachment. In one of the film’s most notorious scenes, he raped a woman while belting out “Singin’ in the Rain.”

Almost immediately, there were reports of copycat incidents. Indeed, a group of men raped a seventeen-year-old girl while singing the same song. The movie was shut down in many European countries, and some of the more shocking scenes were removed for a version shown in America.

There are, in fact, many examples of real life imitating art, with men seemingly hypnotized by what they had just seen on-screen. A showing of the gang movie Colors was followed by a violent shooting. A showing of the gang movie New Jack City was followed by riots.

Perhaps most disturbing, four days after the release of The Money Train, men used lighter fluid to ignite a subway toll booth, almost perfectly mimicking a scene in the film. The only difference between the fictional and real-world arson: In the movie, the operator escaped. In real life, he burned to death.

There is also some evidence from psychological experiments that subjects exposed to a violent film will report more anger and hostility, even if they don’t precisely imitate one of the scenes.

In other words, anecdotes and experiments suggest violent movies can incite violent behavior. But how big an effect do they really have? Are we talking about one or two murders every decade or hundreds of murders every year? Anecdotes and experiments can’t answer this.

To see if Big Data could, two economists, Gordon Dahl and Stefano DellaVigna, merged three Big Datasets for the years 1995 to 2004: FBI hourly crime data, box-office numbers, and a measure of the violence in every movie from kids-in-mind.com.

The information they were using was complete—every movie and every crime committed in every hour in cities throughout the United States. This would prove important.

Key to their study was the fact that on some weekends, the most popular movie was a violent one—Hannibal or Dawn of the Dead, for example—while on other weekends, the most popular movie was nonviolent, such as Runaway Bride or Toy Story.

The economists could see exactly how many murders, rapes, and assaults were committed on weekends when a prominent violent movie was released and compare that to the number of murders, rapes, and assaults there were on weekends when a prominent peaceful movie was released.

So what did they find? When a violent movie was shown, did crime rise, as some experiments suggest? Or did it stay the same?

On weekends with a popular violent movie, the economists found, crime dropped.

You read that right. On weekends with a popular violent movie, when millions of Americans were exposed to images of men killing other men, crime dropped—significantly.

When you get a result this strange and unexpected, your first thought is that you’ve done something wrong. Each author carefully went over the coding. No mistakes. Your second thought is that there is some other variable that will explain these results. They checked if time of year affected the results. It didn’t. They collected data on weather, thinking perhaps somehow this was driving the relationship. It wasn’t.

“We checked all our assumptions, everything we were doing,” Dahl told me. “We couldn’t find anything wrong.”

Despite the anecdotes, despite the lab evidence, and as bizarre as it seemed, showing a violent movie somehow caused a big drop in crime. How could this possibly be?

The key to figuring it out for Dahl and DellaVigna was utilizing their Big Data to zoom in closer. Survey data traditionally provided information that was annual or at best perhaps monthly. If we are really lucky, we might get data for a weekend. By comparison, as we’ve increasingly been using comprehensive datasets, rather than small-sample surveys, we have been able to home in by the hour and even the minute. This has allowed us to learn a lot more about human behavior.

Sometimes fluctuations over time are amusing, if not earth-shattering. EPCOR, a utility company in Edmonton, Canada, reported minute-by-minute water consumption data during the 2010 Olympic gold medal hockey match between the United States and Canada, which an estimated 80 percent of Canadians watched. The data tells us that shortly after each period ended, water consumption shot up. Toilets across Edmonton were clearly flushing.

Google searches can also be broken down by the minute, revealing some interesting patterns in the process. For example, searches for “unblocked games” soar at 8 A.M. on weekdays and stay high through 3 P.M., no doubt in response to schools’ attempts to block access to mobile games on school property without banning students’ cell phones.

Search rates for “weather,” “prayer,” and “news” peak before 5:30 A.M., evidence that most people wake up far earlier than I do. Search rates for “suicide” peak at 12:36 A.M. and are at the lowest levels around 9 A.M., evidence that most people are far less miserable in the morning than I am.

Seth Stephens-Davidowitz's books