Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

So what can you learn when you code the mood of text? Facebook data scientists have shown one exciting possibility. They can estimate a country’s Gross National Happiness every day. If people’s status messages tend to be positive, the country is assumed happy for the day. If they tend to be negative, the country is assumed sad for the day.
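To make the idea concrete, here is a minimal sketch of how such daily mood coding might work. It is an illustration under loud assumptions, not Facebook's actual method: the tiny word lists, the scoring rule, and the function names `score_text` and `daily_mood` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real study would use a far larger word list.
POSITIVE = {"happy", "awesome", "wonderful", "grateful", "love"}
NEGATIVE = {"sad", "lonely", "miserable", "angry", "hate"}

def score_text(text):
    """Return (positive words - negative words) / total words for one message."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

def daily_mood(updates):
    """updates: iterable of (day, text) pairs. Returns {day: mean score}."""
    by_day = defaultdict(list)
    for day, text in updates:
        by_day[day].append(score_text(text))
    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}

updates = [
    ("12-25", "So grateful and happy!"),
    ("12-26", "Feeling sad and lonely."),
]
mood = daily_mood(updates)  # positive for 12-25, negative for 12-26
```

A day whose average score is above zero would be coded as happy, and a day below zero as sad.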

Among the Facebook data scientists’ findings: Christmas is one of the happiest days of the year. Now, I was skeptical of this analysis—and am a bit skeptical of this whole project. Generally, I think many people are secretly sad on Christmas because they are lonely or fighting with their family. More generally, I tend not to trust Facebook status updates, for reasons that I will discuss in the next chapter—namely, our propensity to lie about our lives on social media.

If you are alone and miserable on Christmas, do you really want to bother all of your friends by posting about how unhappy you are? I suspect there are many people spending a joyless Christmas who still post on Facebook about how grateful they are for their “wonderful, awesome, amazing, happy life.” They then get coded as substantially raising America’s Gross National Happiness. If we are going to really code Gross National Happiness, we should use more sources than just Facebook status updates.

That said, the finding that Christmas is, on balance, a joyous occasion does seem to be legitimately true. Google searches for depression and Gallup surveys also tell us that Christmas is among the happiest days of the year. And, contrary to an urban myth, suicides drop around the holidays. Even if there are some sad and lonely people on Christmas, there are many more merry ones.

These days, when people sit down to read, most of the time it is to peruse status updates on Facebook. But, once upon a time, not so long ago, human beings read stories, sometimes in books. Sentiment analysis can teach us a lot here, too.

A team of scientists, led by Andy Reagan, now at the University of California at Berkeley School of Information, downloaded the text of thousands of books and movie scripts. They could then code how happy or sad each point of the story was.
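One simple way to code the mood at each point of a story is to slide a window across the text and score each window. The sketch below assumes a toy word list and hypothetical names (`sentiment_arc`, `window`, `step`); it is not the researchers' own pipeline.

```python
# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"joy", "happy", "love", "win", "hope"}
NEGATIVE = {"grief", "sad", "death", "war", "fear"}

def sentiment_arc(text, window=500, step=250):
    """Score overlapping windows of `window` words, moving `step` words at a time.

    Each score is (positive words - negative words) / words in the window.
    Returns a list of scores in story order.
    """
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    arc = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = words[start:start + window]
        if not chunk:
            break  # empty text: nothing to score
        pos = sum(w in POSITIVE for w in chunk)
        neg = sum(w in NEGATIVE for w in chunk)
        arc.append((pos - neg) / len(chunk))
    return arc

story = "joy joy joy joy grief grief grief grief"
arc = sentiment_arc(story, window=4, step=4)  # one score per four-word window
```

Plotting the resulting list against position in the text gives the kind of mood curve described here.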

Consider, for example, the book Harry Potter and the Deathly Hallows. Here, from that team of scientists, is how the mood of the story changes, along with a description of key plot points.

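[Figure: sentiment over the course of Harry Potter and the Deathly Hallows, annotated with key plot points.]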

Note that the many rises and falls in mood that the sentiment analysis detects correspond to key events.

Most stories have simpler structures. Take, for example, Shakespeare’s tragedy King John. In this play, nothing goes right. King John of England is asked to renounce his throne. He is excommunicated for disobeying the pope. War breaks out. His nephew dies, perhaps by suicide. Other people die. Finally, John is poisoned by a disgruntled monk.

And here is the sentiment analysis as the play progresses.

[Figure: sentiment over the course of King John.]

In other words, just from the words, the computer was able to detect that things go from bad to worse to worst.

Or consider the movie 127 Hours. A basic plot summary of this movie is as follows:

A mountaineer goes to Utah’s Canyonlands National Park to hike. He befriends other hikers but then parts ways with them. Suddenly, he slips and knocks loose a boulder, which traps his hand and wrist. He attempts various escapes, but each one fails. He becomes depressed. Finally, he amputates his arm and escapes. He gets married, starts a family, and continues climbing, although now he makes sure to leave a note whenever he goes off.

And here is the sentiment analysis as the movie progresses, again by Reagan’s team of scientists.

[Figure: sentiment over the course of 127 Hours.]

So what do we learn from the mood of thousands of these stories?

The computer scientists found that a huge percentage of stories fit into one of six relatively simple structures. They are, borrowing a chart from Reagan’s team:

Rags to Riches (rise)

Riches to Rags (fall)

Man in a Hole (fall, then rise)

Icarus (rise, then fall)

Cinderella (rise, then fall, then rise)

Oedipus (fall, then rise, then fall)

There might be small twists and turns not captured by this simple scheme. For example, 127 Hours ranks as a Man in a Hole story, even though there are moments along the way down when sentiments temporarily improve. The large, overarching structure of most stories fits into one of the six categories. Harry Potter and the Deathly Hallows is an exception.
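How might a computer assign a mood curve to one of the six shapes while ignoring small twists? One possibility, sketched below, is to compare the curve against six idealized templates and pick the closest match. This is a swapped-in heuristic for illustration, not the method Reagan's team used; the cosine templates and the names `TEMPLATES` and `classify_arc` are assumptions.

```python
import math

# Each shape is modeled as sign * cos(k * pi * t) for t from 0 to 1 (an assumption).
# k is the number of monotone stretches; sign sets the opening direction.
TEMPLATES = {
    "Rags to Riches": (1, -1),   # rise
    "Riches to Rags": (1, 1),    # fall
    "Man in a Hole": (2, 1),     # fall, then rise
    "Icarus": (2, -1),           # rise, then fall
    "Cinderella": (3, -1),       # rise, then fall, then rise
    "Oedipus": (3, 1),           # fall, then rise, then fall
}

def _pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def classify_arc(arc):
    """Return the name of the template that correlates best with the arc."""
    n = len(arc)
    if n < 4:
        raise ValueError("need at least 4 points to tell the shapes apart")
    ts = [i / (n - 1) for i in range(n)]

    def fit(name):
        k, sign = TEMPLATES[name]
        return _pearson(arc, [sign * math.cos(k * math.pi * t) for t in ts])

    return max(TEMPLATES, key=fit)

shape = classify_arc([5, 3, 1, 0, 1, 3, 5])  # a V-shaped curve
```

Because correlation looks at the overall shape, a brief uptick on the way down barely changes which template fits best.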

There are a lot of additional questions we might answer. For example, how has the structure of stories changed through time? Have stories gotten more complicated through the years? Do cultures differ in the types of stories they tell? What types of stories do people like most? Do different story structures appeal to men and women? What about people in different countries?

Ultimately, text as data may give us unprecedented insights into what audiences actually want, which may be different from what authors or executives think they want. Already there are some clues that point in this direction.

Consider a study by two Wharton School professors, Jonah Berger and Katherine L. Milkman, on what types of stories get shared. They tested whether positive stories or negative stories were more likely to make the New York Times’ most-emailed list. They downloaded every Times article over a three-month period. Using sentiment analysis, the professors coded the mood of articles. Examples of positive stories included “Wide-Eyed New Arrivals Falling in Love with the City” and “Tony Award for Philanthropy.” Stories such as “Web Rumors Tied to Korean Actress’ Suicide” and “Germany: Baby Polar Bear’s Feeder Dies” proved, not surprisingly, to be negative.

The professors also had information about where the story was placed. Was it on the home page? On the top right? The top left? And they had information about when the story came out. Late Tuesday night? Monday morning?

They could compare two articles—one of them positive, one of them negative—that appeared in a similar place on the Times site and came out at a similar time and see which one was more likely to be emailed.
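The logic of that comparison can be sketched as a matched, or stratified, comparison: group articles by placement and timing, then compare email rates for positive and negative articles within each group. The code below is a simplified illustration, not the professors' actual statistical model; the field names (`mood`, `slot`, `time`, `emailed`) and the function `matched_share_gap` are hypothetical.

```python
from collections import defaultdict

def matched_share_gap(articles):
    """Average, over (slot, time) groups, of positive minus negative email rate.

    articles: list of dicts with keys 'mood' ('positive' or 'negative'),
    'slot', 'time', and 'emailed' (bool). Groups missing either mood are
    skipped, since there is nothing to compare. Returns None if no group
    contains both moods.
    """
    strata = defaultdict(lambda: {"positive": [], "negative": []})
    for a in articles:
        strata[(a["slot"], a["time"])][a["mood"]].append(1 if a["emailed"] else 0)

    gaps = []
    for groups in strata.values():
        pos, neg = groups["positive"], groups["negative"]
        if pos and neg:
            gaps.append(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sum(gaps) / len(gaps) if gaps else None

# Made-up example rows, purely for illustration.
articles = [
    {"mood": "positive", "slot": "home-top", "time": "Mon-am", "emailed": True},
    {"mood": "negative", "slot": "home-top", "time": "Mon-am", "emailed": False},
    {"mood": "positive", "slot": "inside", "time": "Tue-pm", "emailed": True},
    {"mood": "negative", "slot": "inside", "time": "Tue-pm", "emailed": True},
    {"mood": "positive", "slot": "sidebar", "time": "Wed-am", "emailed": True},
]
gap = matched_share_gap(articles)
```

A gap above zero would mean that, holding placement and timing fixed, positive articles were emailed more often than negative ones.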

So what gets shared, positive or negative articles?

Positive articles. As the authors conclude, “Content is more likely to become viral the more positive it is.”

Note this would seem to contrast with the conventional journalistic wisdom that people are attracted to violent and catastrophic stories. It may be true that news media give people plenty of dark stories. There is something to the newsroom adage, “If it bleeds, it leads.” The Wharton professors’ study, however, suggests that people may actually want more cheery stories. It may suggest a new adage: “If it smiles, it’s emailed,” though that doesn’t really rhyme.

