Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Seth Stephens-Davidowitz



FOREWORD



Ever since philosophers speculated about a “cerebroscope,” a mythical device that would display a person’s thoughts on a screen, social scientists have been looking for tools to expose the workings of human nature. During my career as an experimental psychologist, different ones have gone in and out of fashion, and I’ve tried them all—rating scales, reaction times, pupil dilation, functional neuroimaging, even epilepsy patients with implanted electrodes who were happy to while away the hours in a language experiment while waiting to have a seizure.

Yet none of these methods provides an unobstructed view into the mind. The problem is a savage tradeoff. Human thoughts are complex propositions; unlike Woody Allen speed-reading War and Peace, we don’t just think “It was about some Russians.” But propositions in all their tangled multidimensional glory are difficult for a scientist to analyze. Sure, when people pour their hearts out, we apprehend the richness of their stream of consciousness, but monologues are not an ideal dataset for testing hypotheses. On the other hand, if we concentrate on measures that are easily quantifiable, like people’s reaction time to words, or their skin response to pictures, we can do the statistics, but we’ve pureed the complex texture of cognition into a single number. Even the most sophisticated neuroimaging methodologies can tell us how a thought is splayed out in 3-D space, but not what the thought consists of.

As if the tradeoff between tractability and richness weren’t bad enough, scientists of human nature are vexed by the Law of Small Numbers—Amos Tversky and Daniel Kahneman’s name for the fallacy of thinking that the traits of a population will be reflected in any sample, no matter how small. Even the most numerate scientists have woefully defective intuitions about how many subjects one really needs in a study before one can abstract away from the random quirks and bumps and generalize to all Americans, to say nothing of Homo sapiens. It’s all the iffier when the sample is gathered by convenience, such as by offering beer money to the sophomores in our courses.

This book is about a whole new way of studying the mind. Big Data from internet searches and other online responses are not a cerebroscope, but Seth Stephens-Davidowitz shows that they offer an unprecedented peek into people’s psyches. At the privacy of their keyboards, people confess the strangest things, sometimes (as in dating sites or searches for professional advice) because they have real-life consequences, at other times precisely because they don’t have consequences: people can unburden themselves of some wish or fear without a real person reacting in dismay or worse. Either way, the people are not just pressing a button or turning a knob, but keying in any of trillions of sequences of characters to spell out their thoughts in all their explosive, combinatorial vastness. Better still, they lay down these digital traces in a form that is easy to aggregate and analyze. They come from all walks of life. They can take part in unobtrusive experiments which vary the stimuli and tabulate the responses in real time. And they happily supply these data in gargantuan numbers.

Everybody Lies is more than a proof of concept. Time and again my preconceptions about my country and my species were turned upside-down by Stephens-Davidowitz’s discoveries. Where did Donald Trump’s unexpected support come from? When Ann Landers asked her readers in 1976 whether they regretted having children and was shocked to find that a majority did, was she misled by an unrepresentative, self-selected sample? Is the internet to blame for that redundantly named crisis of the late 2010s, the “filter bubble”? What triggers hate crimes? Do people seek jokes to cheer themselves up? And though I like to think that nothing can shock me, I was shocked aplenty by what the internet reveals about human sexuality—including the discovery that every month a certain number of women search for “humping stuffed animals.” No experiment using reaction time or pupil dilation or functional neuroimaging could ever have turned up that fact.

Everybody will enjoy Everybody Lies. With unflagging curiosity and an endearing wit, Stephens-Davidowitz points to a new path for social science in the twenty-first century. With this endlessly fascinating window into human obsessions, who needs a cerebroscope?

—Steven Pinker, 2017





INTRODUCTION



THE OUTLINES OF A REVOLUTION

Surely he would lose, they said.

In the 2016 Republican primaries, polling experts concluded that Donald Trump didn’t stand a chance. After all, Trump had insulted a variety of minority groups. The polls and their interpreters told us few Americans approved of such outrages.

Most polling experts at the time thought that Trump would lose in the general election. Too many likely voters said they were put off by his manner and views.

But there were some clues, on the internet, that Trump might actually win both the primaries and the general election.


I am an internet data expert. Every day, I track the digital trails that people leave as they make their way across the web. From the buttons or keys we click or tap, I try to understand what we really want, what we will really do, and who we really are. Let me explain how I got started on this unusual path.

The story begins—and this seems like ages ago—with the 2008 presidential election and a long-debated question in social science: How significant is racial prejudice in America?

Barack Obama was running as the first African-American presidential nominee of a major party. He won, and rather easily. And the polls suggested that race was not a factor in how Americans voted. Gallup, for example, conducted numerous polls before and after Obama's first election. Its conclusion? American voters largely did not care that Barack Obama was black. Shortly after the election, two well-known professors at the University of California, Berkeley, pored through other survey-based data, using more sophisticated data-mining techniques. They reached a similar conclusion.

And so, during Obama’s presidency, this became the conventional wisdom in many parts of the media and in large swaths of the academy. The sources that the media and social scientists have used for eighty-plus years to understand the world told us that the overwhelming majority of Americans did not care that Obama was black when judging whether he should be their president.
