Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Reisinger founded a company, Premise, which employs a group of smartphone-equipped workers in developing countries. The employees’ job? To take pictures of interesting goings-on that might have economic import.

The employees might take snapshots outside gas stations or of fruit bins in supermarkets. They photograph the same locations over and over again. The pictures are sent back to Premise, whose second group of employees, computer scientists, turns the photos into data. The company’s analysts can code everything from the length of lines at gas stations to how many apples are available in a supermarket, the ripeness of those apples, and the price listed on the apples’ bin. Based on photographs of all sorts of activity, Premise can begin to put together estimates of economic output and inflation. In developing countries, long lines at gas stations are a leading indicator of economic trouble. So are unavailable or unripe apples. Premise’s on-the-ground pictures of China helped the company detect food inflation there in 2011 and food deflation in 2012, long before the official data came in.

Premise sells this information to banks and hedge funds and also collaborates with the World Bank.

Like many good ideas, Premise’s is a gift that keeps on giving. The World Bank was recently interested in the size of the underground cigarette economy in the Philippines. In particular, it wanted to know the effects of the government’s recent efforts, which included random raids, to crack down on manufacturers that produced cigarettes without paying taxes. Premise’s clever idea? Take photos of cigarette boxes seen on the street. See how many of them have tax stamps, which all legitimate cigarettes do. Premise found that this part of the underground economy, while large in 2015, got significantly smaller in 2016. The government’s efforts worked, but seeing something usually so hidden, illegal cigarettes, required new data.


As we’ve seen, what constitutes data has been wildly reimagined in the digital age, and many insights have been found in this new information. Learning what drives media bias, what makes a good first date, and how developing economies are really doing is just the beginning.

Not incidentally, a lot of money has also been made from such new data, starting with Messrs. Brin’s and Page’s tens of billions. Joseph Reisinger hasn’t done badly himself. Observers estimate that Premise is now making tens of millions of dollars in annual revenue. Investors recently poured $50 million into the company. This means some investors consider Premise among the most valuable enterprises in the world primarily in the business of taking and selling photos, in the same league as Playboy.

There is, in other words, outsize value, for scholars and entrepreneurs alike, in utilizing all the new types of data now available, in thinking broadly about what counts as data. These days, a data scientist must not limit herself to a narrow or traditional view of data. These days, photographs of supermarket lines are valuable data. The fullness of supermarket bins is data. The ripeness of apples is data. Photos from outer space are data. The curvature of lips is data. Everything is data!

And with all this new data, we can finally see through people’s lies.





4



DIGITAL TRUTH SERUM

Everybody lies.

People lie about how many drinks they had on the way home. They lie about how often they go to the gym, how much those new shoes cost, whether they read that book. They call in sick when they’re not. They say they’ll be in touch when they won’t. They say it’s not about you when it is. They say they love you when they don’t. They say they’re happy while in the dumps. They say they like women when they really like men.

People lie to friends. They lie to bosses. They lie to kids. They lie to parents. They lie to doctors. They lie to husbands. They lie to wives. They lie to themselves.

And they damn sure lie to surveys.

Here’s my brief survey for you:

Have you ever cheated on an exam? __________

Have you ever fantasized about killing someone? _________

Were you tempted to lie? Many people underreport embarrassing behaviors and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias.

An important paper in 1950 provided powerful evidence of how surveys can fall victim to such bias. Researchers collected data, from official sources, on the residents of Denver: what percentage of them voted, gave to charity, and owned a library card. They then surveyed the residents to see if the percentages would match. The results were, at the time, shocking. What the residents reported to the surveys was very different from the data the researchers had gathered. Even though nobody gave their names, people, in large numbers, exaggerated their voter registration status, voting behavior, and charitable giving.



Has anything changed in sixty-five years? In the age of the internet, not owning a library card is no longer embarrassing. But, while what’s embarrassing or desirable may have changed, people’s tendency to deceive pollsters remains strong.

A recent survey asked University of Maryland graduates various questions about their college experience. The answers were compared to official records. People consistently gave wrong information, in ways that made them look good. Fewer than 2 percent reported that they graduated with a GPA below 2.5. (In reality, about 11 percent did.) And 44 percent said they had donated to the university in the past year. (In reality, about 28 percent did.)

And it is certainly possible that lying played a role in the failure of the polls to predict Donald Trump’s 2016 victory. Polls, on average, underestimated his support by about 2 percentage points. Some people may have been embarrassed to say they were planning to support him. Some may have claimed they were undecided when they were really going Trump’s way all along.

Why do people misinform anonymous surveys? I asked Roger Tourangeau, a research professor emeritus at the University of Michigan and perhaps the world’s foremost expert on social desirability bias. Our weakness for “white lies” is an important part of the problem, he explained. “About one-third of the time, people lie in real life,” he suggests. “The habits carry over to surveys.”

Then there’s that odd habit we sometimes have of lying to ourselves. “There is an unwillingness to admit to yourself that, say, you were a screw-up as a student,” says Tourangeau.

Seth Stephens-Davidowitz's books