Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

We can do this for migraines. We can do this for kidney stones. We can do this for anxiety and depression and Alzheimer’s and pancreatic cancer and high blood pressure and back pain and constipation and nosebleeds. We can do this for everything. The analysis that Snow did once, we might be able to do four hundred times (something that, as of this writing, I am already starting to work on).

We might call this—taking a simple method and utilizing Big Data to perform an analysis several hundred times in a short period of time—science at scale. Yes, the social and behavioral sciences are most definitely going to scale. Zooming in on health conditions will help these sciences scale. Another thing that will help them scale: A/B testing. We discussed A/B testing in the context of businesses getting users to click on headlines and ads—and this has been the predominant use of the methodology. But A/B testing can be used to uncover things more fundamental—and socially valuable—than an arrow that gets people to click on an ad.

Benjamin F. Jones is an economist at Northwestern who is trying to use A/B testing to help kids learn better. He has helped create a platform, EDU STAR, which allows schools to randomly test different lesson plans.

Many companies are in the education software business. With EDU STAR, students log in to a computer and are randomly exposed to different lesson plans. Then they take short tests to see how well they learned the material. Schools, in other words, learn what software works best for helping students grasp material.

Already, like all great A/B testing platforms, EDU STAR is yielding surprising results. One lesson plan that many educators were very excited about included software that utilized games to help teach students fractions. Certainly, if you turned math into a game, students would have more fun, learn more, and do better on tests. Right? Wrong. Students who were taught fractions via a game tested worse than those who learned fractions in a more standard way.
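To make the mechanics concrete, here is a minimal sketch of the kind of comparison such a platform might run: assign students to lesson plans at random, then compare average test scores between the groups. This is not EDU STAR's actual code. The function names, the variant labels "game" and "standard," and the scores in the demo are all hypothetical, made up purely for illustration. The permutation test is one simple way to ask whether a gap between groups could plausibly be chance; the source does not say which method the platform uses.

```python
import random
from statistics import mean


def assign_variant(student_ids, variants, seed=0):
    """Randomly assign each student to one lesson-plan variant."""
    rng = random.Random(seed)
    return {sid: rng.choice(variants) for sid in student_ids}


def difference_in_means(scores, assignment, treatment, control):
    """Average score of the treatment group minus that of the control group."""
    treated = [scores[s] for s, v in assignment.items() if v == treatment]
    controls = [scores[s] for s, v in assignment.items() if v == control]
    return mean(treated) - mean(controls)


def permutation_p_value(scores, assignment, treatment, control,
                        n_shuffles=2000, seed=0):
    """Share of random relabelings whose gap is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(difference_in_means(scores, assignment, treatment, control))
    ids = list(assignment)
    labels = [assignment[s] for s in ids]
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # break any real link between variant and score
        shuffled = dict(zip(ids, labels))
        gap = abs(difference_in_means(scores, shuffled, treatment, control))
        if gap >= observed:
            extreme += 1
    return extreme / n_shuffles


if __name__ == "__main__":
    # Hypothetical data: 200 students, invented scores, not real results.
    students = list(range(200))
    assignment = assign_variant(students, ["game", "standard"], seed=1)
    data_rng = random.Random(2)
    scores = {
        s: data_rng.gauss(68 if assignment[s] == "game" else 72, 10)
        for s in students
    }
    gap = difference_in_means(scores, assignment, "game", "standard")
    p = permutation_p_value(scores, assignment, "game", "standard")
    print(f"game minus standard: {gap:.1f} points (permutation p = {p:.3f})")
```

The random assignment is what does the work here. Because chance alone decides which students see which lesson plan, a persistent gap in scores can be attributed to the lesson plan rather than to differences between the students.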

Getting kids to learn more is an exciting, and socially beneficial, use of the testing that Silicon Valley pioneered to get people to click on more ads. So is getting people to sleep more.

The average American gets 6.7 hours of sleep every night. Most Americans want to sleep more. But 11 P.M. rolls around, and SportsCenter is on or YouTube is calling. So shut-eye waits. Jawbone, a wearable-device company with hundreds of thousands of customers, performs thousands of tests to try to find interventions that help get their users to do what they want to do: go to bed earlier.

Jawbone scored a huge win using a two-pronged approach. First, ask customers to commit to a not-that-ambitious goal. Send them a message like this: “It looks like you haven’t been sleeping much in the last 3 days. Why don’t you aim to get to bed by 11:30 tonight? We know you normally get up at 8 A.M.” Then the users will have an option to click on “I’m in.”

Second, when 10:30 comes, Jawbone will send another message: “We decided you’d aim to sleep at 11:30. It’s 10:30 now. Why not start now?”

Jawbone found this strategy led to twenty-three minutes of extra sleep. They didn’t get customers to actually get to bed at 10:30, but they did get them to bed earlier.

Of course, every part of this strategy had to be optimized through lots of experimentation. Start the original goal too early—ask users to commit to going to bed by 11 P.M.—and few will play along. Ask users to go to bed by midnight and little will be gained.

Jawbone used A/B testing to find the sleep equivalent of Google’s right-pointing arrow. But instead of getting a few more clicks for Google’s ad partners, it yields a few more minutes of rest for exhausted Americans.

In fact, the whole field of psychology might utilize the tools of Silicon Valley to dramatically improve its research. I’m eagerly anticipating the first psychology paper that, instead of detailing a couple of experiments done with a few undergrads, shows the results of a thousand rapid A/B tests.

The days of academics devoting months to recruiting a small number of undergrads to perform a single test will come to an end. Instead, academics will utilize digital data to test a few hundred or a few thousand ideas in just a few seconds. We’ll be able to learn a lot more in a lot less time.

Text as data is going to teach us a lot more. How do ideas spread? How do new words form? How do words disappear? How do jokes form? Why are certain words funny and others not? How do dialects develop? I bet, within twenty years, we will have profound insights on all these questions.

I think we might consider utilizing kids’ online behavior—appropriately anonymized—as a supplement to traditional tests to see how they are learning and developing. How is their spelling? Are they showing signs of dyslexia? Are they developing mature, intellectual interests? Do they have friends? There are clues to all these questions in the thousands of keystrokes every child makes every day.

And there is another, not-trivial area where plenty more insights are coming.

In the song “Shattered,” by the Rolling Stones, Mick Jagger describes all that makes New York City, the Big Apple, so magical. Laughter. Joy. Loneliness. Rats. Bedbugs. Pride. Greed. People dressed in paper bags. But Jagger devotes the most words to what makes the city truly special: “sex and sex and sex and sex.”

As with the Big Apple, so with Big Data. Thanks to the digital revolution, insights are coming in health. Sleep. Learning. Psychology. Language. Plus, sex and sex and sex and sex.

One question I am currently exploring: how many dimensions of sexuality are there? We usually think of someone as gay or straight. But sexuality is clearly more complex than that. Among gay people and straight people, people have types—some men like “blondes,” others “brunettes,” for instance. Might these preferences be as strong as the preferences for gender? Another question I am looking into: where do sexual preferences come from? Just as we can figure out the key years that determine baseball fandom or political views, we can now find the key years that determine adult sexual preferences. To learn these answers, you will have to buy my next book, tentatively titled Everybody (Still) Lies.

The existence of porn—and the data that comes with it—is a revolutionary development in the science of human sexuality.

It took time for the natural sciences to begin changing our lives—to create penicillin, satellites, and computers. It may take time before Big Data leads the social and behavioral sciences to important advances in the way we love, learn, and live. But I believe such advances are coming. I hope you see at least the outlines of such developments from this book. I hope, in fact, that some of you reading this book help create such advances.


Seth Stephens-Davidowitz's books