
How, then, can we more accurately establish causality? The gold standard is a randomized, controlled experiment. Here’s how it works. You randomly divide people into two groups. One, the treatment group, is asked to do or take something. The other, the control group, is not. You then see how each group responds. The difference in the outcomes between the two groups is your causal effect.

For example, to test whether moderate drinking causes good health, you might randomly pick some people to drink one glass of wine per day for a year, randomly choose others to drink no alcohol for a year, and then compare the reported health of both groups. Since people were randomly assigned to the two groups, there is no reason to expect that one group started out healthier or more social than the other. You can trust that the effects of the wine are causal. Randomized, controlled experiments are the most trusted evidence in any field. If a pill can pass a randomized, controlled experiment, it can be dispensed to the general populace. If it cannot pass this test, it won't make it onto pharmacy shelves.
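The underlying logic is simple enough to simulate. Here is a minimal sketch in Python, with entirely invented numbers standing in for the wine study: split participants at random, generate outcomes with a small built-in treatment effect, and check that the difference in group means recovers it.

```python
# A minimal sketch of a randomized, controlled experiment, using
# made-up numbers purely for illustration (no real health data).
import random
import statistics

random.seed(0)

participants = list(range(1000))
random.shuffle(participants)

# Random assignment: the first half becomes the treatment group
# (one glass of wine per day), the rest the control group (no alcohol).
treatment = set(participants[:500])

def reported_health(treated: bool) -> float:
    # Hypothetical outcome: baseline health plus noise, plus a small
    # true effect of the treatment that we hope to recover.
    baseline = random.gauss(70, 10)
    return baseline + (2.0 if treated else 0.0)

outcomes = {p: reported_health(p in treatment) for p in participants}

treated_mean = statistics.mean(outcomes[p] for p in treatment)
control_mean = statistics.mean(
    outcomes[p] for p in participants if p not in treatment
)

# Because assignment was random, this difference in means estimates
# the causal effect of the treatment.
print(f"Estimated causal effect: {treated_mean - control_mean:.2f}")
```

Because random assignment makes the two groups alike, on average, in every other respect, the difference in means is an unbiased estimate of the causal effect.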

Randomized experiments have increasingly been used in the social sciences as well. Esther Duflo, a French economist at MIT, has led the campaign for greater use of experiments in development economics, a field that tries to figure out the best ways to help the poorest people in the world. Consider the study Duflo ran with colleagues on how to improve education in rural India, where more than half of middle school students cannot read a simple sentence. One potential reason students struggle so much is that teachers don't show up consistently. On a given day in some schools in rural India, more than 40 percent of teachers are absent.

Duflo's test? She and her colleagues randomly divided schools into two groups. In one, the treatment group, teachers were paid a small bonus of 50 rupees (about $1.15) on top of their base pay for every day they showed up to work. In the other, the control group, no extra payment for attendance was given. The results were remarkable. When teachers were paid, teacher absenteeism was cut in half. Student test performance also improved substantially, with the biggest effects on young girls. By the end of the experiment, girls in schools where teachers were paid to come to class were 7 percentage points more likely to be able to write.

According to a New Yorker article, when Bill Gates learned of Duflo’s work, he was so impressed he told her, “We need to fund you.”





THE ABCS OF A/B TESTING




So randomized experiments are the gold standard for proving causality, and their use has spread through the social sciences. Which brings us back to Google’s offices on February 27, 2000. What did Google do on that day that revolutionized the internet?

On that day, a few engineers decided to perform an experiment on Google’s site. They randomly divided users into two groups. The treatment group was shown twenty links on the search results pages. The control group was shown the usual ten. The engineers then compared the satisfaction of the two groups based on how frequently they returned to Google.

This is a revolution? It doesn’t seem so revolutionary. I already noted that randomized experiments have been used by pharmaceutical companies and social scientists. How can copying them be such a big deal?

The key point—and this was quickly realized by the Google engineers—is that experiments in the digital world have a huge advantage relative to experiments in the offline world. As convincing as offline randomized experiments can be, they are also resource-intensive. For Duflo’s study, schools had to be contacted, funding had to be arranged, some teachers had to be paid, and all students had to be tested. Offline experiments can cost thousands or hundreds of thousands of dollars and take months or years to conduct.

In the digital world, randomized experiments can be cheap and fast. You don’t need to recruit and pay participants. Instead, you can write a line of code to randomly assign them to a group. You don’t need users to fill out surveys. Instead, you can measure mouse movements and clicks. You don’t need to hand-code and analyze the responses. You can build a program to automatically do that for you. You don’t have to contact anybody. You don’t even have to tell users they are part of an experiment.
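What might that line of code look like? The book doesn't say how Google implements its assignment, but one common industry approach is to hash each user's ID, so a returning visitor always lands in the same group. A minimal sketch, with hypothetical names:

```python
# A sketch of how a site might assign visitors to experiment groups.
# Hashing the user ID is just one common approach, shown here for
# illustration; it is not necessarily Google's actual mechanism.
import hashlib

def assign_group(user_id: str, experiment: str = "results_page_links") -> str:
    # Hash the user ID together with the experiment name so the same
    # user always lands in the same group for this experiment, and
    # different experiments split users independently.
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 2
    return "treatment" if bucket == 0 else "control"

# Example: decide how many links this visitor's results page shows
# (twenty for the treatment group, the usual ten for the control group).
links_to_show = 20 if assign_group("user-12345") == "treatment" else 10
print(links_to_show)
```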

This is the fourth power of Big Data: it makes randomized experiments, which can find truly causal effects, much, much easier to conduct—anytime, more or less anywhere, as long as you’re online. In the era of Big Data all the world’s a lab.

This insight quickly spread through Google and then the rest of Silicon Valley, where randomized controlled experiments have been renamed “A/B testing.” In 2011, Google engineers ran seven thousand A/B tests. And this number is only rising.

If Google wants to know how to get more people to click on ads on their sites, they may try two shades of blue in ads—one shade for Group A, another for Group B. Google can then compare click rates. Of course, the ease of such testing can lead to overuse. Some employees felt that because testing was so effortless, Google was overexperimenting. In 2009, one frustrated designer quit after Google went through forty-one marginally different shades of blue in A/B testing. But this designer’s stand in favor of art over obsessive market research has done little to stop the spread of the methodology.
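Comparing the click rates from such a test is a standard statistics exercise. As an illustration, here is a two-proportion z-test in Python; the counts are invented, and this need not be the analysis Google actually runs.

```python
# Compare click rates from two ad variants with a two-proportion z-test.
# All counts below are hypothetical.
import math

clicks_a, views_a = 1_210, 50_000   # shade of blue A (invented numbers)
clicks_b, views_b = 1_330, 50_000   # shade of blue B (invented numbers)

rate_a = clicks_a / views_a
rate_b = clicks_b / views_b

# Pooled click rate under the null hypothesis that the shades perform equally.
pooled = (clicks_a + clicks_b) / (views_a + views_b)
se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
z = (rate_b - rate_a) / se

# Two-sided p-value from the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z:.2f}  p = {p_value:.4f}")
```

With tens of thousands of views per variant, even a difference of a fraction of a percentage point can clear conventional significance thresholds, which is part of why such fine-grained tests are worth running at all.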

Facebook now runs a thousand A/B tests per day, which means that a small number of engineers at Facebook start more randomized, controlled experiments in a given day than the entire pharmaceutical industry starts in a year.

A/B testing has spread beyond the biggest tech firms. A former Google employee, Dan Siroker, brought this methodology to Barack Obama’s first presidential campaign, which A/B-tested home page designs, email pitches, and donation forms. Then Siroker started a new company, Optimizely, which allows organizations to perform rapid A/B testing. In 2012, Optimizely was used by Obama as well as his opponent, Mitt Romney, to maximize sign-ups, volunteers, and donations. It’s also used by companies as diverse as Netflix, TaskRabbit, and New York magazine.

To see how valuable testing can be, consider how Obama used it to get more people engaged with his campaign. Obama’s home page initially included a picture of the candidate and a button below the picture that invited people to “Sign Up.”



Was this the best way to greet people? With the help of Siroker, Obama’s team could test whether a different picture and button might get more people to actually sign up. Would more people click if the home page instead featured a picture of Obama with a more solemn face? Would more people click if the button instead said “Join Now”? Obama’s team showed users different combinations of pictures and buttons and measured how many of them clicked the button. See if you can predict the winning picture and winning button.

[Figure: Pictures Tested]



[Figure: Buttons Tested]



The winner was the picture of Obama’s family and the button “Learn More.” And the victory was huge. By using that combination, Obama’s campaign team estimated it got 40 percent more people to sign up, netting the campaign roughly $60 million in additional funding.

[Figure: Winning Combination]
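The mechanics of a combination test like the campaign's are simple enough to sketch. In the Python simulation below, every variant name and sign-up rate is invented, not the campaign's actual data; the point is only to show how randomly assigning visitors across picture and button combinations surfaces a winner.

```python
# A sketch of a multivariate test: each visitor is randomly shown one
# (picture, button) combination, and sign-ups are tallied per combination.
# All names and rates here are hypothetical.
import random
from itertools import product

random.seed(0)

pictures = ["candidate portrait", "solemn face", "family photo"]
buttons = ["Sign Up", "Join Now", "Learn More"]
combos = list(product(pictures, buttons))

shown = {c: 0 for c in combos}
signups = {c: 0 for c in combos}

# Pretend sign-up probabilities, so the simulation has a "true" winner.
true_rate = {c: 0.06 for c in combos}
true_rate[("family photo", "Learn More")] = 0.09

for _ in range(90_000):            # simulated visitors
    combo = random.choice(combos)  # random assignment to a combination
    shown[combo] += 1
    if random.random() < true_rate[combo]:
        signups[combo] += 1

best = max(combos, key=lambda c: signups[c] / shown[c])
print("Winner:", best, f"({signups[best] / shown[best]:.1%} sign-up rate)")
```

With enough visitors per combination, the observed sign-up rates converge on the true ones, which is why a high-traffic site or campaign can resolve even modest differences between variants.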
