There is another great benefit to the fact that all this gold-standard testing can be done so cheaply and easily: it further frees us from our reliance upon our intuition, which, as noted in Chapter 1, has its limitations. A fundamental reason for A/B testing’s importance is that people are unpredictable. Our intuition often fails to predict how they will respond.
Was your intuition correct on Obama’s optimal website?
Here are some more tests for your intuition. The Boston Globe A/B-tests headlines to figure out which ones get the most people to click on a story. Try to guess the winners from these pairs:
Finished your guesses? The answers are in bold below.
I predict you got more than half right, perhaps by considering what you would click on. But you probably did not guess all of these correctly.
Why? What did you miss? What insights into human behavior did you lack? What lessons can you learn from your mistakes?
We usually ask questions such as these after making bad predictions.
But look how difficult it is to draw general conclusions from the Globe headlines. In the first headline test, changing a single word, “this” to “SnotBot,” led to a big win. This might suggest more details win. But in the second headline, “deflated balls,” the detailed term, loses. In the fourth headline, “makes bank” beats the number $179,000. This might suggest slang terms win. But the slang term “hookup contest” loses in the third headline.
The lesson of A/B testing, to a large degree, is to be wary of general lessons. Clark Benson is the CEO of ranker.com, a news and entertainment site that relies heavily on A/B testing to choose headlines and site design. “At the end of the day, you can’t assume anything,” Benson says. “Test literally everything.”
Testing fills in gaps in our understanding of human nature. These gaps will always exist. If we knew, based on our life experience, what the answer would be, testing would not be of value. But we don’t, so it is.
Another reason A/B testing is so important is that seemingly small changes can have big effects. As Benson puts it, “I’m constantly amazed with minor, minor factors having outsized value in testing.”
In December 2012, Google changed its advertisements. They added a rightward-pointing arrow surrounded by a square.
Notice how bizarre this arrow is. It points rightward to absolutely nothing. In fact, when these arrows first appeared, many Google customers were critical. Why were they adding meaningless arrows to the ad, they wondered?
Well, Google is protective of its business secrets, so they don’t say exactly how valuable the arrows were. But they did say that these arrows had won in A/B testing. The reason Google added them is that they got a lot more people to click. And this minor, seemingly meaningless change made Google and their ad partners oodles of money.
So how can you find these small tweaks that produce outsize profits? You have to test lots of things, even many that seem trivial. In fact, Google’s users have noted numerous times that ads have been changed a tiny bit only to return to their previous form. They have unwittingly become members of treatment groups in A/B tests—but at the cost only of seeing these slight variations.
Centering Experiment (Didn’t Work)
Green Star Experiment (Didn’t Work)
New Font Experiment (Didn’t Work)
The above variations never made it to the masses. They lost. But they were part of the process of picking winners. The road to a clickable arrow is paved with ugly stars, faulty positionings, and gimmicky fonts.
It may be fun to guess what makes people click. And if you are a Democrat, it might be nice to know that testing got Obama more money. But there is a dark side to A/B testing.
In his excellent book Irresistible, Adam Alter writes about the rise of behavioral addictions in contemporary society. Many people are finding aspects of the internet increasingly difficult to turn off.
My favorite dataset, Google searches, can give us some clues as to what people find most addictive. According to Google, most addictions remain the ones people have struggled with for many decades—drugs, sex, and alcohol, for example. But the internet is starting to make its presence felt on the list—with “porn” and “Facebook” now among the top ten reported addictions.
TOP ADDICTIONS REPORTED TO GOOGLE, 2016
Drugs
Sex
Porn
Alcohol
Sugar
Love
Gambling
Facebook
A/B testing may be playing a role in making the internet so darn addictive.
Tristan Harris, a “design ethicist,” was quoted in Irresistible explaining why people have such a hard time resisting certain sites on the internet: “There are a thousand people on the other side of the screen whose job it is to break down the self-regulation you have.”
And these people are using A/B testing.
Through testing, Facebook may figure out that making a particular button a particular color gets people to come back to their site more often. So they change the button to that color. Then they may figure out that a particular font gets people to come back to their site more often. So they change the text to that font. Then they may figure out that emailing people at a certain time gets them coming back to their site more often. So they email people at that time.
Pretty soon, Facebook becomes a site optimized to maximize how much time people spend on Facebook. In other words, find enough winners of A/B tests and you have an addictive site. It is the type of feedback that cigarette companies never had.
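For readers who like to see the machinery, picking a winner in any one of these tests takes only a few lines of arithmetic. Here is a minimal sketch in Python, not anything Facebook has published. The function name `ab_test` and every number in it are made up for illustration, and the comparison it uses (a standard two-proportion z-test) is my assumption about how a site might check that a difference is more than luck.

```python
import math

def ab_test(returned_a, total_a, returned_b, total_b):
    """Compare return rates for a control group (A) and a variant group (B).

    Hypothetical helper for illustration only. Returns the two rates,
    the relative lift of B over A, the z statistic, and a two-sided
    p-value from a two-proportion z-test.
    """
    rate_a = returned_a / total_a
    rate_b = returned_b / total_b

    # Relative lift: how much more often variant users came back.
    lift = (rate_b - rate_a) / rate_a

    # Pooled rate and standard error under the assumption of no real difference.
    pooled = (returned_a + returned_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))

    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, lift, z, p_value

# Made-up data: 10,000 users see the old button, 10,000 see the new color.
rate_a, rate_b, lift, z, p_value = ab_test(1000, 10000, 1300, 10000)
print(f"control: {rate_a:.1%}, variant: {rate_b:.1%}, lift: {lift:.0%}")
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

With invented numbers like these, the new button would be declared the winner and rolled out to everyone. Run the same routine on fonts, email times, and everything else, and the winners pile up.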
A/B testing is increasingly a tool of the gaming industry. As Alter discusses, World of Warcraft A/B-tests various versions of its game. One mission might ask you to kill someone. Another might ask you to save something. Game designers can give different samples of players different missions and then see which ones keep more people playing. They might find, for example, that the mission that asked you to save a person got people to return 30 percent more often. If they test many, many missions, they start finding more and more winners. These 30 percent wins add up, until they have a game that keeps many adult men holed up in their parents’ basements.
If you are a little disturbed by this, I am with you. And we will talk a bit more about the ethical implications of this and other aspects of Big Data near the end of this book. But for better or worse, experimentation is now a crucial tool in the data scientist’s tool kit. And there is another form of experimentation sitting in that tool kit. It has been used to ask a variety of questions, including whether TV ads really work.
NATURE’S CRUEL—BUT ENLIGHTENING—EXPERIMENTS