If Boston considered his recent past, his age, and his size, they should, without a doubt, have cut David Ortiz.
Then, in 2003, statistician Nate Silver introduced a new model, which he called PECOTA, to predict player performance. It proved to be the best—and, also, the coolest. Silver searched for players’ doppelgangers. Here’s how it works. Build a database of every Major League Baseball player ever, more than 18,000 men. And include everything you know about those players: their height, age, and position; their home runs, batting average, walks, and strikeouts for each year of their careers. Now, find the twenty ballplayers who look most similar to Ortiz right up until that point in his career—those who played like he did when he was 24, 25, 26, 27, 28, 29, 30, 31, 32, and 33. In other words, find his doppelgangers. Then see how Ortiz’s doppelgangers’ careers progressed.
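For readers who like to see the gears turn, the doppelganger recipe above can be sketched in a few lines of Python. This is only a plain nearest-neighbor toy, not Silver's actual PECOTA model, whose similarity scoring is far more involved. The player names, the numbers, and the function names (`distance`, `find_doppelgangers`, `predict_from_doppelgangers`) are all made up for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length stat sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_doppelgangers(target, history, k):
    """Return the names of the k players whose past stats are closest to target."""
    ranked = sorted(history, key=lambda name: distance(target, history[name]["past"]))
    return ranked[:k]

def predict_from_doppelgangers(target, history, k):
    """Average what the k closest players did in the following season."""
    names = find_doppelgangers(target, history, k)
    return sum(history[name]["next"] for name in names) / len(names)

# Hypothetical database: home runs in four past seasons ("past")
# and in the season after that ("next"). Invented numbers.
history = {
    "player_a": {"past": [10, 20, 35, 15], "next": 30},
    "player_b": {"past": [13, 22, 33, 17], "next": 28},
    "player_c": {"past": [30, 30, 30, 30], "next": 29},
    "player_d": {"past": [5, 6, 7, 8], "next": 9},
}

# The player we want a forecast for: slow start, big burst, recent slump.
target = [11, 21, 34, 16]

twins = find_doppelgangers(target, history, k=2)
forecast = predict_from_doppelgangers(target, history, k=2)
print(twins, forecast)
```

A real search would use twenty doppelgangers rather than two, and many more traits than home runs (height, age, position, walks, strikeouts), each scaled so that no single trait dominates the distance.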
A doppelganger search is another example of zooming in. It zooms in on the small subset of people most similar to a given person. And, as with all zooming in, it gets better the more data you have. It turns out, Ortiz’s doppelgangers gave a very different prediction for Ortiz’s future. Ortiz’s doppelgangers included Jorge Posada and Jim Thome. These players started their careers a bit slow; had amazing bursts in their late twenties, with world-class power; and then struggled in their early thirties.
Silver then predicted how Ortiz would do based on how these doppelgangers ended up doing. And here’s what he found: they regained their power. For trophy wives, Simmons may be right: when it goes, it goes. But for Ortiz’s doppelgangers, when it went, it came back.
The doppelganger search, the best methodology ever used to predict baseball player performance, said Boston should be patient with Ortiz. And Boston indeed was patient with their aging slugger. In 2010, Ortiz’s average rose to .270. He hit 32 home runs and made the All-Star team. This began a string of four consecutive All-Star games for Ortiz. In 2013, batting in his traditional third spot in the lineup, at the age of thirty-seven, Ortiz batted .688 as Boston defeated St. Louis, 4 games to 2, in the World Series. Ortiz was voted World Series MVP.*
As soon as I finished reading about Nate Silver’s approach to predicting the trajectory of ballplayers, I began thinking about whether I might have a doppelganger, too.
Doppelganger searches are promising in many fields, not just athletics. Could I find the person who shares the most interests with me? Maybe if I found the person most similar to me, we could hang out. Maybe he would know some restaurants we would like. Maybe he could introduce me to things I had no idea I might have an affinity for.
A doppelganger search zooms in on individuals and even on the traits of individuals. And, as with all zooming in, it gets sharper the more data you have. Suppose I searched for my doppelganger in a dataset of ten or so people. I might find someone who shared my interest in books. Suppose I searched for my doppelganger in a dataset of a thousand or so people. I might find someone who had a thing for popular physics books. But suppose I searched for my doppelganger in a dataset of hundreds of millions of people. Then I might be able to find someone who was really, truly similar to me.
One day, I went doppelganger hunting on social media. Using the entire corpus of Twitter profiles, I looked for the people on the planet who have the most interests in common with me.
You can certainly tell a lot about my interests from whom I follow on my Twitter account. Overall, I follow some 250 people, showing my passions for sports, politics, comedy, science, and morose Jewish folksingers.
So is there anybody out there in the universe who follows all 250 of these accounts, my Twitter twin? Of course not. Doppelgangers aren’t identical to us, only similar. Nor is there anybody who follows 200 of the accounts I follow. Or even 150.
However, I did eventually find an account that followed an amazing 100 of the accounts I follow: Country Music Radio Today. Huh? It turns out, Country Music Radio Today was a bot (it no longer exists) that followed 750,000 Twitter profiles in the hope that they would follow back.
I have an ex-girlfriend who I suspect would get a kick out of this result. She once told me I was more like a robot than a human being.
All joking aside, my initial finding that my doppelganger was a bot that followed 750,000 random accounts does make an important point about doppelganger searches. For a doppelganger search to be truly accurate, you don’t want to find someone who merely likes the same things you like. You also want to find someone who dislikes the things you dislike.
My interests are apparent not just from the accounts I follow but from those I choose not to follow. I am interested in sports, politics, comedy, and science but not food, fashion, or theater. My follows show that I like Bernie Sanders but not Elizabeth Warren, Sarah Silverman but not Amy Schumer, the New Yorker but not the Atlantic, my friends Noah Popp, Emily Sands, and Josh Gottlieb but not my friend Sam Asher. (Sorry, Sam. But your Twitter feed is a snooze.)
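The bot problem is easy to reproduce in miniature. The sketch below compares two ways of scoring a match: counting shared follows, and a measure that also penalizes accounts followed by only one of the two people. The second is Jaccard similarity, a standard choice that I am assuming here for illustration; the text above does not name the measure actually used. The account names and sizes are invented and scaled down.

```python
def shared_follows(a, b):
    """Raw overlap: how many accounts both profiles follow."""
    return len(a & b)

def jaccard(a, b):
    """Shared follows divided by all accounts followed by either profile."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical, scaled-down profiles.
me = {f"a{i}" for i in range(10)}
candidates = {
    # A bot that follows almost everything, including 4 of my accounts.
    "bot": {f"a{i}" for i in range(4)} | {f"x{i}" for i in range(996)},
    # A person who follows 3 of my accounts and little else.
    "human": {f"a{i}" for i in range(3)} | {"h0", "h1"},
}

best_by_overlap = max(candidates, key=lambda name: shared_follows(me, candidates[name]))
best_by_jaccard = max(candidates, key=lambda name: jaccard(me, candidates[name]))
print(best_by_overlap, best_by_jaccard)
```

Counting shared follows rewards the bot for following everyone. Dividing by the union means every account the bot follows that I do not counts against it, which is one way of capturing "dislikes the things you dislike."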
Of all 200 million people on Twitter, whose profile is most similar to mine? It turns out my doppelganger is Vox writer Dylan Matthews. This was kind of a letdown, for the purposes of improving my media consumption, as I already follow Matthews on Twitter and Facebook and compulsively read his Vox posts. So learning he was my doppelganger hasn’t really changed my life. But it’s still pretty cool to know the person most similar to you in the world, especially if it’s someone you admire. And when I finish this book and stop being a hermit, maybe Matthews and I can hang out and discuss the writings of James Surowiecki.
The Ortiz doppelganger search was neat for baseball fans. And my doppelganger search was entertaining, at least to me. But what else can these searches reveal? For one thing, doppelganger searches have been used by many of the biggest internet companies to dramatically improve their offerings and user experience. Amazon uses something like a doppelganger search to suggest what books you might like. They see what people similar to you select and base their recommendations on that.
Pandora does the same in picking what songs you might want to listen to. And this is how Netflix figures out the movies you might like. The impact has been profound. When Amazon engineer Greg Linden originally introduced doppelganger searches to predict readers’ book preferences, the improvement in recommendations was so good that Amazon founder Jeff Bezos got to his knees and shouted, “I’m not worthy!” to Linden.
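The idea of recommending what similar people chose can be sketched as a tiny program. This is a toy under stated assumptions, not a description of how Amazon, Pandora, or Netflix actually build their systems. The reader names, book titles, the `recommend` function, and the choice of Jaccard similarity are all hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Shared items divided by all items chosen by either person."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, baskets, k=2, n=3):
    """Suggest up to n items that the user's k closest doppelgangers chose."""
    mine = baskets[user]
    others = [u for u in baskets if u != user]
    # Rank everyone else by how similar their choices are to the user's.
    neighbors = sorted(others, key=lambda u: jaccard(mine, baskets[u]), reverse=True)[:k]
    # Each doppelganger "votes" for items the user does not have yet.
    votes = Counter(item for u in neighbors for item in baskets[u] - mine)
    return [item for item, _ in votes.most_common(n)]

# Hypothetical purchase histories.
baskets = {
    "reader": {"physics_1", "physics_2", "baseball_1"},
    "u1": {"physics_1", "physics_2", "physics_3"},
    "u2": {"physics_2", "baseball_1", "physics_3", "baseball_2"},
    "u3": {"cooking_1", "cooking_2"},
}

suggestions = recommend("reader", baskets)
print(suggestions)
```

Here the two readers most like "reader" both bought `physics_3`, so it is suggested first, while the cooking fan, who shares nothing with "reader", has no say.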