Data science makes many parts of Freud falsifiable—it puts many of his famous theories to the test. Let’s start with phallic symbols in dreams. Using a huge dataset of recorded dreams, we can readily note how frequently phallic-shaped objects appear. Food is a good place to focus this study. It shows up in many dreams, and many foods are shaped like phalluses—bananas, cucumbers, hot dogs, etc. We can then measure the factors that might make us dream more about certain foods than others—how frequently they are eaten, how tasty most people find them, and, yes, whether they are phallic in nature.
We can test whether two equally popular foods, only one of which is shaped like a phallus, appear in dreams at different rates. If phallus-shaped foods are no more likely to be dreamed about than other foods, then phallic symbols are not a significant factor in our dreams. Thanks to Big Data, this part of Freud’s theory may indeed be falsifiable.
I received data from Shadow, an app that asks users to record their dreams. I coded the foods included in tens of thousands of dreams.
Overall, what makes us dream of foods? The main predictor is how frequently we consume them. The substance that is most dreamed about is water. The top twenty foods include chicken, bread, sandwiches, and rice—all notably un-Freudian.
The second predictor of how frequently a food appears in dreams is how tasty people find it. The two foods we dream about most often are the notably un-Freudian but famously tasty chocolate and pizza.
So what about phallic-shaped foods? Do they sneak into our dreams with unexpected frequency? Nope.
Bananas are the second most common fruit to appear in dreams. But they are also the second most commonly consumed fruit. So we don’t need Freud to explain how often we dream about bananas. Cucumbers are the seventh most common vegetable to appear in dreams. They are the seventh most consumed vegetable. So again their shape isn’t necessary to explain their presence in our minds as we sleep. Hot dogs are dreamed of far less frequently than hamburgers. This is true even controlling for the fact that people eat more burgers than dogs.
Overall, using a regression analysis (a method that allows social scientists to tease apart the impact of multiple factors) across all fruits and vegetables, I found that being shaped like a phallus did not make a food any more likely to appear in dreams than its popularity alone would predict. This theory of Freud’s is falsifiable and, at least according to my look at the data, false.
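To make the regression idea concrete, here is a minimal sketch in Python. It is not the analysis run on the Shadow data. The scores for each food are invented for illustration, and the dream frequencies are generated from a formula in which shape has no effect by construction, so the sketch only shows how a regression separates the contributions of consumption, tastiness, and shape. The names `ols`, `FOODS`, and `dream_freq` are assumptions of this sketch.

```python
# Toy illustration only: every number below is invented, not taken from the Shadow data.

def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# (name, how often eaten, how tasty, phallic shape: 1 = yes, 0 = no), all hypothetical.
FOODS = [
    ("banana", 9.0, 6.0, 1),
    ("apple", 10.0, 5.0, 0),
    ("cucumber", 4.0, 3.0, 1),
    ("carrot", 5.0, 4.0, 0),
    ("grape", 6.0, 7.0, 0),
    ("zucchini", 2.0, 2.0, 1),
    ("strawberry", 7.0, 9.0, 0),
]

# Synthetic outcome: shape is given a true effect of zero on purpose.
dream_freq = [
    2.0 + 3.0 * eaten + 0.5 * tasty + 0.0 * phallic
    for _, eaten, tasty, phallic in FOODS
]

# Design matrix: intercept, consumption, tastiness, shape indicator.
X = [[1.0, eaten, tasty, float(phallic)] for _, eaten, tasty, phallic in FOODS]
intercept, b_eaten, b_tasty, b_phallic = ols(X, dream_freq)

print(f"consumption: {b_eaten:.2f}  tastiness: {b_tasty:.2f}  shape: {b_phallic:.2f}")
```

With real data the outcome would be noisy, so the coefficient on shape would come with a standard error, and the question would be whether it can be distinguished from zero once consumption and tastiness are accounted for.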
Next, consider Freudian slips. The psychologist hypothesized that we use our errors—the ways we misspeak or miswrite—to reveal our subconscious desires, frequently sexual. Can we use Big Data to test this? Here’s one way: see if our errors—our slips—lean in the direction of the naughty. If our buried sexual desires sneak out in our slips, there should be a disproportionate number of errors that include words like “penis,” “cock,” and “sex.”
This is why I studied a dataset of more than 40,000 typing errors collected by Microsoft researchers. The dataset included mistakes that people make but then immediately correct. In these tens of thousands of errors, there were plenty of individuals committing errors of a sexual sort. There was the aforementioned “penistrian.” There was also someone who typed “sexurity” instead of “security” and “cocks” instead of “rocks.” But there were also plenty of innocent slips. People wrote of “pindows” and “fegetables,” “aftermoons” and “refriderators.”
So was the number of sexual slips unusual?
To test this, I first used the Microsoft dataset to model how frequently people mistakenly switch particular letters. I calculated how often they replace a t with an s, a g with an h. I then created a computer program that made mistakes in the way that people do. We might call it Error Bot. Error Bot replaced a t with an s with the same frequency that humans in the Microsoft study did. It replaced a g with an h as often as they did. And so on. I ran the program on the same words people had gotten wrong in the Microsoft study. In other words, the bot tried to spell “pedestrian” and “rocks,” “windows” and “refrigerator.” But it switched an r with a t as often as people do and wrote, for example, “tocks.” It switched an r with a c as often as humans do and wrote “cocks.”
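A program along these lines can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the actual Error Bot: the training pairs are just the handful of slips quoted in the text rather than the full Microsoft dataset, the sketch only models one-for-one letter swaps (it ignores dropped or added letters), and the names `fit_substitutions` and `error_bot` are hypothetical.

```python
import random
from collections import Counter, defaultdict

# (intended word, what was typed): the few slips quoted in the text,
# standing in for the much larger Microsoft dataset.
PAIRS = [
    ("pedestrian", "penistrian"),
    ("security", "sexurity"),
    ("rocks", "cocks"),
    ("windows", "pindows"),
    ("vegetables", "fegetables"),
    ("afternoons", "aftermoons"),
    ("refrigerators", "refriderators"),
]

def fit_substitutions(pairs):
    """For each intended letter, count what was actually typed in its place."""
    counts = defaultdict(Counter)
    for intended, typed in pairs:
        if len(intended) != len(typed):
            continue  # this sketch handles letter-for-letter swaps only
        for want, got in zip(intended, typed):
            counts[want][got] += 1
    return counts

def error_bot(word, counts, rng):
    """Type `word`, replacing each letter with the observed human frequencies."""
    out = []
    for ch in word:
        seen = counts.get(ch)
        if not seen:
            out.append(ch)  # letter never observed: type it correctly
            continue
        letters = list(seen)
        weights = [seen[letter] for letter in letters]
        out.append(rng.choices(letters, weights=weights, k=1)[0])
    return "".join(out)

counts = fit_substitutions(PAIRS)
rng = random.Random(0)  # fixed seed so the run is repeatable
attempts = [error_bot("rocks", counts, rng) for _ in range(1000)]
print(Counter(attempts).most_common())
```

Because the bot has no vocabulary and no intentions, any rude word it produces arises purely from the letter-swap frequencies it was given.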
So what do we learn from comparing Error Bot with normally careless humans? After making a few million errors, just from misplacing letters in the ways that humans do, Error Bot had made numerous mistakes of a Freudian nature. It misspelled “seashell” as “sexshell,” “lipstick” as “lipsdick,” and “luckiest” as “fuckiest,” along with many other similar mistakes. And here’s the key point: Error Bot, which of course does not have a subconscious, was just as likely to make errors that could be perceived as sexual as real people were. With the caveat, as we social scientists like to say, that there needs to be more research, this means that humans are no more likely to make sexually oriented errors than would be expected by chance.
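One way such a comparison could be carried out is sketched below in Python. The text does not say which statistical test was used, so the two-proportion z-test here is an assumption of this sketch, as are the names `count_flagged` and `two_proportion_z`. The watchlist contains only the three example words named earlier, and the counts passed to the test are placeholders chosen for illustration, not results from the study.

```python
import math

# Example words named in the text; a real watchlist would be longer.
WATCHLIST = ("penis", "cock", "sex")

def count_flagged(errors, terms=WATCHLIST):
    """Count the errors that contain at least one watchlist term."""
    return sum(any(term in error for term in terms) for error in errors)

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for the difference between two proportions (pooled variance)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# The human slips quoted in the text (anecdotes, not a representative sample).
HUMAN_SLIPS = [
    "penistrian", "sexurity", "cocks",
    "pindows", "fegetables", "aftermoons", "refriderators",
]
print(count_flagged(HUMAN_SLIPS), "of", len(HUMAN_SLIPS), "quoted slips are flagged")

# Placeholder counts, NOT figures from the study:
# humans: 400 flagged of 40,000 errors; bot: 10,100 flagged of 1,000,000 errors.
z = two_proportion_z(400, 40_000, 10_100, 1_000_000)
print(f"z = {z:.2f}")
```

A z value close to zero would indicate that the human rate of flagged errors is statistically indistinguishable from the bot's rate; by convention, an absolute value above roughly 1.96 would suggest a difference at the 5 percent level.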
In other words, for people to make errors such as “penistrian,” “sexurity,” and “cocks,” it is not necessary to have some connection between mistakes and the forbidden, some theory of the mind where people reveal their secret desires via their errors. These slips of the fingers can be explained entirely by the typical frequency of typos. People make lots of mistakes. And if you make enough mistakes, eventually you start saying things like “lipsdick,” “fuckiest,” and “penistrian.” If a monkey types long enough, he will eventually write “to be or not to be.” If a person types long enough, she will eventually write “penistrian.”
Freud’s theory that errors reveal our subconscious wants is indeed falsifiable—and, according to my analysis of the data, false.
Big Data tells us a banana is always just a banana and a “penistrian” just a misspelled “pedestrian.”
So was Freud totally off-target in all his theories? Not quite. When I first got access to PornHub data, I found a revelation there that struck me as at least somewhat Freudian. In fact, this is among the most surprising things I have found yet during my data investigations: a shocking number of people visiting mainstream porn sites are looking for portrayals of incest.
Of the top hundred searches by men on PornHub, one of the most popular porn sites, sixteen are for incest-themed videos. Fair warning: this is going to get a little graphic. They include “brother and sister,” “step mom fucks son,” “mom and son,” “mom fucks son,” and “real brother and sister.” The plurality of male incestuous searches are for scenes featuring mothers and sons. And women? Nine of the top hundred searches by women on PornHub are for incest-themed videos, and they feature similar imagery, though with the gender of any parent and child who is mentioned usually reversed. Thus the plurality of incestuous searches made by women are for scenes featuring fathers and daughters.