Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Data size was also crucial for my baseball study. I needed to zoom in not only on fans of each team but on people of every age. Millions of observations are required to do this, and Facebook and other digital sources routinely offer such numbers.

This is where the bigness of Big Data really comes into play. You need a lot of pixels in a photo in order to be able to zoom in with clarity on one small portion of it. Similarly, you need a lot of observations in a dataset in order to be able to zoom in with clarity on one small subset of that data—for example, how popular the Mets are among men born in 1978. A small survey of a couple of thousand people won’t have a large enough sample of such men.
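The arithmetic behind that last claim can be sketched in a few lines of Python. Every number here is an illustrative assumption rather than a figure from the study: a survey of 2,000 people, half of them men, birth years spread evenly over roughly eighty years, and a hypothetical 10 percent of the subgroup rooting for a given team. The function names are invented for this sketch.

```python
import math

def expected_subgroup_size(sample_size, *shares):
    """Expected number of respondents left after applying each share in turn."""
    n = float(sample_size)
    for share in shares:
        n *= share
    return n

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n people."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative assumptions, not data from the book.
share_male = 0.5
share_birth_year = 1 / 80   # one birth year out of roughly eighty
fan_share = 0.10            # hypothetical share of the subgroup who are fans

small = expected_subgroup_size(2_000, share_male, share_birth_year)
large = expected_subgroup_size(10_000_000, share_male, share_birth_year)

print(f"Survey of 2,000: about {small:.1f} men born in one year, "
      f"margin of error about {margin_of_error(fan_share, small):.1%}")
print(f"Dataset of 10 million: about {large:,.0f} such men, "
      f"margin of error about {margin_of_error(fan_share, large):.2%}")
```

Under these assumptions the small survey leaves about a dozen men born in any one year, too few to say anything reliable about them, while the large dataset leaves tens of thousands.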

This is the third power of Big Data: Big Data allows us to meaningfully zoom in on small segments of a dataset to gain new insights on who we are. And we can zoom in on other dimensions besides age. If we have enough data, we can see how people in particular towns and cities behave. And we can see how people carry on hour-by-hour or even minute-by-minute.

In this chapter, human behavior gets its close-up.





WHAT’S REALLY GOING ON IN OUR COUNTIES, CITIES, AND TOWNS?




In hindsight it’s surprising. But when Raj Chetty, then a professor at Harvard, and a small research team first got hold of a rather large dataset—all Americans’ tax records since 1996—they were not certain anything would come of it. The IRS had handed over the data because it thought the researchers might be able to use it to help clarify the effects of tax policy.

The initial attempts Chetty and his team made to use this Big Data led, in fact, to numerous dead ends. Their investigations of the consequences of state and federal tax policies reached mostly the same conclusions everybody else had just by using surveys. Perhaps Chetty’s answers, using the hundreds of millions of IRS data points, were a bit more precise. But getting the same answers as everybody else, with a little more precision, is not a major social science accomplishment. It is not the type of work that top journals are eager to publish.

Plus, organizing and analyzing all the IRS data was time-consuming. Chetty and his team—drowning in data—were taking more time than everybody else to find the same answers as everybody else.

It was beginning to look like the Big Data skeptics were right. You didn’t need data for hundreds of millions of Americans to understand tax policy; a survey of ten thousand people was plenty. Chetty and his team were understandably discouraged.

And then, finally, the researchers realized their mistake. “Big Data is not just about doing the same thing you would have done with surveys except with more data,” Chetty explains. They were asking little data questions of the massive collection of data they had been handed. “Big Data really should allow you to use completely different designs than what you would have with a survey,” Chetty adds. “You can, for example, zoom in on geographies.”

In other words, with data on hundreds of millions of people, Chetty and his team could spot patterns among cities, towns, and neighborhoods, large and small.

As a graduate student at Harvard, I was in a seminar room when Chetty presented his initial results using the tax records of every American. Social scientists refer in their work to observations—how many data points they have. If a social scientist is working with a survey of eight hundred people, he would say, “We have eight hundred observations.” If he is working with a laboratory experiment with seventy people, he would say, “We have seventy observations.”

“We have one-point-two billion observations,” Chetty said, straight-faced. The audience giggled nervously.

Chetty and his coauthors began, in that seminar room and then in a series of papers, to give us important new insights into how America works.

Consider this question: is America a land of opportunity? Do you have a shot, if your parents are not rich, to become rich yourself?

The traditional way to answer this question is to look at a representative sample of Americans and compare this to similar data from other countries.

Here is the data for a variety of countries on equality of opportunity. The question asked: what is the chance that a person with parents in the bottom 20 percent of the income distribution reaches the top 20 percent of the income distribution?

CHANCES A PERSON WITH POOR PARENTS WILL BECOME RICH (SELECTED COUNTRIES)

Country            Chance (%)
United States         7.5
United Kingdom        9.0
Denmark              11.7
Canada               13.5

As you can see, America does not score well.

But this simple analysis misses the real story. Chetty’s team zoomed in on geography. They found the odds differ a huge amount depending on where in the United States you were born.

CHANCES A PERSON WITH POOR PARENTS WILL BECOME RICH (SELECTED PARTS OF THE UNITED STATES)

Place                    Chance (%)
San Jose, CA               12.9
Washington, DC             10.5
United States Average       7.5
Chicago, IL                 6.5
Charlotte, NC               4.4

In some parts of the United States, the chance of a poor kid succeeding is as high as in any developed country in the world. In other parts of the United States, the chance of a poor kid succeeding is lower than in any developed country in the world.

These patterns would never be seen in a small survey, which might only include a few people in Charlotte and San Jose, and which therefore would prevent you from zooming in like this.
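To make the zooming concrete, here is a minimal Python sketch of the statistic in the tables above: among children whose parents were in the bottom fifth of the income distribution, the share who reach the top fifth, computed both overall and place by place. The ten records, the place labels "A" and "B", and the function names are all invented for illustration. This is not Chetty's data or necessarily his method.

```python
from collections import defaultdict

def quintiles(values):
    """Assign each value a quintile from 1 (lowest fifth) to 5 (highest fifth) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank * 5 // len(values) + 1
    return result

def bottom_to_top_rates(records):
    """records: (place, parent_income, child_income) tuples.

    Quintiles are computed over everyone, then children of bottom-quintile
    parents are grouped by place. Returns {place: share reaching top quintile},
    plus an "ALL" entry for the whole dataset.
    """
    parent_q = quintiles([r[1] for r in records])
    child_q = quintiles([r[2] for r in records])
    poor = defaultdict(int)   # children with bottom-quintile parents
    rose = defaultdict(int)   # of those, children who reached the top quintile
    for (place, _, _), pq, cq in zip(records, parent_q, child_q):
        if pq == 1:
            for key in (place, "ALL"):
                poor[key] += 1
                rose[key] += int(cq == 5)
    return {key: rose[key] / poor[key] for key in poor}

# Toy records, invented for illustration only.
records = [
    ("A", 10, 95), ("B", 20, 15), ("A", 30, 25), ("B", 40, 35), ("A", 50, 45),
    ("B", 60, 55), ("A", 70, 65), ("B", 80, 75), ("A", 90, 85), ("B", 100, 20),
]
rates = bottom_to_top_rates(records)
print(rates)
```

With only two poor families in the toy data, each place's rate rests on a single child, which is exactly the problem a small survey runs into. The per-place figures only become meaningful when every place contributes thousands of such families.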

In fact, Chetty’s team could zoom in even further. Because they had so much data—data on every single American—they could even zoom in on the small groups of people who moved from city to city to see how that might have affected their prospects: those who moved from New York City to Los Angeles, Milwaukee to Atlanta, San Jose to Charlotte. This allowed them to test for causation, not just correlation (a distinction I’ll discuss in the next chapter). And, yes, moving to the right city in one’s formative years made a significant difference.

So is America a “land of opportunity”?

The answer is neither yes nor no. The answer is: some parts are, and some parts aren’t.
