Big Doubts About Big Data - The Chronicle of Higher Education

The term itself is quite phenomenal. It morphs into so many forms and functions that it behaves like a powerful shape-shifter, taking on new meanings within a new data-driven grammar. Put any noun in front of the term, and you have just named an area of life that Big Data is going to somehow transform: health, finance, education, marketing and retail, sports, environment and climate, housing and cities. Put an adjective in front of it—gloopy, colored, short, fat, thin—and you'll see it catch on, at least in some circles, for at least a short time.

But mostly the grammar of Big Data is about verbs and what we can do with it: predict; steer, shape; harvest, harness, mine; sort, store, synthesize; track and trace; innovate and transform; optimize, maximize, visualize; and so on. So many of those verbs are about maximizing the capacity to model human behavior and to intervene faster and more efficiently than ever before, in real time or as close to it as possible, so we can shift from forecasting to "now-casting" and prevent traffic hot spots, epidemics, riots, and civil unrest.

To date, much of the discussion has centered on the challenges of Big Data. How fast we can get it. How much we can store. How best we can protect privacy, access, security. Where it came from. Who owns it. What platforms structure it. It's understandable why industry and commercial enterprises are the most enthusiastic about it: More-sophisticated analytics will improve sales and expand the customer base by fine-tuning and individualizing marketing strategies.

But government agencies and the public sector are also galloping into the race. (They have to, because the public and private sectors are increasingly intertwined.) The rhetoric here is that Big Data will improve our welfare and health-care systems, produce better "smart cities" with automated, more-efficient energy, transport, water, and waste systems, and finally allow us to track students in our schools and colleges to ensure that no one slips through the cracks at key milestones. It will supposedly help us deal with crime and terrorism. Big Data brings new hope to big social problems and social policy, and is likely to be cheaper to use than organizing large-scale official surveys.

But using Big Data to maximize sales and profits is one thing. Using it for social policy and planning is another. Of course the ethical and privacy issues about who owns data need to be negotiated carefully. But there are far greater problems: How are data analyzed, and by whom? And who is making decisions about how to interpret the data? These questions need urgent attention.

As you already know, I’m not a fan of the term. And it’s not because I can’t see the opportunities. I can. It’s an exciting era, with us all increasingly plugged into the digital landscapes we have created. And yes, we can and should be better equipped. Given the state of inequalities within and across countries, there’s every reason to be glad that more information may be available to rethink the Big Problems. More data are generally preferable to less data. But I can also see a growing disillusion with Big Data, especially when it is applied to social policy, planning, and practice.

The Big Data frenzy seems to have unleashed a bizarre digitized version of the Enlightenment. Although different in their digital guise, at their most optimistic the hopes for Big Data are much the same as those of the Enlightenment back in the 17th and 18th centuries: to foreground rationality and, in the name of "science," to control nature—except that the quest now is explicitly to control people by making them behave in particular ways. Whereas the Enlightenment's goal was truth, Big Data's is to help us "know" things better. The Enlightenment's power lay in reason. Big Data's lies not in data per se, but rather in the methodological capacity to work with vast swathes of them better than ever before. So, increasingly, statistics, clustering, networks, data mining, machine learning and genetic algorithms, simulation, pattern detection, and high-resolution visualization are part of Big Data's tool kit.

New computational methods will allow us to track change and continuity through time and space, leading to better models and descriptions that dig deeper and deeper into our social world and confront old problems like racism more systematically. That's why Big Data is said to be "shaking up the social sciences." It all sounds great, doesn't it? What's the problem?

Well, there's no problem in wanting the world to be a better place and hoping that Big Data can help us move toward it. "But what do you do when you realize that all that data is not enough?" the Microsoft researcher Kate Crawford poignantly writes in The New Inquiry. "From the Boston bombings to Malaysia Airlines Flight 370, we know that data black holes exist. … These moments demonstrate why the epistemic Big Data ambition—to collect it all—is both never-ending and deeply flawed."

To state such things is not to be "anti" Big Data; it is to acknowledge that Big Data can be usefully applied to some kinds of problems, but not to others. And that politics always plays a role in government decisions.

After all, the big ugly nasty social problems—often referred to by policy analysts as "wicked problems"—are difficult to change not because of lack of data, but because of the nature of the problems themselves. Poverty, for example, may be understood generally across the world, but to tackle particular poverty traps in particular neighborhoods for particular households, the local context needs to be taken into account; and poverty anywhere is often both a cause and a symptom of other problems, such as poor health or normalized racism.

Social states emerge from complex dynamics and don't always reflect a linear pattern of cause and effect. They defy prediction. Similar initial conditions can lead to different outcomes, and similar outcomes can be produced by different initial conditions.
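
The nonlinear point can be made concrete with a small, purely illustrative sketch in Python, using the textbook logistic map rather than any real social data (the function names and numbers below are invented for illustration only): two starting values that differ by one part in a billion diverge completely within a few dozen steps, which is why ever more precise measurement of the starting state does not buy long-range prediction.

```python
# Illustrative toy example (not from the essay): sensitive dependence on
# initial conditions in the logistic map x -> r * x * (1 - x) with r = 4.0.

def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map from x0 and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.300000000)   # one starting condition
b = logistic_trajectory(0.300000001)   # an almost identical starting condition

for step in (0, 10, 25, 50):
    print(f"step {step:2d}: |difference| = {abs(a[step] - b[step]):.6f}")

# The two trajectories begin a billionth apart and end up wildly different:
# similar initial conditions, very different outcomes.
```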

That is why, no matter how much data we have, our models and narratives of the future always include the "known unknown." At some point, in some way, unexpected change will demand a shift in our behavior, which may be momentous or momentary, but will have momentum; we will be dragged into action, whether we like or anticipate it or not, which may mean new approaches are needed to understand what is happening. The global financial crisis and 9/11 are testimony to that.

The importance of context, memory, biography, history, and the multiplicity of our temporal experiences cannot be overestimated. We are profoundly meaning-making beings, and any attempt to model the social sphere needs to account for the feedback loops that are necessarily and always involved.

Categories used to count social "facts," like "religion" or "ethnicity," are politically loaded, subject to change, and interact with other categories like "age," "place," "occupation." Social statistics do not merely describe and explain, but are part of the social fabric that goes into recasting what (and who) is counted and measured. An engagement with Big Data without interpretation is, quite frankly, terrible analysis.

Social scientists have for a long time argued against reductionist modes of analysis, precisely because the complex whole cannot be broken down to the sum of its parts. Think about what constitutes your family. Knowing all the measurable attributes associated with each member is unlikely to help explain what it means to be in your family. The social world is messy—much more so than our data and models currently allow. It doesn’t become less messy just because we have more data.

What is frightening is that Big Data takes reductionism further than ever before: We zoom in, zoom out, classify, and reshape those social spaces, only to break them into tiny divisible pixels; and then we piece the data back together again through algorithms and rule-based classifiers, profiling and indexing new kinds of pixel-piecey-people, who become as real as pixies to those who believe in them.

It is when governments (increasingly) rely on Big Data that the dangers come into their own. Businesses want to know what most of their customers are doing. But policy analysts generally focus on extreme groups. Why are they in the data? Why are there failing schools in rich countries? Why do some failing schools get better while others don't? Why don't some deprived neighborhoods change in response to renewal programs, while others do? Big Data helps describe the similarities among individuals and groups. It doesn't necessarily explain their differences.

Models, we know, aren’t "real." They are a version of what’s happening, and only a partial one at that. That is not to say we cannot improve our current data models or know something more about short- and even long-term forecasting. Nor is it to say that, for some problems, Big Data isn’t going to be extremely useful. "Smart cities" may help traffic systems—but they may not regenerate failing schools or excluded neighborhoods. If we think Big Data is the answer to solving our "wicked problems," we are on a path to nowhere good.

Commercial interests are reaping the benefits and driving the analysis of Big Data. But their models can’t be adapted to social policy without asking what they describe, why, how—and who is profiting. Deciding which models are useful needs to be a collective debate.

The irony is that, although digital data have radically transformed social life since the Enlightenment, some fundamental metaphysical elements have stayed much the same. We still fall in and out of love, we tire and need to sleep, we laugh, we cry, we live, and we die. Birth, death, aging, and mortality are each imbued with individual and collective meaning that draws on memory and culture, on how art and beauty move us, and on how faith and religion shape our sense of meaning.

Just as the Enlightenment project was met with resistance, Big Data will not enlighten any of us if it neglects the power of the immeasurable things that make us human. (It is telling that movie animators make objects "come alive" by giving them humanity and emotion.) Unless we draw on social theory and interpret big quantitative data qualitatively, we are missing the point of what makes Big Data fundamentally social.

Because of the legacy of the Enlightenment, and the sheer arrogance and unquestioned credibility behind the claims of science and mathematics, it is very difficult—especially for nonscientists—to raise concerns about Big Data without coming across as being "against" it or "not willing to engage" with it.

In the end, however, we need to retain our capacity to act intentionally, individually and collectively. Humans can choose to reflect on the metrics and classifications and algorithms we make. Yet increasingly we don't. We collect data and follow each other like a trail of ants, building ever-bigger infrastructures of data-driven services, hardware and software, and automated data processes, all of which exhaust and alienate us.

Do we really want to live in a world ruled by metrics and indices and classifications determined by automated algorithms?

Like the man in Roald Dahl’s short story who eats so much royal jelly that he starts to resemble a great big bee, we too may risk engorging ourselves with so much data that we morph into a new kind of social animal whose behavior is simulated and modeled and reduced to mathematical rules. If we are excited about Big Data, surely it is—or should be—because we can use it to define what we want, rather than to passively accept what may be coming our way.