So… This past weekend I had what is known as a sleep fail, and decided to work on stuff I’d been putting off because… that’s what we do. In the midst of it all, Twitter, through Kate Crawford, brought a story that appeared on The Register about Google’s computers becoming smart on their own, and I thought it was perfect. The title was “

If this doesn’t terrify you… Google’s computers OUTWIT their humans: ‘Deep learning’ clusters crack coding problems their top engineers can’t”

One of the things I’ve been trying to think through over the past three months is how the social exists, intertwines, or matters in big data. Here are some draft thoughts… still working through it.

Begin Draft Thoughts

The biggest myth I’ve identified to date in trying to understand how we can move from Big Data to individuals is that there is no social in big data. There are things that happen in Big Data that have strong effects on the spaces we conceive of as being social (the area of most interest for me being the ethical/legal space), but big data inherently reduces the actors that my ethnomethodological heart would call individuals to neutral points on a graph or system that is interlinked with other points that might be individuals, objects, or things, as organized by some algorithmic system that was originally programmed, based on the programs of other people, to output information that can be read by whatever needs to read it to begin another action. The reason I say “read by whatever” is because the reader might or might not end up being a human. A good portion of the reading is done by other machines and algorithms, an endless loop of finding meaning and actions in the patterns or behaviour of programmed points. This becomes even more interesting because the most successful instance to date of a machine learning to read big data came from Google. The model they used for that machine was the human brain, and of course, one of the first things their neural network learned to do was recognize cats.

Only, it didn’t actually learn to recognize “cats”. It learned that there was a connection among pixels grouped together in a specific way, a pattern that had no name for the machine and was only recognized as “cat” when a human reader took the information the machine had compiled, looked at the pattern image the machine had created, and recognized it as “cat”. Up until that moment, the machine had simply, algorithmically, found a pattern in the noise of the data. “The social” we find in big data is like the Google cat, only even more imaginary, inasmuch as there is no algorithm that can output “the social”; “the social” is something we define in our scholarly pursuits to understand the phenomena that occur in patterned sets among individual actors linked together by contingent circumstances defined for the purposes of our scholarly projects.

With Big Data, researchers try to show, through patterns that emerge under conditions that have been set on multiple levels, that there is some hidden universal truth through which we can “make economic, social, technical, and legal claims”. Beyond the links within these structured data environments being false, anything that can be gotten from the new sources of big data, especially those favored by social scientists, social media (a genre whose very name reads as though it was created for them), is an imagined social. When we look at use statistics and demographic information from Social Media Sites (SMS) such as Twitter and Facebook, we can clearly see that, even though the data they have is big and the algorithms they have are massive, the percentage of global populations that use these sites regularly is relatively small, and relatively homogenous when it comes to things such as age, education level, geographic hotspots, etc., leaving the individual data plots as relatively uninteresting shapes that move in ebbs and flows. The ebbs and flows end up being the interesting part of the pseudo-social interactions of big data. And even these are colored and shaped by programming written by individuals who expected to see certain types of movement within the big data as it was put through an algorithm that turned it into understandable information, individuals who are increasingly being pushed to create outputs that are more visually pleasing in lieu of conducting a deeper analysis of the data and its implications.

Clearly, I am in the process of thinking through what big data means for my various projects. I imagine I will be on this kick over the next few months. Let me see if I can correctly remember the process that brought me here. I was in the middle of a conversation about surveillance and how wearable technologies have surveillance built in to feed the data farm. This made me go to Wikipedia to look up data farming, because it had to exist! (It does.) The article left me thinking it didn’t quite get at what I wanted, because the data farm I was imagining isn’t a simulation. It is reality.

I had an ah-ha moment on Twitter. Or at least I had the thought that made me do this thought experiment.


Rather than attempting to write something new, I decided to take part of the text from the Wikipedia prison farm entry and change a few words to see if big data farms as prison farms work. Here are the results.

Rewriting: Wikipedia Prison Farm Entry for Big Data Farms & Digital Platforms

A big data farm or digital platform is a large correctional facility where users are put to economical use as social labor in a ‘farm’ (in the wide sense of a productive unit), usually for data labour, largely in open systems, such as social, personal, and technological media, etc. Its historical equivalent on a very large scale was called a penal colony.

The data produced by big data farms are generally used primarily to feed the laborers themselves and other wards of the platforms, and secondarily, to be sold for whatever profit the platform company, and any other company the users may have entered into a clickwrap or browsewrap agreement with, may be able to obtain. This configuration is often referred to as prosumption.

In addition to being forced to labor (produce data) directly for the government on a big data farm or in a social media platform, laborers may be forced to do farm work for private enterprises by being farmed out through the practice of selling access to data streams to work on private profit-making initiatives (often targeted advertising or related industries such as taste matching, shopping and media recommendations, career services, etc.). Data purchasing is also done by law enforcement and government agencies. The party purchasing the data for the government generally does so at a steep discount from the cost of free labor.

Depending on the prevailing judicial doctrine on terms of use and data ownership, psychological and/or physical cruelty through loss of privacy and/or intellectual property ownership may be a conscious intent of big data farm labor, and not just an inevitable but unintended collateral effect.

I have a dilemma. The part of me that was trained as a social scientist is intrigued by big data, while the part of me that was trained as a critical humanist is screaming “where are the humans!?” There is this whole beautiful book, The Human Faces of Big Data, that says it tackles the subject, but it is… weird. It is all weird right now. So I thought I might share a few more thoughts on the limits I’m seeing with big data as a concept. I suppose I should be up front and acknowledge that I never could quite catch the post-human bandwagon. I have no burning desire to merge with the machine any more than I already have. In fact, I am quite happy with my life as a cyborg. That being said, we are being compiled as unique data sets in this new world of big data… Only I’m not sure how new it is.

Big Data and the Book

One of the earlier modes of downloading large amounts of data that also captured the spirit of the human behind its creation (basically the beta release of posthumanism) is the book. Books record complex thoughts into a physical corpus that is smaller than its creator yet captures thoughts in a transferable way, a copyable way, a reproducible way… And they had many of the same ownership problems we are seeing now with digital media, but also personal data… Here is why I can’t buy in to posthumanism as it is being imagined in my little world of media & technology studies: would you ever look at a book and think “that’s a human”? As with books, it seems there are no humans in big data. I’ve been searching for a bit and the human hasn’t revealed itself. It seems the human is only the starting place and the ending place for the machines that communicate, create machine-readable knowledge, make decisions, and then predict the next action of the unique data set (individual actor, item, or thing).

Big Data is Predictive Future Time

I think the scale of big data (omfg, zettabytes!) and its relationship to time are the biggest changes. While books are always already a recording of past thoughts, big data is mobilized toward the future. While books are designed to influence current thought and possibly shape the future, their focus seems to be more on the past informing the now. Big Data, with its focus on pattern recognition, prediction, and visualization of this information in artistic and abstract yet understandable terms, seems to exist in what I am thinking of as a concept of time that is always already grounded in the future. Big Data has limited value to the past, inasmuch as yes, it helps us understand the past, but it isn’t mobilizable in a meaningful way unless we can somehow use it to say something about the to-come…

This is the central problem to me, I think. When we don’t allow the “now” to exist… and I feel like big data moves so quickly, and there is so much of it, that there is never a “now”… we don’t allow a space for human experience. And while I love patterns, and I am fine with them existing, the space of experience is where humans create meaning out of this finite thing we call life. When we move toward understanding everything as a bit of data in a large data stream that can tell us something about the future, we inherently erase the human. And because we erase the human, the ethical components of big data are hard to place, because there are no bodies in data. We see the result of this when we look at the recently revealed actions of the US Government, the ebbs and flows of various exchanges that are now run by computers, my favorite big data story ever of Target contacting the pregnant teenager before she had the chance to tell her family, etc.

I know that there is work being done on biases in big data, which is awesome. I think, in addition to that, we need to start asking where the human in big data is, too. The concept of big data makes it easy to sort of lose the human in the stream… but we have countless examples to show that when big data moves into places of power (government, Target, financial markets, MOOCs!, etc.), it is mobilized to discern the differences of individuals and individual items against the aggregate… and when this happens, there are real-world effects on actual human bodies.

So yes. Actual humans and big data… where’s the conversation?

As an aside on the future of the book

Since I’ve been speaking with people over the years about the future of the book in the digital world, I’m beginning to wonder if the problem is that we are “out of time” when we try to translate the form. While books are always the past, digital data is always about the future at this point, because we are sort of big data now. As such, I feel like a digital “book” project would perhaps need to be incomplete, to be completed/expanded at a later time by multiple anonymous people outside of the original creator.