The “Social” of Big Data, or There Once Was a “Cat”
So… this past weekend I had what is known as a sleep fail, and decided to work on stuff I’d been putting off because… that’s what we do. In the midst of it all, Twitter, by way of Kate Crawford, brought me a story from The Register about Google’s computer becoming smart on its own, and I thought it was perfect. The title was “
One of the things I’ve been trying to think through over the past three months is how the social exists in, intertwines with, or matters to big data. Here are some draft thoughts… I’m still working through them.
Begin Draft Thoughts
The biggest myth I’ve identified to date, in trying to understand how we can move from Big Data to individuals, concerns the social itself: there is no social in big data. Things happen in Big Data that have strong effects on the spaces we conceive of as social (the area of most interest to me being the ethical/legal space), but big data inherently reduces the actions of what my ethnomethodological heart would call individuals to neutral points on a graph or in a system. Those points are interlinked with other points that might be individuals, objects, or things, as organized by an algorithmic system that was originally programmed, on the basis of other people’s programs, to output information that can be read by whatever needs to read it to begin another action. I say “read by whatever” because the reader may or may not end up being human. A good portion of the reading is done by other machines and algorithms, in an endless loop of finding meaning and actions in the patterns or behaviour of programmed points. This becomes even more interesting because the most successful instance to date of machine learning reading big data came from Google. The model they used for that machine was the human brain, and, of course, one of the first things their neural network learned to do was recognize cats.
Only, it didn’t actually learn to recognize “cats”. It learned that there was a connection between instances of pixels grouped together in a specific way, a pattern that had no name as far as the machine was concerned. It was only recognized as “cat” when a human reader took the information the machine had compiled, looked at the pattern image the machine had created, and recognized it as “cat”. Up until that moment, the machine had simply found, algorithmically, a pattern in the noise of the data. “The Social” we find in big data is like the Google cat, only even more imaginary, inasmuch as there is no algorithm that can output “the social”: “the social” is something we define in our scholarly pursuits to understand the phenomena that occur in patterned sets amongst individual actors, linked together by contingent circumstances that we define for the purposes of our scholarly projects.
With Big Data, researchers try to use the patterns that emerge under conditions set on multiple levels to show that there is some hidden universal truth through which we can “make economic, social, technical, and legal claims”. Even more than the links within structured data environments being false, whatever can be gotten from these new sources of big data is an imagined social. This is especially true of the source favored by social scientists, social media, a genre whose very name reads as though it were created for them. When we begin to look at usage statistics and demographic information from Social Media Sites (SMS) such as Twitter and Facebook, we can clearly see that, even though their data is big and their algorithms massive, the percentage of the global population that uses these sites regularly is relatively small. It is also relatively homogeneous in terms of age, education level, geographic hotspots, and so on. That leaves the individual data plots as relatively uninteresting shapes that move in ebbs and flows. The ebbs and flows end up being the interesting part of the pseudo-social interactions of big data. And even these are colored and shaped by programming written by individuals who expected to see certain types of movement within the big data as it was put through an algorithm that turned it into understandable information. These are individuals who are increasingly pushed to create visually pleasing outputs in lieu of conducting a deeper analysis of the data and its implications.