This summer I became very interested in what I will probably end up calling “legal portraits of digital subjects,” or something similar. I came to this while doing a study on MOOCs with SMC. The project is titled “Students as End Users in the MOOC Ecology” (the talk is available online). In it, I look at what the Big 3 MOOC companies say publicly about the “student” and “learner” roles and compare that with how the same subject is constituted legally, in order to understand the cultural implications of turning students into “end users.”
As I worked through this project and thought about its implications outside of MOOCs and higher ed, I realized that these legal portraits are being painted constantly in digital environments. As users of the web, the internet, and digital tools, we are continually accepting clickwrap and browse-wrap agreements without thinking twice about it, because doing so has become a standard cultural practice.
In writing this post I’ve already entered into numerous binding legal agreements. Here are some of the institutions whose terms I have agreed to follow:
Internet Service Provider
Document Hosting Service (I wrote this in the cloud somewhere else first)
Blog Hosting Company
Various Companies I’ve Accepted Cookies From
Social Media Sites
I’ve gone through and read some of these Terms (some of them I cannot find at all). Without thinking twice, I’ve allowed this work to be licensed and reproduced in multiple places. We talk a lot about privacy concerns, and we know that by producing things like blog posts or status updates we agree to be surveilled to various degrees. I’d love to start a broader conversation, though, about the effects beyond privacy of agreeing to a multitude of Terms simply by logging on and opening a browser.
401 Access Denied, 403 Forbidden, 404 Not Found, 500 Internal Server Error & the Firehose
There is this thing called the firehose. I’ve watched mathematicians, game theorists, computer scientists and engineers (apparently there is a distinction), economists, business scholars, and social scientists salivate over it (myself included). The firehose, though technically a term reserved for the Twitter API, has become a catch-all in social science for the streams of data coming from social networking sites that are so large they cannot be processed as they come in. The data are so large, in fact, that coding them requires multiple levels of computer-aided refinement; taking data from these sources is like drinking from a firehose. While I cannot find the etymology of the term, it seems to have come either from Twitter terminology bleed or from a water fountain at MIT.
I am blessed with an advisor who has become the little voice at the back of my head whenever I am thinking about something. At every meeting he asks the same question, one that should be easy to answer but almost never is, especially when we are invested in a topic: “Why does this matter?” To date, outside of business uses or artistic exploration, we’ve not made a good case for why big data matters. I think we all want it because we believe some hidden truth might lie within it. We fetishize big data, and the firehose that exists behind locked doors, as though it will answer some bigger question. The problem is that there is no question. We, from our own unique, biased, disciplinary homes, have to come up with the bigger questions. We also have to accept that while data might provide some answers, perhaps we should be asking questions that go deeper, within a research practice that demands more reflexivity than we are seeing right now. I would love to see more nuanced readings that acknowledge the biases, gaps, and holes at every level of big data curation.
Predictive Power of Patterns
One of my favorite anecdotes about the power of big data is the Target incident from February 2012. Target predicted that a teenage girl was pregnant and acted on that prediction before she had told her family: it sent her baby-centric coupons. Her father called Target, very angry, and then called back later to apologize, because there were some things his daughter hadn’t told him. The media storm that followed painted a world both in awe of and creeped out by Target’s predictive power. How could a seemingly random bit of shopping history reveal a pattern showing that a customer was pregnant? How come I hadn’t noticed that they were doing this to me too? Since the incident went public, and Target shared how it had learned to hide targeted ads and coupons to minimize the creepy factor, I’ve enjoyed receiving the Target coupon books that always arrive at my home in pairs, one for me and one for my husband. They look the same on the surface but vary slightly on the inside. Apparently Target has learned that if the coupons meant for me go to him, they will be used. This is because every time I get my coupon books, I complain to him about my crappy coupon for something I need. He laughs at me and shows me his coupon, usually worth twice as much as mine if I just spend a little more. It almost always works.
In 2004 Lou Agosta wrote a piece titled “The Future of Data Mining - Predictive Analytics.” With the proliferation of social media, API data access, and the beloved yet mysterious firehose, I think we can say the future is now. Our belief in, and cyclical relationship with, progress as a universal and inevitable future turns big data into a universal good. I am not denying the usefulness of finding predictive patterns; Target clearly knew the girl was pregnant and was able to capitalize on that knowledge. For the social scientist, however, identifying patterns to predict outcomes and then verifying those predictions should not be enough. Part of our fetishization of big data seems to lie in the idea that it will somehow allow us not just to anticipate the future, but to know it. Researchers across fields and industries are working on ways to extract meaningful, predictive data from these nearly indigestible data streams. We have to remember that even big data has gaps, holes, and disturbances. Rather than asking what big data can tell us, we should treat it as an exploratory method that can help us define different problem sets and related questions.
Big Data as Method?
Recently I went to a talk by a pair of computer scientists. The speakers had access to the entire Wikipedia database, and because they could, they decided to visualize Wikipedia. After going through slide after slide of pretty colors, they said, “Who knew there were rainbows in Wikipedia!?” and then announced that they had moved on from that research. Rainbows can only get me so far. I was stuck asking why this pattern kept repeating itself, and wanting to know how the people creating the data that became a rainbow imagined what they were producing. The visualizations didn’t answer anything. If anything, they allowed me to ask clearer, more directed questions. This isn’t to say their work wasn’t beautiful. It is, and was. But there is so much more work to do.
I hope that as big data continues to become something of a social norm, more people will begin to speak across disciplinary lines so that we learn how to use this data in meaningful ways everywhere. Right now I think visualization is still central, but that is one of my biases. I think this is the case because visualization allows for simple identification of patterns. It also allows us to take in petabytes of data at once, compare different datasets (if similar visualization methods are used), and experiment in a way that other forms of data representation do not. When people share visualizations, they show either their understandable failures or the final polished product meant for mass consumption. I’ve not heard much conversation about using big data, its curation, and the generation of visualizations as/and method, but maybe I’m not in the right circles?
Still, I think that until we are willing to share the various steps along the way to turning big data into meaningful bits, or until we create an easy-to-use toolkit for the next generation of big data visualizations, we will all keep hacking at the same problem, stopping at different points without arriving anywhere more meaningful than “isn’t big data beautiful?”