This summer I became very interested in what I think I will be calling “legal portraits of digital subjects” or something similar. I came to this through a study on MOOCs with SMC this summer. The project is titled “Students as End Users in the MOOC Ecology” (the talk is available online). In the project I look at what the Big 3 MOOC companies say publicly about the “student” and “learner” role and compare it to how the same subject is legally constituted, in order to understand the cultural implications of turning students into “end users.”
As I worked through this project, and thought about implications beyond MOOCs and higher ed, I realized these legal portraits are constantly being painted in digital environments. As users of the web, the internet, and digital tools, we are constantly accepting various clickwrap and browsewrap agreements without thinking twice about it, because doing so has become a standard cultural practice.
In writing this post I’ve already entered numerous binding legal agreements. Here are some of the institutions whose terms I am bound to follow:
Internet Service Provider
Document Hosting Service (I wrote this in the cloud somewhere else first)
Blog Hosting Company
Various Companies I’ve Accepted Cookies From
Social Media Sites
I’ve gone through and read some of the Terms (some of them I cannot find). I’ve allowed for the licensing and reproduction of this work in multiple places without thinking twice about it. We talk a lot about privacy concerns. We know that by producing things like blog posts or status updates we are agreeing to be surveilled to various degrees. But I’d love to start a broader conversation on the effects, beyond privacy, of agreeing to a multitude of Terms simply by logging on and opening a browser.
Job Title: MSR PhD Intern/Research Assistant
Regional Area: Bangalore/Hyderabad, India
Primary Background: Anthropology
Secondary Background: Journalism, Media, or Communication Studies
Microsoft Research is looking for a junior researcher (students recently graduated from or currently enrolled in a Masters or PhD level program) to collaborate on an ethnographic study of crowdsourcing. Project responsibilities will primarily involve collaborating with MSR Researchers to find and interview crowdworkers living in or near the IT centers of Bangalore and Hyderabad, India.
Applicants must, at minimum, speak fluent Telugu, Urdu, and/or Kannada, be able to navigate Bangalore and/or Hyderabad, and have strong written and spoken English skills.
The successful candidate should have a BA and/or MA in a social science (or at least 1 year relevant experience); excellent organizational skills; and thrive working independently. The ideal candidate will also have experience doing in-person interviews, including the research skills necessary to find potential interviewees. Preference will be given to applicants with formal training in ethnography, qualitative research methods, or journalism.
This Microsoft Research Assistantship will provide a stipend of approximately INR 40,000 per month for up to six months and office space at Microsoft Research India, Bangalore. The stipend varies based on qualification (PhD, Master’s, or Bachelor’s) and prior research experience. Research Assistantships also include single-occupancy accommodation in Bangalore and, for candidates not based in Bangalore, reimbursement of economy-class airfare.
To apply for this position, please submit the following materials (or send any questions) to Mary L. Gray (mLg@microsoft.com), Senior Researcher:
- Current CV
- Names and contact email information for 2 academic references
- Cover letter outlining your interest in this assistantship, your availability, relevant prior experience and qualifications for this position
ABOUT MICROSOFT RESEARCH: In 1991, Microsoft Corp. became one of the first software companies to create its own computer science research organization. Microsoft Research has developed into a unique entity among corporate research facilities, balancing an open academic model with an effective process for transferring its research to product development teams. Today, Microsoft Research has more than 1,100 world-renowned scientists and engineers, including some of the world’s finest computer scientists, sociologists, psychologists, mathematicians, physicists and engineers, working across more than 55 areas of research. Microsoft Research has expanded globally to ensure that it can attract the richest pool of talent with 13 worldwide labs, including Microsoft Research India, based in Bangalore.
Digital inequality scholarship is well-intentioned. It debunks myths about digital media’s inherent egalitarianism and draws attention to the digital dimensions of social inequalities. Digital inequality scholars have shown, for example, that people with access to networked media use those technologies in different ways, some of which are thought to be more beneficial than others. They have highlighted how differences in skills and quality of access shape use. And they have rightly attacked the stereotype of the digital generation. These are important contributions for which we should be grateful.
Yet digital inequality scholarship is also limited in some fundamental, and I believe hazardous, ways. To defend these claims, I will draw on an in-depth ethnographic study of an ambitious attempt to combat digital inequality: a new, well-resourced, and highly touted public middle school in Manhattan that fashions itself as, “a school for digital kids.” It is hard to imagine a more concerted attempt to combat digital inequality, and yet the school paradoxically helped perpetuate many of the very social divisions it hoped to mend. In-depth ethnographic studies can help us understand these outcomes, and they can provide us with tools for forming more accurate conceptions of relations between digital media and social inequalities.
I will call this school, which opened in the fall of 2009, the Downtown School for Design, Media and Technology, or the Downtown School for short. Supported by major philanthropic foundations, and designed by leading scholars and practitioners from the learning sciences as well as media technology design, the Downtown School braided digital media practices, and especially media production activities, throughout its curriculum. The school had enviable financial, technological, and intellectual resources, and it recruited an atypically diverse student body for a New York City public school. About half the students came from privileged families where at least one parent worked in a professional field and held an advanced degree. And about 40 percent of students came from less-privileged families that qualified for free or reduced-price lunch; these parents and guardians often had little or no college education and worked in comparatively low-paying service jobs. All students took a required game design course, and the school’s entire suite of after-school programs was devoted to making, hacking, remixing, and designing media technology.
Digital inequality scholarship played a role in the formation of the Downtown School and similar interventions. Concepts such as the digital divide, the “participation gap” (Jenkins et al. 2006), the “digital production gap” (Schradie 2011), or the “participation divide” (Hargittai and Walejko 2008) implicitly, if not explicitly, recommend and legitimate interventions such as the Downtown School. Since digital inequality scholars argue that skill differentials play a large role in producing digital inequalities, educational practitioners understandably craft interventions to reduce those differentials.
According to such a framework, the Downtown School was successful in many ways. Boys and girls from diverse economic and ethnic backgrounds learned to use digital media in new ways. In particular, students learned to use digital tools to be producers, rather than just consumers, of digital media. Through the lens of concepts like the “participation gap,” the school appears successful and worth quickly replicating.
The problem though – and here is why we need ethnography – is that while the Downtown School arguably helped close the participation gap, it also helped perpetuate historical social divisions, especially those rooted in gender and racialized social class. When the Downtown School opened, it attracted three boys for every two girls; three years after opening, the ratio had risen to two-to-one. Only one girl regularly participated in the school’s after-school programs focused on media production; most regular participants were boys from privileged families. By the end of the first year, all of the economically less-privileged boys in one of the school’s main cliques had left the school for larger, less-resourced schools that had a greater diversity of curricular and extra-curricular offerings as well as more of a dating scene. By the end of the second year, many of the less-privileged girls from another of the main cliques had also left the school. While their reasons for leaving were complex, they and their families suggested in interviews with me that the Downtown School was not a ‘good fit.’ By contrast, nearly all of the privileged students remained enrolled, and many of their parents were enthusiastic boosters for the school.
Why were many students, and especially many of the less-privileged students, unable or unwilling to take advantage of the purportedly beneficial opportunities afforded by the Downtown School? Digital inequality frameworks do not provide a satisfying way to answer this question. They do not see many of the factors that matter to people in different situations, nor the nexus of conditions and forces that shape what people do, and do not do, with and without digital media. Ethnography, in contrast, casts a much wider net that can help account for these conditions and processes. A few more examples will help clarify this point.
On the ground, I observed and documented what students were doing when they were not taking advantage of the school’s purportedly beneficial activities. It turned out that most of the students spent their afternoon hours in familiar activities that predate the digital age: basketball practices, music lessons, swimming classes, learning a foreign language, dance classes, taking care of siblings and cousins, chores, and so forth. These activities meant a lot to students and their families, and many expressed a desire for the school to offer more diverse curricular and extra-curricular offerings. These activities were also integral to how students navigated and negotiated identity and difference with their peers at school (Sims 2014). This wider ecology of practices, as well as what participation and non-participation meant for those involved, would be invisible if one were to study the Downtown School using the digital inequality framework. And what a digital inequality approach would have captured and championed would have mostly reflected the interests and practices of those who were most privileged.
One can still argue that social scientists, policy makers, and educational practitioners should do all that they can to close digital inequalities such as the participation gap. One can argue that doing so is in the best interest of those currently on the wrong side of the chasm. One can argue that treating digital inequalities is akin to dealing with a public health concern, or, more aptly, that it should be folded into broader efforts to mandate STEM education amongst all contemporary school children. In short, digital inequality scholars can admit that there is a prescriptive character to their efforts and that treatment is justified because it is in the best interest of the public as well as those being treated.
This is a debate that can be had but it is not the debate that digital inequality scholars are currently having. In its current form, the digital inequality debate escapes these issues because it assumes that certain decontextualized “uses” will be universally appealing to people once barriers to participation – lack of quality access, skills, etc. – are removed. There is a sort of technology-focused ethnocentrism to these assumptions that prevents this potentially uncomfortable debate from ever taking place. If digital inequality scholars were to acknowledge the prescriptive character of their scholarship, a host of thorny ethical dilemmas would quickly surface: To what degree should social scientists, policy-makers, and educational practitioners force people to partake in participatory culture? To what extent do the ends justify the means? What exercises of power are legitimate? What liberties should be granted to those identified for treatment? And so on.
These are difficult questions, and my guess is that most digital inequality scholars do not want to address them. My own feeling is that scholars should be extremely cautious in pushing for such treatments, whether domestically or abroad, even if they feel that their medicine would be in the best interest of the treated. The histories of various missionary and colonial endeavors – to name just a few charged examples – make the ethical and political hazards of such an enterprise all too clear.
Note: This was originally posted at Ethnography Matters.
Hargittai, E. & Walejko, G., 2008. The Participation Divide: Content Creation and Sharing in the Digital Age. Information, Communication & Society, 11(2), pp.239–256.
Jenkins, H. et al., 2006. Confronting the Challenges of Participatory Culture: Media Education for the 21st Century, Chicago, IL: The John D. and Catherine T. MacArthur Foundation.
Schradie, J., 2011. The Digital Production Gap: The Digital Divide and Web 2.0 Collide. Poetics, 39(2), pp.145–168.
Sims, C., 2012. The Cutting Edge of Fun: Making Work Play at the New American School. University of California, Berkeley. http://www.ischool.berkeley.edu/files/sims_2012_cuttingedgeoffun.pdf
Sims, C., 2014 (forthcoming). From Differentiated Use to Differentiating Practices: Negotiating Legitimate Participation and the Production of Privileged Identities. Information, Communication & Society. http://www.tandfonline.com/doi/full/10.1080/1369118X.2013.808363
After Yahoo’s high-profile purchase of Tumblr, when Yahoo CEO Marissa Mayer said that she would “promise not to screw it up,” this is probably not what she had in mind. Devoted users of Tumblr have been watching closely, worried that the cool, web 2.0 image blogging tool would be tamed by the nearly two-decade-old search giant. One population of Tumblr users, in particular, worried a great deal: those that used Tumblr to collect and share their favorite porn. This is a distinctly large part of the Tumblr crowd: according to one analysis, somewhere near or above 10% of Tumblr is “adult fare.”
Now that group is angry. And the new Tumblr policies that made them so angry are a bit of a mess. Two paragraphs from now, I’m going to say that the real story is not the Tumblr/Yahoo incident, or how it was handled, or even why it’s happening. But first, the quick run-down, which is confusing if you’re not a regular Tumblr user. Tumblr had a self-rating system: blogs with “occasional” nudity should self-rate as “NSFW,” while blogs with “substantial” nudity should rate themselves as “adult.” About two months ago, some Tumblr users noticed that blogs rated “adult” were no longer being listed with the major search engines. Then in June, Tumblr began taking both “NSFW” and “adult” blogs out of its internal search results — meaning, if you search in Tumblr for posts tagged with a particular word, sexual or otherwise, the dirty stuff won’t come up. Unless the searcher already follows your blog; then the “NSFW” posts will appear, but not the “adult” ones. Akk, here, this is how Tumblr tried to explain it:
What this meant is that existing followers of a blog could largely still see its “NSFW” posts, but it would be very difficult for anyone new to find it. David Karp, founder and CEO of Tumblr, dodged questions about it on the Colbert Report, saying only that Tumblr doesn’t want to be responsible for drawing the lines between artistic nudity, casual nudity, and hardcore porn.
Then a new outrage emerged when some users discovered that, in the mobile version of Tumblr, some tag searches turn up no results, dirty or otherwise — and not just for obvious porn terms, like “porn,” but also for broader terms, like “gay.” Tumblr issued a quasi-explanation on its blog, which some commentators and users found frustratingly vague and unapologetic.
Ok. The real story is not the Tumblr/Yahoo incident, or how it was handled, or even why it’s happening. Certainly, Tumblr could have been more transparent about the details of their original policy, or the move in May or earlier to de-list adult Tumblr blogs in major search engines, or the decision to block certain tag results. Certainly, there’ve been some delicate conversations going on at Yahoo/Tumblr headquarters, for some time now, on how to “let Tumblr be Tumblr” (Mayer’s words) and also deal with all this NSFW blogging “even though it may not be as brand safe as what’s on our site” (also Mayer). Tumblr puts ads in its Dashboard, where only logged-in users see them, so arguably the ads are never “with” the porn — but maybe Yahoo is looking to change that, so that the “two companies will also work together to create advertising opportunities that are seamless and enhance the user experience.”
What’s ironic is that, I suspect, Tumblr and Yahoo are actually trying to find ways to remain permissive when it comes to NSFW content. They are certainly (so far) more permissive than some of their competitors, including Instagram, Blogger, Vine, and Pinterest, all of whom have moved in the last year to remove adult content, make it systematically less visible to their users, or prevent users from pairing advertising with it. The problem here is their tactics.
Media companies, be they broadcast or social, have fundamentally two ways to handle content that some but not all of their users find inappropriate.
First, they can remove some of it, either by editorial fiat or at the behest of the community. This means writing up policies that draw those tricky lines in the sand (no nudity? what kind of nudity? what was meant by the nudity?), and then either taking on the mantle (and sometimes the flak) of making those judgments themselves, or having to decide which users to listen to on which occasions for which reasons.
Second, and this is what Tumblr is trying, is what I’ll call the “checkpoint” approach. It’s by no means exclusive to new media: putting the X-rated movies in the back room at the video store, putting the magazines on the shelf behind the counter, wrapped in brown paper, scheduling the softcore stuff on Cinemax after bedtime, or scrambling the adult cable channel, all depend on the same logic. Somehow the provider needs to keep some content from some people and deliver it to others. (All the while, of course, they need to maintain their reputation as defender of free expression, and not appear to be “full of porn,” and keep their advertisers happy. Tricky.)
To run such a checkpoint requires (1) knowing something about the content, (2) knowing something about the people, and (3) having a defensible line between them.
First, the content. That difficult decision, about what is artistic nudity, what’s casual nudity, and what’s pornographic? It doesn’t go away, but the provider can shift the burden of making that decision to someone else — not just to get it off their shoulders, but sometimes to hand it to someone more capable of making it. Adult movie producers or magazine publishers can self-rate their content as pornographic. An MPAA-sponsored board can rate films. There are problems, of course: either the “who are these people?” problem, as in the mysterious MPAA ratings board, or the “these people are self-interested” problem, as when TV production houses rate their own programs. Still, this self-interest can often be congruent with the interests of the provider: X-rated movie producers know that their options may be the back room or not at all, and gain little in pretending that they’re something they’re not.
Next, the people. It may seem like a simple thing, just keeping the dirty stuff on the top shelf and carding people who want to buy it. Any bodega shopkeep can manage to do it. But it is simple only because it depends on a massive knowledge architecture, the driver’s license, that it didn’t have to generate itself. This is a government sponsored, institutional mechanism that, in part, happens to be engaged in age verification. It requires a massive infrastructure for record keeping, offices throughout the country, staff, bureaucracy, printing services, government authorization, and legal consequences for cases of fraud. All that so that someone can show a card and prove they’re of a certain age. (That kind of certified, high-quality data is otherwise hard to come by, as we’ll see in a moment.)
Finally, a defensible line. The bodega has two: the upper shelf and the cash register. The kids can’t reach, and even the tall ones can’t slip away uncarded, unless they’re also interested in theft. Cable services use encryption: the signal is scrambled unless the cable company authorizes it to be unscrambled. This line is in fact not simple to defend: the descrambler used to be in the box itself, which was in the home and, with the right tools and expertise, openable by those who might want to solder the right tab and get that channel unscrambled. This meant there had to be laws against tampering, another external apparatus necessary to make this tactic stick.
Tumblr? Well. All of this changes a bit when we bring it into the world of digital, networked, and social media. The challenges are much the same, and if we notice that the necessary components of the checkpoint are data, we can see how this begins to take on the shape that it does.
The content? Tumblr asked its users to self-rate, marking their blogs as “NSFW” or “adult.” Smart, given that bloggers sharing porn may share some of Tumblr’s interest in putting it behind the checkpoint: many would rather flag their site as pornographic and get to stay on Tumblr than be forbidden to put it up at all. Even flagged, Tumblr provides them what they need: the platform on which to collect content, a way to gain and keep interested viewers. The categories are a little ambiguous — where is the line between “occasional” and “substantial” nudity to be drawn? Why is the criterion only amount, rather than degree (hardcore vs. softcore), category (posed nudity vs. sexual act), or intent (artistic vs. unseemly)? But then again, these categories are always ambiguous, and must always privilege some criteria over others.
The people? Here it gets trickier. Tumblr is not imposing an age barrier; it is imposing a checkpoint based on desire, dividing those who want adult content from those who don’t. This is not the kind of data that’s kept on a card in your wallet, backed by the government, subject to laws of perjury. Instead, Tumblr has two ways to try to know what a user wants: their search settings, and what they search for. If users have managed to correctly classify themselves into “Safe Mode,” indicating in the settings that they do not want to see anything flagged as adult, and people posting content have correctly marked their content as adult or not, this should be an easy algorithmic equation: a “safe” searcher is never shown “NSFW” content. The only problems would be user error: searchers who do not set their search settings correctly, and posters who do not flag their adult content correctly. Reasonable problems, and the kind of leakage that any system of regulation inevitably faces. Flagging at the blog level (as opposed to flagging each post as adult or not) is a bit of a dull instrument: all posts from my “NSFW” blog are being withheld from safe searchers, even the ones that have no questionable content — despite the fact that, by Tumblr’s own definition, an “NSFW” blog has only “occasional” nudity. Still, getting people to rate every post is a major barrier, few will do so diligently, and it doesn’t fit into simple “web button” interfaces.
Defending the dividing line? Since the content is digital, and the information about content and users is data, it should not be surprising that the line here is algorithmic. Unlike the top shelf or the back room, the adult content on Tumblr lives amidst the rest of the archive. And there’s no cash register, which means that there’s no unavoidable point at which use can be checked. There is the login, which explains why non-logged-in users are treated as only wanting “safe” content. But, theoretically, an “algorithmic checkpoint” should work based on search settings and blog ratings. As a search happens, compare the searcher’s setting with the content’s rating, and don’t deliver the dirty to the safe.
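As a sketch, the checkpoint described above reduces to a small filtering function. The data model here (rating strings, boolean flags, function name) is invented for illustration and is not Tumblr’s actual implementation; it simply encodes the policy as the post describes it:

```python
# Hypothetical sketch of the "algorithmic checkpoint": compare a searcher's
# settings with a blog's self-rating before surfacing its posts in search.
# Ratings and flags are illustrative assumptions, not Tumblr's real API.

def visible_in_search(blog_rating, logged_in, safe_mode, follows_blog):
    """Return True if posts from a blog may appear in this user's search results."""
    if blog_rating == "safe":
        return True
    if not logged_in or safe_mode:
        # Non-logged-in visitors and Safe Mode users see only "safe" content.
        return False
    if blog_rating == "nsfw":
        # Under the June policy, "NSFW" blogs surfaced only to existing followers.
        return follows_blog
    # "adult" blogs were withheld from internal search entirely.
    return False
```

The bluntness of blog-level flagging is visible right in the sketch: the function never looks at an individual post, so innocuous posts on a flagged blog are withheld along with everything else.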
But here’s where Tumblr took two additional steps, the ones that I think raise the biggest problem for the checkpoint approach in the digital context.
Tumblr wanted to extend the checkpoint past the customer who walks into the store and brings adult content to the cash register, out to the person walking by the shop window. And those passersby aren’t always logged in; they come to Tumblr in any number of ways. Because here’s the rub with the checkpoint approach: it does, inevitably, remind the population of possible users that you do allow the dirty stuff. The new customer who walks into the video store and sees that there is a back room, even if they never go in, may reject your establishment for even offering it. Can the checkpoint be extended, to decide whether to even reveal to someone that there’s porn available inside? If not in the physical world, maybe in the digital?
When Tumblr delisted its adult blogs from the major search engines, they wanted to keep Google users from seeing that Tumblr has porn. This, of course, runs counter to the fundamental promise of Tumblr, as a publishing platform, that Tumblr users (NSFW and otherwise) count on. And users fumed: “Removal from search in every way possible is the closest thing Tumblr could do to deleting the blogs altogether, without actually removing 10% of its user base.” Here is where we may see the fundamental tension at the Yahoo/Tumblr partnership: they may want to allow porn, but do they want to be known for allowing porn?
Tumblr also apparently wanted to extend the checkpoint in the mobile environment — or perhaps were required to, by Apple. Many services, especially those spurred or required by Apple to do so, aim to prevent the “accidental porn” situation: if I’m searching for something innocuous, can they prevent a blast of unexpected porn in response to my query? To some degree, the “NSFW” rating and the “safe” setting should handle this, but of course content that a blogger failed (or refused) to flag still slips through. So Tumblr (and other sites) institute a second checkpoint: if the search term might bring back adult content, block all the results for that term. In Tumblr, this is based on tags: bloggers add tags that describe what they’ve posted, and search queries seek matches in those tags.
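This second, term-level checkpoint can be sketched as a blocklist applied before the tag index is even consulted. The blocklist contents and post data below are invented for illustration; the post only reports that certain terms, including “porn,” “sex,” and “gay,” returned no results:

```python
# Sketch of a tag-search checkpoint that blocks entire query terms.
# BLOCKED_TERMS and the post structures are illustrative assumptions.

BLOCKED_TERMS = {"porn", "sex", "gay"}

def tag_search(query, posts):
    """Return posts tagged with the query term, unless the term itself is blocked."""
    term = query.strip().lower()
    if term in BLOCKED_TERMS:
        return []  # blanket block: no results at all, adult or otherwise
    return [post for post in posts if term in post["tags"]]
```

The blunt side effect falls straight out of the code: a post tagged “gay” about a DOMA celebration is exactly as unreachable as hardcore porn carrying the same tag, because the check never reaches the content at all.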
When you try to choreograph users based on search terms and tags, you’ve doubled your problem. This is not clean, assured data like a self-rating of adult content or the age on a driver’s license. You’re ascertaining what the producer meant when they tagged a post using a certain term, and what the searcher meant when they use the same term as a search query. If I search for the word “gay,” I may be looking for a gay couple celebrating the recent DOMA decision on the steps of the Supreme Court — or “celebrating” bent over the arm of the couch. Very hard for Tumblr to know which I wanted, until I click or complain.
Sometimes these terms line up quite well, either by accident or on purpose: for instance, when users of Instagram indicated pornographic images by tagging them “pornstagram,” a made-up word that would likely mean nothing else. (This search term no longer returns any results on Instagram, although – whoa! – it does on Tumblr.) But in just as many cases, when you use the word gay to indicate a photo of your two best friends in a loving embrace, and I use the word gay in my search query to find X-rated pornography, it becomes extremely difficult for the search algorithm to understand what to do about all of those meanings converging on a single word.
Blocking all results to the query “gay,” or “sex,” or even “porn” may seem, from one vantage point (Yahoo’s?), to solve the NSFW problem. Tumblr is not alone in this regard: Vine and Instagram return no results for the search term “sex,” though that does not mean that no one’s using it as a tag – and though Instagram returns millions of results for “gay,” Vine, like Tumblr, returns none. Pinterest goes further, using the search for “porn” as a teaching moment: it pops up a reminder that nudity is not permitted on the site, then returns results which, because of the policy, are not pornographic. By blocking search terms/tags, no porn accidentally makes it to the mobile platform or to the eyes of its gentle user. But this approach fails miserably at getting adult content to those who want it, and more importantly, in Tumblr’s case, it relegates a broadly used and politically vital term like “gay” to the smut pile.
Tumblr’s semi-apology has begun to make amends. The two categories, “NSFW” and “adult,” are now just “NSFW,” and blogs marked as such are once again available in Tumblr’s internal search and in the major search engines. Tumblr has promised to work on a more intelligent filtering system. But any checkpoint that depends on data that’s expressive rather than systemic — what we say, as opposed to what we say we are — is going to step clumsily both on the sharing of adult content and on the ability to talk about subjects that have some sexual connotations, and could architect the spirit and promise out of Tumblr’s publishing platform.
This was originally posted at Culture Digitally.
On June 21, 2013, Facebook reported that a bug had potentially exposed 6 million Facebook users’ contact details. While this security breach is huge at any scale and raises concerns regarding online privacy, what I want to bring forward is that it also illuminates how our data is currently used by social media sites. In fact, it is quite interesting that instead of a technical description of what happened, Facebook wants to tell us why and how it happened:
When people upload their contact lists or address books to Facebook, we try to match that data with the contact information of other people on Facebook in order to generate friend recommendations. For example, we don’t want to recommend that people invite contacts to join Facebook if those contacts are already on Facebook; instead, we want to recommend that they invite those contacts to be their friends on Facebook.
Because of the bug, some of the information used to make friend recommendations and reduce the number of invitations we send was inadvertently stored in association with people’s contact information as part of their account on Facebook. As a result, if a person went to download an archive of their Facebook account through our Download Your Information (DYI) tool, they may have been provided with additional email addresses or telephone numbers for their contacts or people with whom they have some connection. This contact information was provided by other people on Facebook and was not necessarily accurate, but was inadvertently included with the contacts of the person using the DYI tool.
The point I want to focus on here is that in response to the security breach, Facebook gives us a rather rare view of how it uses user information to establish and maintain user engagement. What is important in this regard is the notion that users’ ‘contact lists’ and ‘address books’ are not only stored on the server but also actively used by Facebook to build new connections and establish new attachments. In this case, your contact details are used to make friend recommendations.
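The matching step Facebook describes can be sketched roughly as follows. The data structures and function name are invented for illustration; the logic (recommend contacts who are already members, invite the rest) simply follows Facebook’s own explanation:

```python
# Hypothetical sketch of the contact-matching Facebook describes: an uploaded
# address book is matched against existing accounts, so that members are
# suggested as friends and non-members become invitation candidates.
# All names and structures here are illustrative assumptions.

def classify_contacts(uploaded_contacts, registered_emails):
    """Split an address book into friend recommendations and invitation candidates."""
    recommend, invite = [], []
    for contact in uploaded_contacts:
        if contact["email"] in registered_emails:
            recommend.append(contact)  # already on the service: suggest as a friend
        else:
            invite.append(contact)     # not yet a member: candidate for an invitation
    return recommend, invite
```

On this reading, the bug was not in the matching itself but in where its intermediate output ended up: the matched details were “inadvertently stored in association with people’s contact information” and then shipped out through the DYI tool.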
According to Mark Coté and Jennifer Pybus (2007, 101), social networks have an inbuilt "architecture of participation." This architecture invites users to use the site and then exploits the data users submit in order to intensify personalized user experiences. The friend recommendation system is without a doubt part of these architectures. It is based on the idea that you do not connect with random people but with people you know. You do not need to search for these people; Facebook suggests them to you through its algorithmic procedures (Bucher 2012). Your real-life acquaintances become your Friends on Facebook, and you do not have to leave the site to maintain these relationships.
To paraphrase José van Dijck (2013, 12 n9), social media sites engineer our sociality; in other words, social media sites are "trying to exert influence on or directing user behavior." The engineering of sociality need not refer to political propaganda or ideological brainwashing; it can just as well be interpreted as a technology for keeping users engaged with social media sites. Facebook, of course, needs user engagement in order to remain productive and to be credible to its shareholders. To be clear, user engagement here is not only an emotional or psychological relation to a social media site but a relation that is extensively coded and programmed into the technical and social uses of the platform itself. As such, it needs to be researched from perspectives that take into account both human and non-human agencies.
In short, being engaged with social media is a relation of connecting and sharing, discovering and learning, and expressing oneself. These architectures of participation work in a circular logic. The more information you provide to social media sites, either explicitly or implicitly (see Schäfer 2009), the more engaged you become: not only because these sites are able to place you more precisely in a demographic slot based on big data, but also because they use the small data, your private data, to personalize the experience. Eventually, you are so engaged that even something like the compromised privacy of 6 million users does not stop you from using these sites.
Bucher, Taina 2012. "The Friendship Assemblage: Investigating Programmed Sociality on Facebook." Television & New Media. Published online August 24.
Coté, Mark & Pybus, Jennifer 2007. “Learning to Immaterial Labour 2.0: MySpace and Social Networks.” Ephemera, Vol 7(1): 88-106.
Schäfer, Mirko Tobias 2011. Bastard Culture! How User Participation Transforms Cultural Production. Amsterdam: Amsterdam University Press.
Van Dijck, José 2013. The Culture of Connectivity: A Critical History of Social Media. Oxford & New York: Oxford University Press.
401 Access Denied, 403 Forbidden, 404 Not Found, 500 Internal Server Error & the Firehose
There is this thing called the firehose. I've witnessed mathematicians, game theorists, computer scientists and engineers (apparently there is a distinction), economists, business scholars, and social scientists salivate over it (myself included). The firehose, though technically a term reserved for the Twitter API, has become all-encompassing in the realm of social science for the streams of data coming from social networking sites that are so large they cannot be processed as they come in. The data are so large, in fact, that coding requires multiple levels of computer-aided refinement; when we take data from these sources, it is as though we are drinking from a firehose. While I cannot find the etymology of the term, it seems to have come either from Twitter terminology bleed or from a water fountain at MIT.
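The "multiple levels of computer-aided refinement" can be sketched as a pipeline that never holds the whole stream in memory. This is an illustrative toy, not the real Twitter API: each stage is a generator that filters or machine-codes items as they flow past, and the sample records and keyword lists are invented.

```python
# Toy sketch of staged refinement of a stream too large to read whole.

def stream(records):
    # Stand-in for an endless firehose: yields one record at a time.
    for r in records:
        yield r

def keyword_filter(items, keyword):
    # First level of refinement: a crude topical filter.
    for item in items:
        if keyword in item["text"]:
            yield item

def code_sentiment(items, positive_words=("love", "great")):
    # Second level: naive machine coding of each surviving item.
    for item in items:
        item["positive"] = any(w in item["text"] for w in positive_words)
        yield item

sample = [
    {"text": "I love big data"},
    {"text": "rainbows in Wikipedia"},
    {"text": "big data is overrated"},
]
pipeline = code_sentiment(keyword_filter(stream(sample), "big data"))
results = list(pipeline)
print([r["positive"] for r in results])  # [True, False]
```

The point of the chained-generator design is precisely the one the paragraph makes: no stage ever sees the stream in its entirety, so every downstream "finding" inherits the filters, and the biases, of every stage above it.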
I am blessed with an advisor who has become the little voice at the back of my head whenever I am thinking about something. Every meeting he asks the same question, one that should be easy to answer but almost never is, especially when we are invested in a topic: "why does this matter?" To date, outside of business uses or artistic exploration, we've not made a good case for why big data matters. I think we all want it because we think some hidden truth might be within it. We fetishize big data, and the firehose that exists behind locked doors, as though it will be the answer to some bigger question. The problem is, there is no question. We, from our own unique, biased, and disciplinary homes, have to come up with the bigger questions. We also have to accept that while data might provide us with some answers, perhaps we should be asking questions that go deeper than that, in a research practice that requires more reflexivity than we are seeing right now. I would love to see more nuanced readings that acknowledge the biases, gaps, and holes at all levels of big data curation.
Predictive Power of Patterns
One of my favorite anecdotes showing the power of big data is the Target incident from February 2012. Target predicted that a teenage girl was pregnant, and acted on it, before she told her family: they sent baby-centric coupons to her. Her father called Target very angry, then called back later to apologize because there were some things his daughter hadn't told him. The media storm following the event painted a world both in awe of and creeped out by Target's predictive power. How could a seemingly random bit of shopping history point to a pattern showing that a customer was pregnant? How come I hadn't noticed that they were doing this to me too? Since the incident went public, and Target shared how they learned to hide the targeted ads and coupons to minimize the creepy factor, I've enjoyed receiving the Target coupon books that always come in pairs to my home, one for me and one for my husband, that look the same on the surface but have slight variations on the inside. Apparently Target has learned that if the coupons for me go to him, they will be used. This is because every time I get my coupon books I complain to him about my crappy coupon for something I need. He laughs at me and shows me his coupon, usually worth twice as much as mine if I just spend a little bit more. It almost always works.
In 2004 Lou Agosta wrote a piece titled "The Future of Data Mining: Predictive Analytics". With the proliferation of social media, API data access, and the beloved yet mysterious firehose, I think we can say the future is now. Our belief in, and cyclical relationship with, progress as an inevitable universal future turns big data into a universal good. I am not denying the usefulness of finding predictive patterns; clearly Target knew the girl was pregnant and was able to capitalize on that knowledge. But for the social scientist, pattern identification for outcome prediction, followed by verification, should not be enough. Part of our fetishization of big data seems to lie in the idea that somehow it will allow us not just to anticipate, but to know, the future. Researchers across fields and industries are working on ways to extract meaningful, predictive data from these nearly indigestible datastreams. We have to remember that even in big data there are gaps, holes, and disturbances. Rather than looking at what big data can tell us, we should be looking toward it as an exploratory method that can help us define different problem sets and related questions.
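The reported mechanics of the Target case can be reduced to a toy score: certain purchases are weighted as indicators of a life event, and a customer whose basket crosses a threshold gets flagged. The product list, weights, and threshold below are entirely invented for illustration; they are not Target's actual model.

```python
# Toy "life-event" scorer in the spirit of the Target anecdote.
# Weights and products are invented, not a real predictive model.
INDICATOR_WEIGHTS = {
    "unscented lotion": 0.3,
    "prenatal vitamins": 0.6,
    "cotton balls": 0.1,
    "large tote bag": 0.2,
}
THRESHOLD = 0.8  # flag customers whose score crosses this line

def pregnancy_score(purchases):
    """Sum the indicator weights for every recognized purchase."""
    return sum(INDICATOR_WEIGHTS.get(p, 0.0) for p in purchases)

basket = ["unscented lotion", "prenatal vitamins", "milk"]
score = pregnancy_score(basket)
print(score >= THRESHOLD)  # True: 0.3 + 0.6 = 0.9 crosses the threshold
```

Notice what the sketch makes visible: the model only verifies a correlation someone chose to encode. Nothing in it asks why these purchases cluster, which is exactly the kind of question prediction-plus-verification leaves on the table.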
Big Data as Method?
Recently I went to a talk by a pair of computer scientists who had access to the entire database of Wikipedia. Because they could, they decided to visualize Wikipedia. After going through slide after slide of pretty colors, they said "who knew there were rainbows in Wikipedia!?" and then announced that they had moved on from that research. Rainbows can only get me so far. I was stuck asking why this pattern kept repeating itself, and wanting to know how the people creating the data that turned into a rainbow imagined what they were producing. The visualizations didn't answer anything. If anything, they allowed me to ask clearer, more directed questions. This isn't to say the work they did wasn't beautiful. It is and was. But there is so much more work to do. I hope that as big data continues to become something of a social norm, more people begin to speak across disciplinary lines so that we learn how to use this data in meaningful ways everywhere. Right now I think that visualization is still central, but that is one of my biases. The reason I think this is that visualization allows for simple identification of patterns. It also allows us to take in petabytes of data at once, to compare different datasets (if similar visualization methods are used), and to experiment in a way that other forms of data representation do not. When people share visualizations, they show either their understandable failures or the final polished product meant for mass consumption. I've not heard a lot of conversation about using big data, its curation, and visualization generation as/and method, but maybe I'm not in the right circles?
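Even the crudest visualization shows why patterns surface faster in pictures than in tables. A minimal sketch, with invented edit counts standing in for real Wikipedia data: rendered as a text histogram, the skew jumps out at a glance in a way a raw list of numbers does not.

```python
# Invented counts, standing in for something like edits per article topic.
edits_per_article = {"Physics": 120, "Pokemon": 480, "Poetry": 60}

def histogram(counts, scale=60):
    """Render counts as one bar of '#' marks per entry (one '#' per `scale`)."""
    lines = []
    for name, n in counts.items():
        lines.append(f"{name:<8} {'#' * (n // scale)}")
    return "\n".join(lines)

print(histogram(edits_per_article))
# Physics  ##
# Pokemon  ########
# Poetry   #
```

The histogram answers nothing by itself; like the rainbow slides, it only hands you the next question, here, why one topic dwarfs the others.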
Still, I think that until we are willing to share the various steps along the way to turning big data into meaningful bits, or we create an easy-to-use toolkit for the next generation of big data visualizations, we will all continue hacking at the same problem, stopping at different points, without reaching a meaningful conclusion beyond "isn't big data beautiful?"