Big Data, Context Cultures

The latest issue of Media, Culture & Society features an open-access discussion section responding to SMC all-stars danah boyd and Kate Crawford’s “Critical Questions for Big Data.” Though the article is only a few years old, it’s been very influential and a lot has happened since it came out, so editors Aswin Punathambekar and Anastasia Kavada commissioned a few responses from scholars to delve deeper into danah and Kate’s original provocations.

The section features pieces by Anita Chan on big data and inclusion, André Brock on “deeper data,” Jack Qiu on access and ethics, Zizi Papacharissi on digital orality, and one by me, Nick Seaver, on varying understandings of “context” among critics and practitioners of big data. All of those, plus an introduction from the editors, are open-access, so download away!

My piece, titled “The nice thing about context is that everyone has it,” draws on my research into the development of algorithmic music recommenders, which I’m building on during my time with the Social Media Collective this fall. Here’s the abstract:

In their ‘Critical Questions for Big Data’, danah boyd and Kate Crawford warn: ‘Taken out of context, Big Data loses its meaning’. In this short commentary, I contextualize this claim about context. The idea that context is crucial to meaning is shared across a wide range of disciplines, including the field of ‘context-aware’ recommender systems. These personalization systems attempt to take a user’s context into account in order to make better, more useful, more meaningful recommendations. How are we to square boyd and Crawford’s warning with the growth of big data applications that are centrally concerned with something they call ‘context’? I suggest that the importance of context is uncontroversial; the controversy lies in determining what context is. Drawing on the work of cultural and linguistic anthropologists, I argue that context is constructed by the methods used to apprehend it. For the developers of ‘context-aware’ recommender systems, context is typically operationalized as a set of sensor readings associated with a user’s activity. For critics like boyd and Crawford, context is that unquantified remainder that haunts mathematical models, making numbers that appear to be identical actually different from each other. These understandings of context seem to be incompatible, and their variability points to the importance of identifying and studying ‘context cultures’–ways of producing context that vary in goals and techniques, but which agree that context is key to data’s significance. To do otherwise would be to take these contextualizations out of context.
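(An illustrative aside: to make the contrast in the abstract concrete, below is a minimal sketch of how ‘context’ is often operationalized in context-aware recommenders, namely as a bundle of sensor-style readings that conditions the ranking. All of the names, tags, and weights are hypothetical and not drawn from any particular system.)

    # Illustrative sketch only: "context" operationalized as sensor-style readings.
    # The catalog, tags, and weights are invented, not taken from any real recommender.
    from dataclasses import dataclass

    @dataclass
    class Context:
        hour: int        # time of day, e.g. from the device clock
        moving: bool     # e.g. inferred from accelerometer readings
        location: str    # e.g. "home", "gym", "commute"

    # A toy catalog: each track is tagged with the contexts it is thought to suit.
    CATALOG = {
        "ambient_morning": {"home"},
        "high_energy_mix": {"gym", "commute"},
        "late_night_jazz": {"home"},
    }

    def score(track_tags, ctx):
        """Score a track by how well its tags match the sensed context."""
        s = 1.0 if ctx.location in track_tags else 0.0
        if ctx.moving and "gym" in track_tags:
            s += 0.5   # boost energetic music when the user appears active
        if ctx.hour >= 22 and "home" in track_tags:
            s += 0.25  # boost quieter music late at night
        return s

    def recommend(ctx):
        """Rank the toy catalog for a given sensed context."""
        return sorted(CATALOG, key=lambda t: score(CATALOG[t], ctx), reverse=True)

    print(recommend(Context(hour=23, moving=False, location="home")))

The point of the sketch is simply how narrow this operationalization is: here, context is whatever the sensors can report, which is precisely the version of ‘context’ the commentary sets alongside boyd and Crawford’s broader, unquantified sense of the term.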

Update on the 2015 SMC PhD Internship season

Hello!
We wanted to post a quick update on the status of the 2015 SMC PhD Internship Program. The application season closed January 31 and we ended up with more than 240 stellar applicants to the program. Thank you for your patience with our application process and please forgive the delay in sending an update.

The SMC was humbled and tickled pink by the quality of the applications that we received for the PhD internship this year. It’s always hard to let go of such a range of incredible work in our midst, and that made it very difficult to reach even a short list of applicants to interview, let alone select three final candidates. We have reached out to finalists and are in the thick of finalizing offers. If you are reading this message and have not heard from us by now, I’m afraid that means we could not place you with us this year. And, due to the large number of applications, we cannot offer reviews of individual applications.

We will announce the 2015 PhD interns in June here on the Social Media Collective blog. The 2016 PhD internship and Postdoc application rounds will open again in Fall 2015 with an announcement on the SMC blog.

Please know that this was an extremely competitive pool. You all are doing a LOT of amazing work out there! We very much appreciate the applications, welcome the opportunity to learn about your work, and encourage you to try again next year if you fit the criteria. Your applications leave us very excited about the direction of social media scholarship.

We look forward to crossing paths with you at conferences, in journal pages, and online.

Best wishes,

Mary L. Gray (on behalf of the SMC)

New special issue of JOBEM: “Old Against New, or a Coming of Age? Rethinking Broadcasting in an Era of Electronic Media”

A little over a year ago, JOBEM editor Zizi Papacharissi approached me, R. Stuart Geiger (UC Berkeley), and Stacy Blasiola (University of Illinois at Chicago) with the idea of a JOBEM special issue that would be edited and authored by graduate students. We were excited to accept the invitation and set out on the adventure.

The resulting special issue, titled Old Against New, or a Coming of Age? Rethinking Broadcasting in an Era of Electronic Media, has now been published. We are proud to present this issue that begins a new thread in the longstanding conversation about what it means for media to be “old” and “new.” While this distinction is not one we should take for granted, the articles in this issue all demonstrate how we can strategically approach the intricate intersections and interconnections of different media, both old and new.

We were very impressed by the thoughtful and provocative work that graduate students contributed in response to our call. Presenting a wide range of international scholarship spanning many disciplinary backgrounds, topical literatures, methodological approaches, and theoretical frameworks, this special issue represents an emerging approach to what it means to study broadcasting in an era of electronic media.

The guest-edited issue features the following seven articles, along with our Introduction:

We hope that you’ll find the collection inspiring and productive, and we invite you to share them with others who might enjoy them too!

Last but not least, if you are coming to IR15 in a few weeks, we hope to see you at the similarly named fishbowl on the first day of the conference. This will be an opportunity to take the conversation further, together with the community of Internet researchers!

What does the Facebook experiment teach us?

I’m intrigued by the reaction that has unfolded around the Facebook “emotion contagion” study. (If you aren’t familiar with this, read this primer.) As others have pointed out, the practice of A/B testing content is quite common. And Facebook has a long history of experimenting on how it can influence people’s attitudes and practices, even in the realm of research. An earlier study showed that Facebook decisions could shape voters’ practices. But why is it that *this* study has sparked a firestorm?

In asking people about this, I’ve been given two dominant reasons:

  1. People’s emotional well-being is sacred.
  2. Research is different than marketing practices.

I don’t find either of these responses satisfying.

The Consequences of Facebook’s Experiment

Facebook’s research team is not truly independent of product. They have a license to do research and publish it, provided that it contributes to the positive development of the company. If Facebook knew that this research would spark the negative PR backlash, they never would’ve allowed it to go forward or be published. I can only imagine the ugliness of the fight inside the company now, but I’m confident that PR is demanding silence from researchers.

I do believe that the research was intended to be helpful to Facebook. So what was the intended positive contribution of this study? I get the sense from Adam Kramer’s comments that the goal was to determine whether content sentiment could affect people’s emotional response after being on Facebook. In other words, given that Facebook wants to keep people on Facebook, if people came away from Facebook feeling sadder, presumably they’d not want to come back to Facebook again. Thus, it’s in Facebook’s best interest to leave people feeling happier. The study suggests that the sentiment of the content people see influences this, and so one applied takeaway for product is to downplay negative content. Presumably this is better for users and better for Facebook.
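To make that takeaway concrete, here is a toy sketch of what “downplay negative content” could look like inside a feed ranker. It is purely hypothetical and not a description of Facebook’s actual News Feed; the posts, sentiment scores, and downweighting factor are invented for illustration.

    # Hypothetical illustration of "downplay negative content" in a feed ranker.
    # The posts, sentiment scores, and downweighting factor are all invented.

    posts = [
        {"id": 1, "text": "We had a baby!", "sentiment": 0.9},
        {"id": 2, "text": "Worst day ever.", "sentiment": -0.7},
        {"id": 3, "text": "Nice walk in the park.", "sentiment": 0.4},
    ]

    NEGATIVE_DOWNWEIGHT = 0.3  # how strongly negative posts are suppressed (made up)

    def feed_score(post):
        """Every post starts with equal relevance; negative posts get downweighted."""
        base = 1.0
        if post["sentiment"] < 0:
            base *= NEGATIVE_DOWNWEIGHT
        return base + 0.1 * post["sentiment"]  # small tie-breaker by sentiment

    for post in sorted(posts, key=feed_score, reverse=True):
        print(post["id"], post["text"])

Even in this toy version, someone has to pick the downweighting factor, which is exactly the kind of decision the next paragraphs ask about.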

We can debate all day long as to whether or not this is what the study actually shows, but let’s work with this for a second. Let’s say that pre-study Facebook showed 1 negative post for every 3 positive ones and that now, because of this study, Facebook shows 1 negative post for every 10 positive ones. If that’s the case, was the one-week treatment worth the outcome for longer-term content exposure? Who gets to make that decision?

Folks keep talking about all of the potential harm that the study could have caused – the possibility of suicides, the mental health consequences. But what about the potential harm of negative content on Facebook more generally? Even if we believe that there were subtle negative costs to those who received the treatment, the ongoing harm of negative content on Facebook in every week other than that one-week experiment must be more costly. How then do we account for the positive benefits to users if Facebook increased positive treatments en masse as a result of this study? Of course, the problem is that Facebook is a black box. We don’t know what they did with this study. The only thing we know is what is published in PNAS, and that ain’t much.

Of course, if Facebook did make the content that users see more positive, should we simply be happy? What would it mean that you’re more likely to see announcements from your friends when they are celebrating a new child or a fun night on the town, but less likely to see their posts when they’re offering depressive missives or angsting over a relationship in shambles? If Alice is happier when she is oblivious to Bob’s pain because Facebook chooses to keep that from her, are we willing to sacrifice Bob’s need for support and validation? This is a hard ethical choice at the crux of any decision about what content to show. And the reality is that Facebook is making these choices every day without oversight, transparency, or informed consent.

Algorithmic Manipulation of Attention and Emotions

Facebook actively alters the content you see. Most people focus on the practice of marketing, but most of what Facebook’s algorithms do involves curating content to provide you with what they think you want to see. Facebook algorithmically determines which of your friends’ posts you see. They don’t do this for marketing reasons. They do this because they want you to want to come back to the site day after day. They want you to be happy. They don’t want you to be overwhelmed. Their everyday algorithms are meant to manipulate your emotions. What factors go into this? We don’t know.

Facebook is not alone in algorithmically predicting what content you wish to see. Any recommendation system or curatorial system is prioritizing some content over others. But let’s compare what we glean from this study with standard practice. Most sites, from major news media to social media, have some algorithm that shows you the content that people click on the most. This is what drives media entities to produce listicles, flashy headlines, and car crash news stories. What do you think garners more traffic – a detailed analysis of what’s happening in Syria or 29 pictures of the cutest members of the animal kingdom? Part of what media learned long ago is that fear and salacious gossip sell papers. 4chan taught us that grotesque imagery and cute kittens work too. What this means online is that stories about child abductions, dangerous islands filled with snakes, and celebrity sex tape scandals are often the most clicked on, retweeted, favorited, etc. So an entire industry has emerged to produce crappy click bait content under the banner of “news.”
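As a rough illustration of that standard practice, a popularity-driven curator can be as simple as sorting stories by observed click-through rate. Everything below is made up for the sake of the example; it is not any outlet’s actual system.

    # Toy illustration of click-driven curation: rank stories by click-through rate.
    # The headlines echo the examples above; the numbers are invented.

    stories = [
        {"headline": "A detailed analysis of what's happening in Syria",
         "clicks": 1200, "impressions": 100000},
        {"headline": "29 pictures of the cutest members of the animal kingdom",
         "clicks": 9500, "impressions": 100000},
    ]

    def ctr(story):
        """Click-through rate: clicks per impression."""
        return story["clicks"] / max(story["impressions"], 1)

    for story in sorted(stories, key=ctr, reverse=True):
        print(f"{ctr(story):.3f}  {story['headline']}")

Under that logic, the listicle wins every time, which is the point.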

Guess what? When people are surrounded by fear-mongering news media, they get anxious. They fear the wrong things. Moral panics emerge. And yet, we as a society believe that it’s totally acceptable for news media – and its click bait brethren – to manipulate people’s emotions through the headlines they produce and the content they cover. And we generally accept that algorithmic curators are perfectly within their rights to prioritize that heavily clicked content over others, regardless of the psychological toll on individuals or society. What makes Facebook’s practice different? (Other than the fact that the media wouldn’t hold itself accountable for its own manipulative practices…)

Somehow, shrugging our shoulders and saying that we promoted content because it was popular is acceptable because those actors don’t voice that their intention is to manipulate your emotions so that you keep viewing their reporting and advertisements. And it’s also acceptable to manipulate people for advertising because that’s just business. But when researchers admit that they’re trying to learn if they can manipulate people’s emotions, they’re shunned. What this suggests is that the practice is acceptable, but admitting the intention and being transparent about the process is not.

But Research is Different!!

As this debate has unfolded, whenever people point out that these business practices are commonplace, folks respond by highlighting that research or science is different. What unfolds is a highbrow notion about the purity of research and its exclusive claim on ethical standards.

Do I think that we need to have a serious conversation about informed consent? Absolutely. Do I think that we need to have a serious conversation about the ethical decisions companies make with user data? Absolutely. But I do not believe that this conversation should ever apply just to that which is categorized under “research.” Nor do I believe that academe necessarily provides a gold standard.

Academe has many problems that need to be accounted for. Researchers are incentivized to figure out how to get through IRBs rather than to think critically and collectively about the ethics of their research protocols. IRBs are incentivized to protect the university rather than truly work out an ethical framework for these issues. Journals relish corporate datasets even when replicability is impossible. And for that matter, even in a post-paper era, journals have ridiculous word count limits that discourage researchers from spelling out all of the gory details of their methods. But there are also broader structural issues. Academe is so stupidly competitive, and peer review is so much of a game, that researchers have little incentive to share their studies-in-progress with their peers for true feedback and critique. And the status games of academe reward those who get access to private coffers of data while prompting those who don’t to chastise those who do. And there’s generally no incentive for corporations to play nice with researchers unless it helps their prestige, hiring opportunities, or product.

IRBs are an abysmal mechanism for actually accounting for ethics in research. By and large, they’re structured to make certain that the university will not be liable. Ethics aren’t a checklist. Nor are they a universal. Navigating ethics involves a process of working through the benefits and costs of a research act and making a conscientious decision about how to move forward. Reasonable people differ on what they think is ethical. And disciplines have different standards for how to navigate ethics. But we’ve trained an entire generation of scholars that ethics equals “that which gets past the IRB” which is a travesty. We need researchers to systematically think about how their practices alter the world in ways that benefit and harm people. We need ethics to not just be tacked on, but to be an integral part of how *everyone* thinks about what they study, build, and do.

There’s a lot of research that has serious consequences for the people who are part of the study. I think about the work that some of my colleagues do with child victims of sexual abuse. Getting children to talk about these awful experiences can take quite a psychological toll. Yet better understanding what they experienced has huge benefits for society. So we make our trade-offs and we do research that can have consequences. But what warms my heart is how my colleagues work hard to help those children by providing counseling immediately following the interview (and, in some cases, follow-up counseling). They think long and hard about each question they ask, and how they go about asking it. And yet most IRBs wouldn’t let them do this work because no university wants to touch anything that involves kids and sexual abuse. Doing research involves trade-offs, and finding an ethical path forward requires effort and risk.

It’s far too easy to say “informed consent” and then not take responsibility for the costs of the research process, just as it’s far too easy to point to an IRB as proof of ethical thought. For any study that involves manipulation – common in economics, psychology, and other social science disciplines – people are only so informed about what they’re getting themselves into. You may think that you know what you’re consenting to, but do you? And then there are studies like discrimination audit studies in which we purposefully don’t inform people that they’re part of a study. So what are the right trade-offs? When is it OK to eschew consent altogether? What does it mean to truly be informed? When is being informed not enough? These aren’t easy questions and there aren’t easy answers.

I’m not necessarily saying that Facebook made the right trade-offs with this study, but I think that the scholarly reaction that research is only acceptable with an IRB plus informed consent is disingenuous. Of course, a huge part of what’s at stake has to do with the fact that what counts as a contract legally is not the same as consent. Most people haven’t consented to all of Facebook’s terms of service. They’ve agreed to a contract because they feel as though they have no other choice. And this really upsets people.

A Different Theory

The more I read people’s reactions to this study, the more that I’ve started to think that the outrage has nothing to do with the study at all. There is a growing amount of negative sentiment towards Facebook and other companies that collect and use data about people. In short, there’s anger at the practice of big data. This paper provided ammunition for people’s anger because it’s so hard to talk about harm in the abstract.

For better or worse, people imagine that Facebook is offered by a benevolent dictator, that the site is there to enable people to better connect with others. In some senses, this is true. But Facebook is also a company. And a public company for that matter. It has to find ways to become more profitable with each passing quarter. This means that it designs its algorithms not just to market to you directly but to convince you to keep coming back over and over again. People have an abstract notion of how that operates, but they don’t really know, or even want to know. They just want the hot dog to taste good. Whether it’s couched as research or operations, people don’t want to think that they’re being manipulated. So when they find out what soylent green is made of, they’re outraged. This study isn’t really what’s at stake. What’s at stake is the underlying dynamic of how Facebook runs its business, operates its system, and makes decisions that have nothing to do with how its users want Facebook to operate. It’s not about research. It’s a question of power.

I get the anger. I personally loathe Facebook and I have for a long time, even as I appreciate and study its importance in people’s lives. But on a personal level, I hate the fact that Facebook thinks it’s better than me at deciding which of my friends’ posts I should see. I hate that I have no meaningful mechanism of control on the site. And I am painfully aware of how my sporadic use of the site has confused their algorithms so much that what I see in my newsfeed is complete garbage. And I resent the fact that because I barely use the site, the only way that I could actually get a message out to friends is to pay to have it posted. My minimal use has made me an algorithmic pariah and if I weren’t technologically savvy enough to know better, I would feel as though I’ve been shunned by my friends rather than simply deemed unworthy by an algorithm. I also refuse to play the game to make myself look good before the altar of the algorithm. And every time I’m forced to deal with Facebook, I can’t help but resent its manipulations.

There’s also a lot that I dislike about the company and its practices. At the same time, I’m glad that they’ve started working with researchers and started publishing their findings. I think that we need more transparency in the algorithmic work done by these kinds of systems, and their willingness to publish has been one of the few ways that we’ve gleaned insight into what’s going on. Of course, I also suspect that the angry reaction to this study will prompt them to clamp down on allowing researchers to be remotely public. My gut says that they will naively respond to this situation as though the practice of research is what makes them vulnerable, rather than their practices as a company as a whole. Beyond what this means for researchers, I’m concerned about what increased silence will mean for a public that has no clue what’s being done with its data and will assume that the absence of new reports of terrible misdeeds means Facebook has stopped manipulating data.

Information companies aren’t the same as pharmaceuticals. They don’t need to do clinical trials before they put a product on the market. They can psychologically manipulate their users all they want without being remotely public about exactly what they’re doing. And as the public, we can only guess what the black box is doing.

There’s a lot that needs to be reformed here. We need to figure out how to have a meaningful conversation about corporate ethics, regardless of whether or not it’s couched as research. But it’s not as simple as saying that the lack of a corporate IRB or the lack of gold-standard “informed consent” means that a practice is unethical. Almost all of the manipulations that these companies engage in occur without either one of these. And they go unchecked because they aren’t published or public.

Ethical oversight isn’t easy and I don’t have a quick and dirty solution to how it should be implemented. But I do have a few ideas. For starters, I’d like to see any company that manipulates user data create an ethics board. Not an IRB that approves research studies, but an ethics board that has visibility into all proprietary algorithms that could affect users. For public companies, this could be done through the ethics committee of the Board of Directors. But rather than simply consisting of board members, I think that it should consist of scholars and users. I also think that there needs to be a mechanism for whistleblowing regarding ethics from within companies because I’ve found that many employees of companies like Facebook are quite concerned by certain algorithmic decisions, but feel as though there’s no path to responsibly report concerns without going fully public. This wouldn’t solve all of the problems, nor am I convinced that most companies would do so voluntarily, but it is certainly something to consider. More than anything, I want to see users have the ability to meaningfully influence what’s being done with their data and I’d love to see a way for their voices to be represented in these processes.

I’m glad that this study has prompted an intense debate among scholars and the public, but I fear that it’s turned into a simplistic attack on Facebook over this particular study rather than a nuanced debate over how we create meaningful ethical oversight in research and practice. The lines between research and practice are always blurred and information companies like Facebook make this increasingly salient. No one benefits by drawing lines in the sand. We need to address the problem more holistically. And, in the meantime, we need to hold companies accountable for how they manipulate people across the board, regardless of whether or not it’s couched as research. If we focus too much on this study, we’ll lose track of the broader issues at stake.

MSR Social Media Collective 2014 PhD internships now open

* APPLICATION DEADLINE: JANUARY 31, 2014 *

Microsoft Research New England (MSRNE) is looking for PhD interns to join the Social Media Collective for Summer 2014. We are looking primarily for social science/humanities PhD students (including communication, sociology, anthropology, media studies, information studies, science and technology studies, etc.). The Social Media Collective is a collection of scholars at MSRNE who focus on socio-technical questions. We are not an applied program; rather, we work on critical research questions that are important to the future of understanding technology through a social scientific/humanistic lens.

MSRNE internships are 12-week paid internships in Cambridge, Massachusetts. PhD interns are expected to be on-site for the duration of their internship. Primary mentors for this year will be Nancy Baym and Kate Crawford.

PhD interns at MSRNE are expected to devise and execute a research project during their internships. The expected outcome of an internship at MSRNE is a publishable scholarly paper for an academic journal or conference of the intern’s choosing. The goal of the internship is to help the intern advance their own career; interns are strongly encouraged to work towards a publication outcome that will help them on the academic job market. Interns are also expected to collaborate on projects or papers with full-time researchers and visitors, give short presentations, and contribute to the life of the community. While this is not an applied program, MSRNE encourages interdisciplinary collaboration with computer scientists, economists, and mathematicians.


Addressing Human Trafficking: Guidelines for Technological Interventions

Two years ago, when I started working on issues related to human trafficking and technology, I was frustrated by how few people recognized the potential of technology to help address the commercial sexual exploitation of children. With the help of a few colleagues at Microsoft Research, I crafted a framework document to think through the intersection of technology and trafficking. After talking with Mark Latonero at USC (who has been writing brilliant reports on technology and human trafficking), I teamed up with folks at MSR Connections and Microsoft’s Digital Crimes Unit to help fund research in this space. Over the last year, I’ve been delighted to watch a rich scholarly community emerge that takes seriously the importance of data for understanding and intervening in human trafficking issues that involve technology.

Meanwhile, to my delight, technologists have started to recognize that they can develop innovative systems to help address human trafficking. NGOs have started working with computer scientists, companies have started working with law enforcement, and the White House has started bringing together technologists, domain experts, and policy makers to imagine how technology can be used to combat human trafficking. The potential of these initiatives tickles me pink.

Watching this unfold, one thing that I struggle with is that there’s often a disconnect between what researchers are learning and what the public thinks is happening vis-a-vis the commercial sexual exploitation of children (CSEC). On too many occasions, I’ve watched well-intentioned technologists approach the space with a naiveté that comes from only knowing about human trafficking through media portrayals. While the portraits that receive widespread attention are important for motivating people to act, understanding the nuance and pitfalls of the space is critical for building interventions that will actually make a difference.

To bridge the gap between technologists and researchers, I worked with a group of phenomenal researchers to produce a simple 4-page fact sheet intended to provide a very basic primer on issues in human trafficking and CSEC that technologists need to know before they build interventions:

How to Responsibly Create Technological Interventions to Address the Domestic Sex Trafficking of Minors

Some of the issues we address include:

  1. Youth often do not self-identify as victims.
  2. “Survival sex” is one aspect of CSEC.
  3. Previous sexual abuse, homelessness, family violence, and foster care may influence youth’s risk of exploitation.
  4. Arresting victims undermines efforts to combat CSEC.
  5. Technologies should help disrupt criminal networks.
  6. Post-identification support should be in place before identification interventions are implemented.
  7. Evaluation, assessment, and accountability are critical for any intervention.
  8. Efforts need to be evidence-based.
  9. The cleanliness of data matters.
  10. Civil liberties are important considerations.

This high-level overview is intended to shed light on some of the most salient misconceptions and provide some key insights that might be useful for those who want to make a difference. By no means does it cover everything that experts know, but it provides some key touchstones that may be useful. It is limited to the issues that are most important for technologists, but those who are working with technologists may also find it to be valuable.

As researchers dedicated to addressing human trafficking and the commercial sexual exploitation of children, we want to make sure that the passion that innovative technologists are bringing to the table is directed in the most helpful ways possible. We hope that what we know can be of use to those who are also looking to end exploitation.

(Flickr image by Martin Gommel)

The Cost of Collaboration for Code and Art

Does collaboration result in higher quality creative works than individuals working alone? Is working in groups better for functional works like code than for creative works like art? Although these questions lie at the heart of conversations about collaborative production on the Internet and peer production, it can be hard to find research settings where you can compare across both individual and group work and across both code and art. We set out to tackle these questions in the context of a very large remixing community.

Remixing in Scratch
Example of a remix in the Scratch online community and the project it is based on. The orange arrows indicate pieces that were present in the original and reused in the remix.


“Socially Mediated Publicness”: an open-access issue of JOBEM

I love being a scholar, but one thing that really depresses me about research is that so much of what scholars produce is rendered inaccessible to so many people who might find it valuable, inspiring, or thought-provoking. This is at the root of what drives my commitment to open access. When Zizi Papacharissi asked Nancy Baym and me if we’d be willing to guest edit the Journal of Broadcasting & Electronic Media (JOBEM), we agreed under one condition: the issue had to be open-access (OA). Much to our surprise and delight, Taylor & Francis agreed to “test” that strange and peculiar OA phenomenon by allowing us to make this issue OA.

Nancy and I decided to organize the special issue around “socially mediated publicness,” both because we find that topic to be of great interest and because we felt like there was something fun about talking about publicness in truly public form. We weren’t sure what the response to our call would be, but were overwhelmed with phenomenal submissions and had to reject many interesting articles.

But we are completely delighted to publish a collection of articles that we think are timely, interesting, insightful, and downright awesome. If you would like to get a sense of the arguments made in these articles, make sure to check out our introduction. The seven pieces in this guest-edited issue of JOBEM are:

We hope that you’ll find them fun to read and that you’ll share them with others who might enjoy them too!