
Recognizing the Work of Reddit’s Moderators: Summer Research Project

June 16, 2015

What does it take to keep online communities going? Reddit hosts over 550,000 public subreddits, many of them active, and these communities rely on the ongoing effort of a large number of volunteer moderators. In my research, I’ve made the case that caring for the communities we’re part of is an important kind of digital citizenship. For that reason, I’m excited to learn more from redditors about how they see the work of moderation, why they do it, and what is and isn’t their job.

This spring, I’ve been reading extensively about digital labor and citizenship online, including the story of over 30,000 AOL community leaders who facilitated online communities in the 90s. With Reddit pushing for profitability and promising new policies on online harassment, I thought that potential tensions arising this summer might offer an important lens into the work of moderators, at a time when listening to mods and recognizing their work would be especially important. “The summer is likely to include substantial discussion and introspection on the nature and boundaries of moderation work on Reddit,” I wrote in my proposal in mid-May.

Although I expected something, I didn’t expect that Reddit would ban a set of subreddits and mods in their attempt to carry out their new policies, or that some redditors would vigorously oppose this move. (Update July 6: I also didn’t anticipate that reddit moderators would take their subs private to advocate for changes in how they are treated). These controversies have convinced me that this research could be especially valuable right now. Press coverage is likely to focus primarily on the controversy, while I can carry out a summer-long project, in conversation with a wider sample of redditors than just those associated with this controversy.

In this post (which I will be sharing with redditors when I ask permission to speak with them) I outline my research to understand how Reddit’s moderators see and define what they do. This blog post includes details of the research, the promises I make to redditors, and the wider reasons for this project.

About This Research Project

I’m a PhD student at the MIT Media Lab / Center for Civic Media and a fellow at Harvard’s Berkman Center for Internet and Society, where I research civic life online. As a PhD intern at Microsoft Research, I get to be supported this summer by amazing researchers including Tarleton Gillespie, Mary Gray, and Nancy Baym, who are advising this project. To learn more about my work, you can read my MIT blog or check out my portfolio.

Over the next few months, I’ll be:

  • Hanging out in moderator subreddits like needamod, modhelp, and others to learn more about how mods find opportunities, learn the ropes, and discuss their work
  • Posting to some subreddits, after seeking permission from the mods, to ask questions or get feedback on my working understandings
  • Collecting basic summary statistics across Reddit, from public information, to understand, on average, how many mods there are (like the above chart) and what kinds of rules different subreddits have.
  • (potentially) interviewing reddit mods
  • (potentially) trying my hand as a moderator

Ethics: Who’s This For, What am I Recording, and What am I Sharing?

My summer project is being done at Microsoft Research’s Social Media Collective, where I am a PhD intern. At MSR, I have the intellectual freedom to ask questions that are widely important to society and scholarship. I also expect to make my research widely accessible. Microsoft open-sourced my code when I was an intern in 2013, and Microsoft Research has an open access policy for its research.

Although I am a fellow at the DERP Institute and can, in theory, start a conversation with Reddit employees, I have not discussed this project with Reddit at all, have never received compensation from Reddit, nor am I working for them in any way. While it is possible that I may be asked in the future to share my results with the company, I will not share any of my notes or data with Reddit beyond the findings that I publish in research papers, public talks, blog posts, or open source materials.

This isn’t the first time I’ve done research about the work of moderation from outside a powerful company. Last month, my colleagues and I published a report on Reporting, Reviewing, and Responding to Harassment on Twitter, including a section on the work of moderating alleged harassment. In that study, we treated everyone with respect, including alleged harassers. Our research team did not share data with the company, we wrote independently of Twitter, and we had full editorial control over our report, even from the commissioning organization WAM!. Likewise, in my 2013 summer research at Microsoft on local community blogging, we either summarized or anonymized/modified all quotes and photos before publishing our results.

In this project, I promise that:

  • Anyone can opt out of this research at any time by contacting me at /user/natematias. If you opt out, I will avoid quoting or mentioning you in any way in the published results.
  • By default, I will anonymize any information I collect before publishing.
  • If a user requests that I use their username to give them appropriate credit for their work, I’ll weigh the risks and benefits and try to do right by the user.
  • I will keep all my notes and data secured, with secure backups that I access through encrypted connections.

Why Do Research with Redditors?

Reddit is one of the few major public platforms on the English-language web that allows/expects its users to establish and maintain their own communities, without thousands of paid content moderators and algorithms behind the scenes deciding what to keep or delete. In contrast, the Huffington Post pre-moderates 450,000 comments per day, paying between $0.005 and $0.25 for every comment that comes in. Yet Reddit mods do so much more than just delete spam. They do a huge amount of important work to create new communities, recruit participants, post content, manage subreddit settings & style, recruit new moderators, set rules for their subreddit, and monitor/manage submissions and comments. Moderators also tend to play a large role in debating and establishing wider community norms like Reddiquette.

Last week, I used the Reddit API to collect data on the number of moderators who keep subreddits’ conversations going. A random sample of 100,615 subreddits (roughly 1/6 of all public subreddits) had 91,563 user accounts as moderators. While not all of these subreddits are active, each represents a moment when someone was interested enough to try on the moderator role. Among the 46% of subreddits with more than one subscriber, 30% have two or more moderators.
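For those curious how numbers like these can be gathered, here is a minimal sketch of this kind of collection script using the PRAW library. This is an illustration rather than my actual script; the credentials and the sample size are placeholders.

```python
# Minimal sketch: sample subreddits and count their public moderators
# with PRAW (https://praw.readthedocs.io). Credentials are placeholders.
import praw

reddit = praw.Reddit(
    client_id="YOUR_ID",            # placeholder credentials
    client_secret="YOUR_SECRET",
    user_agent="mod-survey sketch",
)

results = []
for _ in range(1000):               # the real sample was ~100,000 subreddits
    sub = reddit.random_subreddit()
    try:
        mods = list(sub.moderator())    # public moderator list
    except Exception:
        continue                        # skip private or quarantined subreddits
    results.append((sub.display_name, sub.subscribers, len(mods)))

with_subscribers = [r for r in results if r[1] > 1]
multi_mod = [r for r in with_subscribers if r[2] >= 2]
print(f"{len(with_subscribers)} sampled subreddits have >1 subscriber; "
      f"{len(multi_mod)} of those have 2+ moderators")
```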

Communities of redditors and mods shaped some of my earliest impressions of the site six years ago, when a work colleague invited me to join Reddit London meetups, telling me stories about their weekend and after-work gatherings. It was clear that participation meant more to many redditors than just links and comments. Later on, when I spent two years facilitating @1book140, The Atlantic’s Twitter book club, with around 140,000 subscribers, I came to learn how challenging and rewarding it can be to support a large discussion group online.

How Am I Going to Go About This Research?

Computer scientists, economists, and designers often want to ask if offering the right upvoting system or the right set of badges will filter content effectively or motivate people to contribute the greatest amount of appropriate effort to a web platform. This focus on productivity often interprets the activity of users in the language of company priorities rather than community ones. Stuart Geiger and I discussed this idea of productivity last fall at HCOMP Citizen-X, arguing that we need to understand users’ values beyond just the “productivity” of a group.

Although I often explore questions through design and data analysis, I’m taking a different approach this summer, to better understand how redditors see their own participation. My first semester at MIT taught me how important it can be to participate in and observe a community rather than just measure it. Rather than spend the whole summer data-mining the Reddit API, I’m participating in subreddits and speaking to redditors. In “The Logic and Aims of Qualitative Research,” a chapter in a larger collection on communications research methods, Christians and Carey say that when researchers ask questions about human life, “we are examining a creative process whereby people produce and maintain forms of life and society and systems of meaning and value.” They argue that qualitative research sets out to “better understand the meanings that people use to guide their activities” (358-9).

As a student in MIT’s Technologies for Creative Learning class, I was curious about how young people learning to code thought about “bugs” in the stories, art, and games they made with Scratch. In a corporate environment, where there’s a goal for everyone’s work, it’s possible to define software errors. But does the same language apply to a ten-year-old child who’s creating a story after school? Most scholarly discussion of “bugs” applied this corporate term to young people, defining strict goals for students and measuring “errors” when they diverged from pre-defined projects. When I visited schools, observed student projects, and talked to students, I saw that diverging from the teacher’s plan could be a highly creative act. Far from an error, a “glitch” could prompt new creative directions, and an “unexpected surprise” often opened learners to new understandings about code.

Code and artwork from one of my first projects on Scratch.

If I had relied entirely on the definitions and data coming from teachers or the Scratch platform, I might have been able to test statistical hypotheses about “bugs,” and I might even have developed ways to limit the number of errors per student. I would never have noticed how important these unexpected surprises were to young people’s creativity, and at worst, I might even have reduced students’ chances of experiencing them. By participating with students and spending time in their learning environment, I was able to find new language, like “glitch,” that might move conversations beyond “errors” or “bugs.”

For my Reddit study this summer, I want to hear directly from mods about how they see their work: questions that go well beyond what can be measured. Many thanks in advance to those who welcome me into your subreddits this summer and take time to talk with me.

Update July 6, 2015. I ran into a Reddit employee at a conference last week and sent them this link, so the company is now aware of this project. I am still not working directly with Reddit in any way.

Discourse Matters: Designing better digital futures

June 5, 2015

A very similar version of this blog post originally appeared in Culture Digitally on June 5, 2015.

Words Matter. As I write this in June 2015, a United Nations committee in Bonn is occupied in the massive task of editing a document overviewing global climate change. The effort to reduce 90 pages into a short(er), sensible, and readable set of facts and positions is not just a matter of editing but a battle among thousands of stakeholders and political interests, dozens of languages, and competing ideas about what is real and therefore, what should or should not be done in response to this reality.


I think about this as I complete a visiting fellowship at Microsoft Research, where over a thousand researchers worldwide study complex world problems and focus on advancing state-of-the-art computing. In such research environments, the distance between one’s work and the design of the future can feel quite small. Here, I feel like our everyday conversations and playful interactions on whiteboards have the potential to actually impact what counts as the cutting edge and what might get designed at some future point.

But in less overtly “future making” contexts, our everyday talk still matters, in that words construct meanings, which over time and usage become taken for granted ways of thinking about the way the world works. These habits of thought, writ large, shape and delimit social action, organizations, and institutional structures.

In an era of web 2.0, networked sociality, constant connectivity, smart devices, and the internet of things (IoT), how does everyday talk shape our relationship to technology, or our relationships to each other? If the theory of social construction is really a thing, are we constructing the world we really want? Who gets to decide the shape of our future? More importantly, how does everyday talk construct, feed, or resist larger discourses?

rhetoric as world-making

From a discourse-centered perspective, rhetoric is not a label for politically loaded or bombastic communication practices, but rather, a consideration of how persuasion works. Reaching back to the most classic notions of rhetoric from ancient Greek philosopher Aristotle, persuasion involves a mix of logical, emotional, and ethical appeals, which have no necessary connection to anything that might be sensible, desirable, or good to anyone, much less a majority. Persuasion works whether or not we pay attention. Rhetoric can be a product of deliberation or effort, but it can also function without either.

When we represent the techno-human or socio-technical relation through words and images, these representations function rhetorically. World-making is inherently discursive at some level. And if making is about changing, this process inevitably involves some effort to influence how people describe, define, respond to, or interact with/in actual contexts of lived experience.

I have three sisters, each involved as I am in world-making, if such a descriptive phrase can be applied to the everyday acts of inquiry that prompt change in socio-technical contexts. Cathy is an organic gardener who spends considerable time improving techniques for increasing her yield each year.  Louise is a project manager who designs new employee orientation programs for a large IT company. Julie is a biochemist who studies fish in high elevation waterways.

Perhaps they would not describe themselves as researchers, designers, or even makers. They’re busy carrying out their job or avocation. But if I think about what they’re doing from the perspective of world-making, they are all three, plus more. They are researchers, analyzing current phenomena. They are designers, building and testing prototypes for altering future behaviors. They are activists, putting time and energy into making changes that will influence future practices.

Their work is alternately physical and cognitive, applied for distinct purposes, targeted to very different types of stakeholders.  As they go about their everyday work and lives, they are engaged in larger conversations about what matters, what is real, or what should be changed.

Everyday talk is powerful not just because it has remarkable potential to persuade others to think and act differently, but also because it operates in such unremarkable ways. Most of us don’t recognize that we’re shaping social structures when we go about the business of everyday life. Sure, a single person’s actions can become globally notable, but most of the time, any small action such as a butterfly flapping its wings in Michigan is difficult to link to a tsunami halfway around the world. But whether or not direct causality can be identified, there is a tipping point where individual choices become generalized categories. Where a playful word choice becomes a standard term in the OED. Where habitual ways of talking become structured ways of thinking.

The power of discourse: Two examples

I mention two examples that illustrate the power of discourse to shape how we think about social media, our relationship to data, and our role in the larger political economies of internet-related activities. These cases are selected because they cut across different domains of digital technological design and development. I develop these cases in more depth here and here.

‘Sharing’ versus ‘surfing’

The case of ‘sharing’ illustrates how a term for describing our use of technology (using, surfing, or sharing) can influence the way we think about the relationship between humans and their data, or the rights and responsibilities of various stakeholders involved in these activities. In this case, regulatory and policy frameworks have shifted the burden of responsibility from governmental or corporate entities to individuals. This may not be directly caused by the rise in the use of the term ‘sharing’ as the primary description of what happens in social media contexts, but this term certainly reinforces a particular framework that defines what happens online. When this term is adopted on a broad scale and taken for granted, it functions invisibly, at deep structures of meaning. It can seem natural to believe that when we decide to share information, we should accept responsibility for our action of sharing it in the first place.

It is easy to accept the burden for protecting our own privacy when we accept the idea that we are ‘sharing’ rather than doing something else. The following comment seems sensible within this structure of meaning: “If you didn’t want your information to be public, you shouldn’t have shared it in the first place.”  This explanation is naturalized, but is not the only way of seeing and describing this event. We could alternately say we place our personal information online like we might place our wallet on the table. When someone else steals it, we’d likely accuse the thief of wrongdoing rather than the innocent victim who trusted that their personal belongings would be safe.

A still different frame might characterize personal information as an extension of the body or even a body part, rather than an object or possession. Within this definition, disconnecting information from the person would be tantamount to cutting off an arm. As with the definition of the wallet above, accountability for the action would likely be placed on the shoulders of the ‘attacker’ rather than the individual who lost a finger or ear.

‘Data’ and quantification of human experience

With the rise of big data, we have entered (or some would say returned to) an era of quantification. Here, the trend is to describe and conceptualize all human activity as data—discrete units of information that can be collected and analyzed. Such discourse collapses and reduces human experience. Dreams are equated with body weight; personality is something that can be categorized with the same statistical clarity as diabetes.

The trouble with using data as the baseline unit of information is that it presents an imaginary of experience that is both impoverished and oversimplified. This conceptualization coincides, of course, with the focus on computation as the preferred mode of analysis, which is predicated on the ability to collect massive quantities of digital information from multiple sources, information that can only be measured through certain tools.

“Data” is a word choice, not an inevitable nomenclature. This choice has consequences from the micro to the macro, from the cultural to the ontological. This is the case because we’ve transformed life into arbitrarily defined pieces, which replace the flow of lived experience with information bits. Computational analytics makes calculations based on these information bits. This matters, in that such datafication focuses attention on that which exists as data and ignores what is outside this configuration. Indeed, data has become a frame for that which is beyond argument, because it always exists, no matter how it might be interpreted (a point well developed by many, including Daniel Rosenberg in his essay Data before the fact).

We can see a possible outcome of such framing in the emerging science and practice of “predictive policing.” This rapidly growing strategy in large metropolitan cities is a powerful example of how computation of tiny variables in huge datasets can link individuals to illegal behaviors. The example grows somewhat terrifying when we realize these algorithms are used to predict what is likely to occur, rather than to simply calculate what has occurred. Such predictions are based on data compiled from local and national databases, focusing attention on only those elements of human behavior that have been captured in these data sets (for more on this, see the work of Sarah Brayne).

We could alternately conceptualize human experience as a river that we can only step in once, because it continually changes as it flows through time-space. In such a Heraclitean characterization, we might then focus more attention on the larger shape and ecology of the river rather than trying to capture the specificities of the moment when we stepped into it.

Likewise, describing behavior in terms of the chemical processes in the brain, or in terms of the encompassing political situation within which it occurs will focus our attention on different aspects of an individual’s behavior or the larger situation to which or within which this behavior responds. Each alternative discourse provokes different ways of seeing and making sense of a situation.

When we stop to think about it, we know these symbolic interactions matter. Gareth Morgan’s classic work about metaphors of organization emphasizes how the frames we use will generate distinctive perspectives and, more importantly, distinctive structures for organizing social and workplace activities. We might reverse engineer these structures to find a clash of rivaling symbols, only some of which survive to define the moment and create future history. Rhetorical theorist Kenneth Burke would talk about these symbolic frames as myths. In a 1935 speech to the American Writers’ Congress he notes that:

“myth” is the social tool for welding the sense of interrelationship by which [we] can work together for common social ends. In this sense, a myth that works well is as real as food, tools, and shelter are.

These myths do not just function ideologically in the present tense. As they are embedded in our everyday ways of thinking, they can become naturalized principles upon which we base models, prototypes, designs, and interfaces.

Designing better discourses

How might we design discourse to try to intervene in the shape of our future worlds? Of course, we can address this question as critical and engaged citizens. We are all researchers and designers involved in the everyday processes of world-making. Each of us, in our own way, is produsing the ethics that will shape our future.

This is a critical question for interaction and platform designers, software developers, and data scientists. In our academic endeavors, the impact of our efforts may or may not seem consequential on any grand scale. The outcome of our actions may have nothing to do with what we thought or desired from the outset. Surely, the butterfly neither intends nor desires to cause a tsunami.

Butterfly effect comic. Image by J. L. Westover.

Still, it’s worth thinking about. What impact do we have on the larger world? And should we be paying closer attention to how we’re ‘world-making’ as we engage in the mundane, the banal, the playful? When we consider the long future impact of our knowledge producing practices, or the way that technological experimentation is actualized, the answer is an obvious yes.  As Laura Watts notes in her work on future archeology:

futures are made and fixed in mundane social and material practice: in timetables, in corporate roadmaps, in designers’ drawings, in standards, in advertising, in conversations, in hope and despair, in imaginaries made flesh.

It is one step to notice these social construction processes. The challenge then shifts to one of considering how we might intervene in our own and others’ processes, anticipate future causality, turn a tide that is not yet apparent, and try to impact what we might become.

Acknowledgments and references

Notably, the position I articulate here is not new or unique, but another variation on a long-running theme of critical scholarship, which is well represented by members of the Social Media Collective. I am also indebted to a long list of feminist and critical scholars. This position statement is based on my recent interests and concerns about social media platform design, the role of self-learning algorithmic logics in digital culture infrastructures, and the ethical gaps emerging from rapid technological development. It derives from my previous work in digital identity, ethnographic inquiry into user interfaces and user perceptions, and recent work training participants to use auto-ethnographic and phenomenological techniques to build reflexive critiques of their lived experience in digital culture. There are, truly, too many sources and references to list here, but here is a short list of what I directly mentioned:

Kenneth L. Burke. 1935. Revolutionary symbolism in America. Speech to the American Writers’ Congress, February 1935. Reprinted in The Legacy of Kenneth Burke. Herbert W. Simons and Trevor Melia (eds). Madison: University of Wisconsin Press, 1989.

Annette N. Markham. Forthcoming. From using to sharing: A story of shifting fault lines in privacy and data protection narratives. In Digital Ethics (2nd ed). Bastiaan Vanacker, Don Heider (eds). Peter Lang, New York. Final draft available in PDF here.

Annette N. Markham. 2013. Undermining data: A critical examination of a core term in scientific inquiry. First Monday, 18(10).

Gareth Morgan. 1986. Images of Organization. Sage Publications, Thousand Oaks, CA.

Daniel Rosenberg. 2013. Data before the fact. In ‘Raw data’ is an oxymoron. Lisa Gitelman (ed). Cambridge, MA: MIT Press, pp. 15–40.

Laura Watts. 2015. Future archeology: Re-animating innovation in the mobile telecoms industry. In Theories of the mobile internet: Materialities and imaginaries. Andrew Herman, Jan Hadlaw, Thom Swiss (eds). Routledge.

A Research Agenda for Accountable Algorithms

May 12, 2015

What should people who are interested in accountability and algorithms be thinking about? Here is one answer: my eleven-minute remarks from a recent event at NYU are now online. I’ve edited them to intersperse my slides.

This talk was partly motivated by the ethics work being done in the machine learning community. That is very exciting and interesting work and I love, love, love it. My remarks are an attempt to think through the other things we might also need to do. Let me know how to replace the “??” in my slides with something more meaningful!

Preview: My remarks contain a minor attempt at a Michael Jackson joke.



A number of fantastic Social Media Collective people were at this conference — you can hear Kate Crawford in the opening remarks.  For more videos from the conference, see:

Algorithms and Accountability

Thanks to Joris van Hoboken, Helen Nissenbaum and Elana Zeide for organizing such a fab event.

If you bought this 11-minute presentation you might also buy: Auditing Algorithms, a forthcoming workshop at Oxford.



(This was cross-posted to multicast.)

MSR’s Social Media Collective is looking for a 2015-16 Research Assistant (to start 15 July)

May 11, 2015

Microsoft Research (MSR) is looking for a Research Assistant to work with the Social Media Collective in the New England lab, based in Cambridge, Massachusetts. The MSR Social Media Collective consists of Nancy Baym, Sarah Brayne, Kevin Driscoll, Tarleton Gillespie, Mary L. Gray, and Lana Swartz in Cambridge, and Kate Crawford and danah boyd in New York City, as well as faculty visitors and Ph.D. interns affiliated with MSR New England. The RA will work directly with Nancy Baym, Kate Crawford, Tarleton Gillespie, and Mary L. Gray. Unfortunately, because this is a time-limited contract position, we can only consider candidates who are already legally eligible to work in the United States.

An appropriate candidate will be a self-starter who is passionate and knowledgeable about the social and cultural implications of technology. Strong skills in writing, organization, and academic research are essential, as are time management and multitasking. Minimum qualifications are a BA or equivalent degree in a humanities or social science discipline and some qualitative research training.

Job responsibilities will include:
– Sourcing and curating relevant literature and research materials
– Developing literature reviews and/or annotated bibliographies
– Coding ethnographic and interview data
– Copyediting manuscripts
– Working with academic journals on themed sections
– Assisting with research project data management and event organization

The RA will also have opportunities to collaborate on ongoing projects. While publication is not a guarantee, the RA will be encouraged to co-author papers while at MSR. The RAship will require 40 hours per week on site in Cambridge, MA, and remote coordination with New York City-based researchers. It is a 12-month contractor position, with the opportunity to extend the contract an additional 6 months. The position pays hourly, with flexible daytime hours. The start date will ideally be July 15, although flexibility is possible for the right candidate.

This position is perfect for emerging scholars planning to apply to PhD programs in Communication, Media Studies, Sociology, Anthropology, Information Studies, and related fields who want to develop their research skills before entering a graduate program. Current New England-based MA/PhD students are welcome to apply provided they can commit to 40 hours of on-site work per week.

To apply, please send an email to Mary Gray with the subject “RA Application” and include the following attachments:

– One-page (single-spaced) personal statement, including a description of research experience and training, interests, and professional goals
– CV or resume
– Writing sample (preferably a literature review or a scholarly-styled article)
– Links to online presence (e.g., blog, homepage, Twitter, journalistic endeavors, etc.)
– The names and emails of two recommenders

We will begin reviewing applications on May 15 and will continue to do so until we find an appropriate candidate. We will post to the blog when the position is filled.

Please feel free to ask questions about the position in the blog comments!

The Facebook “It’s Not Our Fault” Study

May 7, 2015

Today in Science, members of the Facebook data science team released a provocative study about adult Facebook users in the US “who volunteer their ideological affiliation in their profile.” The study “quantified the extent to which individuals encounter comparatively more or less diverse” hard news “while interacting via Facebook’s algorithmically ranked News Feed.”*

  • The research found that the user’s click rate on hard news is affected by the positioning of the content on the page by the filtering algorithm. The same link placed at the top of the feed is about 10-15% more likely to get a click than a link at position #40 (figure S5).
  • The Facebook news feed curation algorithm, “based on many factors,” removes hard news from diverse sources that you are less likely to agree with but it does not remove the hard news that you are likely to agree with (S7). They call news from a source you are less likely to agree with “cross-cutting.”*
  • The study then found that the algorithm filters out 1 in 20 (5%) of the cross-cutting hard news stories that a self-identified conservative would otherwise see, and 1 in 13 (8%) of the cross-cutting hard news stories that a self-identified liberal would otherwise see.
  • Finally, the research then showed that “individuals’ choices about what to consume” further limit their “exposure to cross-cutting content.” Conservatives will click on a little less than 30% (corrected from 17%) of cross-cutting hard news, while liberals will click a little more than 20% (corrected from 7%) (figure 3).

My interpretation in three sentences:

  1. We would expect that people who are given the choice of what news they want to read will select sources they tend to agree with–more choice leads to more selectivity and polarization in news sources.
  2. Increasing political polarization is normatively a bad thing.
  3. Selectivity and polarization are happening on Facebook, and the news feed curation algorithm acts to modestly accelerate selectivity and polarization.

I think this should not be hugely surprising. For example, what else would a good filter algorithm be doing other than filtering for what it thinks you will like?

But what’s really provocative about this research is the unusual framing. This may go down in history as the “it’s not our fault” study.

Facebook: It’s not our fault.

I carefully wrote the above based on my interpretation of the results. Now that I’ve got that off my chest, let me tell you about how the Facebook data science team interprets these results. To start, my assumption was that news polarization is bad.  But the end of the Facebook study says:

“we do not pass judgment on the normative value of cross-cutting exposure”

This is strange, because there is a wide consensus that exposure to diverse news sources is foundational to democracy. Scholarly research about social media has–almost universally–expressed concern about the dangers of increasing selectivity and polarization. But it may be that you do not want to say that polarization is bad when you have just found that your own product increases it. (Modestly.)

And the sources cited just after this quote sure do say that exposure to diverse news sources is important. But the Facebook authors write:

“though normative scholars often argue that exposure to a diverse ‘marketplace of ideas’ is key to a healthy democracy (25), a number of studies find that exposure to cross-cutting viewpoints is associated with lower levels of political participation (22, 26, 27).”

So the authors present reduced exposure to diverse news as a “could be good, could be bad” but that’s just not fair. It’s just “bad.” There is no gang of political scientists arguing against exposure to diverse news sources.**

The Facebook study says it is important because:

“our work suggests that individuals are exposed to more cross-cutting discourse in social media than they would be under the digital reality envisioned by some”

Why so defensive? If you look at what is cited here, this quote is saying that this study showed that Facebook is better than a speculative dystopian future.*** Yet the people referred to by this word “some” didn’t provide any sort of point estimates that were meant to allow specific comparisons. On the subject of comparisons, the study goes on to say that:

“we conclusively establish that…individual choices more than algorithms limit exposure to attitude-challenging content.”

“compared to algorithmic ranking, individuals’ choices about what to consume had a stronger effect”

Alarm bells are ringing for me. The tobacco industry might once have funded a study that says that smoking is less dangerous than coal mining, but here we have a study about coal miners smoking. Probably while they are in the coal mine. What I mean to say is that there is no scenario in which “user choices” vs. “the algorithm” can be traded off, because they happen together (Fig. 3 [top]). Users select from what the algorithm already filtered for them. It is a sequence.**** I think the proper statement about these two things is that they’re both bad — they both increase polarization and selectivity. As I said above, the algorithm appears to modestly increase the selectivity of users.
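To see why the sequence matters, here is a toy simulation of that pipeline. The magnitudes are entirely my own invention (loosely echoing the study’s figures, not its data): friends share stories, the algorithm filters them, and only then does the user click.

```python
# Toy simulation of the exposure pipeline: friends share -> algorithm
# ranks/filters -> user clicks. All numbers are invented for illustration.
import random
random.seed(0)

# Stage 0: suppose 40% of the hard news friends share is cross-cutting.
shared = ["cross" if random.random() < 0.40 else "agree" for _ in range(100_000)]

# Stage 1: the algorithm suppresses ~8% of cross-cutting items.
ranked = [s for s in shared if not (s == "cross" and random.random() < 0.08)]

# Stage 2: users click cross-cutting items less often (invented click rates).
clicked = [s for s in ranked if random.random() < (0.20 if s == "cross" else 0.25)]

for stage, items in (("shared", shared), ("ranked", ranked), ("clicked", clicked)):
    share_cross = sum(s == "cross" for s in items) / len(items)
    print(f"{stage:>8}: {share_cross:.1%} cross-cutting")

# The stage-2 "user effect" operates only on what survived stage 1,
# so the two effects are sequential, not independently comparable.
```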

The only reason I can think of that the study is framed this way is as a kind of alibi. Facebook is saying: It’s not our fault! You do it too!

Are we the 4%?

In my summary at the top of this post, I wrote that the study was about people “who volunteer their ideological affiliation in their profile.” But the study also describes itself by saying:

“we utilize a large, comprehensive dataset from Facebook.”

“we examined how 10.1 million U.S. Facebook users interact”

These statements may be factually correct but I found them to be misleading. At first, I read this quickly and I took this to mean that out of the at least 200 million Americans who have used Facebook, the researchers selected a “large” sample that was representative of Facebook users, although this would not be representative of the US population. The “limitations” section discusses the demographics of “Facebook’s users,” as would be the normal thing to do if they were sampled. There is no information about the selection procedure in the article itself.

Instead, after reading down in the appendices, I realized that “comprehensive” refers to the survey research concept: “complete,” meaning a non-probability, non-representative sample that started from everyone on the Facebook platform. But out of hundreds of millions, we ended up with a study of 10.1 million because users were excluded unless they met these four criteria:

  1. “18 or older”
  2. “log in at least 4/7 days per week”
  3. “have interacted with at least one link shared on Facebook that we classified as hard news”
  4. “self-report their ideological affiliation” in a way that was “interpretable”

That #4 is very significant. Who reports their ideological affiliation on their profile?

(Screenshot: Facebook’s “add your political views” profile field)

It turns out that only 9% of Facebook users do that. Of those who report an affiliation, only 46% reported it in a way that was “interpretable.” That means this is a study about the 4% of Facebook users unusual enough to want to tell people their political affiliation on the profile page. That is a rare behavior.
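A quick back-of-the-envelope check of that arithmetic (the 9% and 46% are the figures just mentioned; the product is what yields the 4%):

```python
# Share of Facebook users who end up in the study: 9% list political
# views on their profile, and 46% of those were "interpretable."
print(0.09 * 0.46)  # 0.0414 -> roughly 4% of users
```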

More important than the frequency, though, is the fact that this selection procedure confounds the findings. We would expect the small minority who publicly identify an interpretable political orientation to behave quite differently from the average person with respect to consuming ideological political news. The research claims just don’t stand up against the selection procedure.

But the study is at pains to argue that (italics mine):

“we conclusively establish that on average in the context of Facebook, individual choices more than algorithms limit exposure to attitude-challenging content.”

The italicized portion is incorrect because the appendices explain that this is actually a study of a specific, unusual group of Facebook users. The study is designed in such a way that the selection for inclusion in the study is related to the results. (“Conclusively” therefore also feels out of place.)

Algorithmium: A Natural Element?

Last year there was a tremendous controversy about Facebook’s manipulation of the news feed for research. In the fracas, one of the controversial study’s co-authors revealed that, based on the feedback received after the event, many people didn’t realize that the Facebook news feed was filtered at all. We also recently presented research with similar findings.

I mention this because when the study states it is about selection of content, who does the selection is important. There is no sense in this study that a user who chooses something is fundamentally different from the algorithm hiding something from them. While in fact the filtering algorithm is driven by user choices (among other things), users don’t understand the relationship that their choices have to the outcome.

(Meme: “not sure if i hate facebook or everyone i know”)
In other words, the article’s strange comparison between “individuals’ choices” and “the algorithm” should be read as “things I choose to do” vs. the effect of “a process Facebook has designed without my knowledge or understanding.” Again, they can’t be compared in the way the article proposes because they aren’t equivalent.

I struggled with the framing of the article because the research talks about “the algorithm” as though it were an element of nature, or a naturally occurring process like convection or mitosis. There is also no sense that it changes over time or that it could be changed intentionally to support a different scenario.*****

Facebook is a private corporation with a terrible public relations problem. It is periodically rated one of the least popular companies in existence. It is currently facing serious government investigations into illegal practices in many countries, some of which stem from the manipulation of its news feed algorithm. In this context, I have to say that it doesn’t seem wise for these Facebook researchers to have spun these data so hard in this direction, which I would summarize as: the algorithm is less selective and less polarizing. Particularly when the research finding in their own study is actually that the Facebook algorithm is modestly more selective and more polarizing than living your life without it.

Update: (6pm Eastern)

Wow, if you think I was critical, have a look at these. It turns out I am the moderate one.

Eszter Hargittai from Northwestern posted on Crooked Timber that we should “stop being mesmerized by large numbers and go back to taking the fundamentals of social science seriously.” And (my favorite): “I thought Science was a serious peer-reviewed publication.”

Nathan Jurgenson from Maryland and Snapchat wrote on Cyborgology (“in a fury“) that Facebook is intentionally “evading” its own role in the production of the news feed. “Facebook cannot take its own role in news seriously.” He accuses the authors of using the “Big-N trick” to intentionally distract from methodological shortcomings. He tweeted that “we need to discuss how very poor corporate big data research gets fast tracked into being published.”

Zeynep Tufekci from UNC wrote on Medium that “I cannot remember a worse apples to oranges comparison” and that the key take-away from the study is actually the ordering effects of the algorithm (which I did not address in this post). “Newsfeed placement is a profoundly powerful gatekeeper for click-through rates.”

Update: (5/10)

A comment helpfully pointed out that I used the wrong percentages in my fourth point when summarizing the piece. Fixed it, with changes marked.

Update: (5/15)

It’s now one week since the Science study. This post has now been cited/linked in The New York Times, Fortune, Time, Wired, Ars Technica, Fast Company, Engadget, and maybe even a few more. I am still getting emails. The conversation has fixated on the <4% sample, often saying something like: “So, Facebook said this was a study about cars, but it was actually only about blue cars.” That’s fine, but the other point in my post is about what is being claimed at all, no matter the sample.

I thought my “coal mine” metaphor about the algorithm would work but it has not always worked. So I’ve clamped my Webcam to my desk lamp and recorded a four-minute video to explain it again, this time with a drawing.******

If the coal mine metaphor failed me, what would be a better metaphor? I’m not sure. Suggestions?




* Diversity in hard news, in their study, would be a self-identified liberal who receives a story from a conservative outlet, or a self-identified conservative who receives one from a liberal outlet, where the stories are about “national news, politics, [or] world affairs.” In more precise terms, for each user “cross-cutting content” was defined as stories that are more likely to be shared by partisans who do not have the same self-identified ideological affiliation that you do.

** I don’t want to make this even more nitpicky, so I’ll put this in a footnote. The paper’s citation of Mutz and Huckfeldt et al. to mean that “exposure to cross-cutting viewpoints is associated with lower levels of political participation” is just bizarre. I hope it is a typo. These authors don’t advocate against exposure to cross-cutting viewpoints.

*** Perhaps this could be a new Facebook motto used in advertising: “Facebook: Better than one speculative dystopian future!”

**** In fact, algorithm and user form a coupled system of at least two feedback loops. But that’s not helpful to measure “amount” in the way the study wants to, so I’ll just tuck it away down here.

***** Facebook is behind the algorithm, but they are trying to peer-review research about it without disclosing how it works, which is a key part of the study. There is also no way to reproduce the research (or do a second study on a primary phenomenon under study, the algorithm) without access to the Facebook platform.

****** In this video, I intentionally conflate (1) the number of posts filtered and (2) the magnitude of the bias of the filtering. I did so because the difficulty with the comparison works the same way for both, and I was trying to make the example simpler. Thanks to Cedric Langbort for pointing out that “baseline error” is the clearest way of explaining this.

(This was cross-posted to multicast and Wired.)

A very exciting announcement!

April 15, 2015

The Social Media Collective is thrilled to announce that Tarleton Gillespie has joined Microsoft Research New England as a Principal Researcher. He joins Nancy Baym and Mary Gray in New England and danah boyd and Kate Crawford in New York City in forming the permanent core of the SMC. Tarleton is known for his influential work on the cultural politics of algorithms and platforms. His most recent book is the co-edited collection Media Technologies: Essays on Communication, Materiality, and Society (2014). He has also written about copyright in his book Wired Shut: Copyright and the Shape of Digital Culture (2007). He’s been at the forefront of bringing researchers together to think through issues of digital culture through the scholarly blog he co-founded, Culture Digitally, which any regular reader of this site should be reading as well.

Prior to joining MSR, Tarleton was an Associate Professor in both Communication and Information Science at Cornell. He remains affiliated with Cornell as an Adjunct Associate Professor.

Those lucky enough to work with Tarleton know that in addition to being wicked smart (or, as they would say here in Boston, wicked smaht), he is a remarkably generous scholar and thinker who always makes the work of those around him better. Also, he’s an incredibly nice guy.

Welcome Tarleton!

Should You Boycott Traditional Journals?

March 31, 2015

(Or, Should I Stay or Should I Go?)

Is it time to boycott “traditional” scholarly publishing? Perhaps you are an academic researcher, just like me. Perhaps, just like me, you think that there are a lot of exciting developments in scholarly publishing thanks to the Internet. And you want to support them. And you also want people to read your research. But you also still need to be sure that your publication venues are held in high regard.

Or maybe you just receive research funding that is subject to new open access requirements.

(Badge: “Ask me about OPEN ACCESS”)

Academia is a funny place. We are supposedly self-governing. So if we don’t like how our scholarly communications are organized we should be able to fix this ourselves. If we are dissatisfied with the journal system, we’re going to have to do something about it. The question of whether or not it is now time to eschew closed access journals is something that comes up a fair amount among my peers.

It comes up often enough that a group of us at Michigan decided to write an article on the topic. Here’s the article.  It just came out yesterday (open access, of course):

Carl Lagoze, Paul Edwards, Christian Sandvig, & Jean-Christophe Plantin. (2015). Should I stay or Should I Go? Alternative Infrastructures in Scholarly Publishing. International Journal of Communication 9: 1072-1081.

The article is intended for those who want some help figuring out the answer to the question the article title poses: Should I stay or should I go? It’s meant to help you decipher the unstable landscape of scholarly publishing these days. (Note that we restrict our topic to journal publishing.)

Researching it was a lot of fun, and I learned quite a bit about how scholarly communication works.

  • It contains a mention of the first journal. Yes, the first one that we would recognize as a journal in today’s terms. It’s Philosophical Transactions published by the Royal Society of London. It’s on Volume 373.
  • It should teach you about some of the recent goings-on in this area. Do you know what a green repository is? What about an overlay journal? Or the “serials crisis”?
  • It addresses a question I’ve had for a while: What the heck are those arXiv people up to? If it’s so great, why hasn’t it spread to all disciplines?
  • There’s some fun discussion of influential experiments in scholarly publishing. Remember the daring foundation of the Electronic Journal of Communication? Vectors? Were you around way-back-in-the-day when the pioneering, Web-based JCMC looked like this hot mess below? Little did we know that we were actually looking at the future.(*)


(JCMC circa 1995)

(*): Unless we were looking at the Gopher version, in which case we were not looking at the future.

Ultimately, we adapt a framework from Hirschman that we found to be an aid to our thinking about what is going on today in scholarly communication. Feel free to play the following song on a loop as you read it.

(This post has been cross-posted on multicast.)

