Playing to the NYC Crowd and other SMC outings

As I hope you’ve heard by now, the SMC is publishing books like mad. Tarleton Gillespie’s Custodians of the Internet is blazing a trail through the content moderation debate, Mary Gray and Sid Suri’s Ghost Work will be out in May, and my own Playing to the Crowd has hit the road seeking readers.

In that vein, here are some upcoming public events where I will be talking about my book in NYC and its environs:


Monday, October 1, 3-4 pm: A small book session for people who have read the book (pre-registration required) at Data & Society

Tuesday, October 2, 7 pm: In conversation with Erin McKeown at Rough Trade

Wednesday, October 3, 12-1 pm: Colloquium at the Columbia University Communications Department, Pulitzer Hall 601B

Wednesday, October 3, 6-9 pm: In conversation with Clive Thompson at Betaworks (registration required)

Thursday, October 4, 2:50-4:10 pm: Rutgers School of Communication and Information (New Brunswick, NJ)

There will be a few more talks coming up elsewhere (University of Illinois Chicago 11/29, University of Michigan 12/4, London in January, Oslo in February). If you’re interested in inviting me to talk with your folks, shoot me an email.

Hope to meet some of you there!

Re-assembling the Assembly Line: Digital Labor Economies and Demands for an Ambient Workforce

Watch Mary Gray’s talk at Harvard’s Berkman Center for Internet and Society, where she discusses findings from a two-year collaborative study on crowdwork: “the process of taking tasks that would normally be delegated to an employee and distributing them to a large pool of online workers, the ‘crowd,’ in the form of an open call.” In this talk she addresses the cultural meaning, political implications, and ethical demands of crowdwork.

See you at IR 16!

The Social Media Collective is showing up in force at Internet Research 16 in Phoenix, Arizona, starting next week. Along with many friends of the SMC, there will be some of our permanent researchers (Nancy Baym, Tarleton Gillespie), postdocs current and past (Kevin Driscoll, Lana Swartz, Mike Ananny), past and present interns (Stacy Blasiola, Brittany Fiore-Gartland, Germaine Halegoua, Tero Karppi, J. Nathan Matias, Kat Tiidenberg, Shawn Walker, Nick Seaver), past and future Visiting Researchers (Jean Burgess, Annette Markham, Susanna Paasonen, Hector Postigo, TL Taylor), and our past Research Assistants (Kate Miltner and Alex Leavitt). Hope to see you there!

Below is a list of papers and panels they will be presenting:




Digital Methods in Internet Research

Axel Bruns, Jean Burgess, Tim Highfield, Tama Leaver, Ben Light, Patrik Wikstrom




Beyond Big Bird: The Role of Humor in the Aggregate Interpretation of Live-Tweeted Events

11:00 am – 12:20 pm

Alex Leavitt, Kristen Guth, Kevin Driscoll, François Bar

ROUNDTABLE: Teaching Ethics in Big Data and Social Media: Bridging Theory and Practice in the Classroom

11:00 am – 12:20 pm

Shawn Thomas Walker, Anna Lauren Hoffmann, Jim Thatcher

You [Don’t] Gotta Pay the Troll Toll: A Transaction Costs Model of Online Harassment

1:30 pm – 2:50 pm

Stacy Blasiola

PANEL: Facebook’s Futures

1:30 pm – 2:50 pm

Tero Jukka Karppi, Andrew Richard Schrock, Andrew Herman, Fenwick McKelvey

ROUNDTABLE: Unpacking the Black Box of Qualitative Analysis: Exploring How the Imaginaries of Digital Inquiry are Constructed through Everyday Research Practice

3:10 pm – 4:30 pm

Annette N. Markham, Nancy K. Baym, T.L. Taylor, Lynn Schofield Clark, Jill Walker Rettberg

ROUNDTABLE: It’s Really About Ethics in Games Research: Reflections on #GamerGate

3:10 pm – 4:30 pm

Shira Chess, Adrienne Shaw, Adrienne Massanari, Christopher Paul, Kate Miltner, Casey O’Donnell

The Role of Breakdown in Imagining Big Data: Impediment to Insight to Innovation

3:10 pm – 4:30 pm

Anissa Tanweer, Brittany Fiore-Gartland, Cecilia Aragon

*** The Nancy Baym Book Award will be presented to Robert Gehl for Reverse Engineering Social Media at the banquet on Thursday night.




Singing Data Over the Phone: A Social History of the Modem

9:00 am – 10:20 am

Kevin Driscoll

PANEL: Karma Policing: Re-imagining what we can (and can’t) post on the Internet

9:00 am – 10:20 am

Michael Burnam-Fink, Katrin Tiidenberg, John Carter McKnight, Cindy Tekobbe

ROUNDTABLE: Real and Imagined Boundaries: Building Connections Between Social Justice Activists and Internet Researchers

10:40 am – 12:00 pm

Catherine Knight Steele, Andre Brock, Annette Markham


ROUNDTABLE: Private Platforms under Public Pressure

10:40 am – 12:00 pm

Tarleton Gillespie, Mike Ananny, Christian Sandvig, J. Nathan Matias

ROUNDTABLE: Histories of Hating

10:40 am – 12:00 pm

Tamara Shepherd, Sam Srauy, Kevin Driscoll, Lana Swartz, Hector Postigo

The Challenges of Weibo for Data-Driven Digital Media Research

10:40 am – 12:00 pm

Jing Zeng, Jean Burgess, Axel Bruns

PANEL: Economies of the Internet II: Affect

1:00 pm – 2:20 pm

Sharif Mowlabocus, Nancy Baym, Susanna Paasonen, Dylan Wittkower, Kylie Jarrett

PANEL: Internet Research Ethics: New Contexts, New Challenges – New (Re)solutions?

Charles Melvin Ess, Annette Markham, Mark D. Johns, Yukari Seko, Katrin Tiidenberg, Camilla Granholm, Ylva Hård af Segerstad, Dick Kasperowski

Parks and Recommendation: Spatial Imaginaries in Algorithmic Systems

2:40 pm – 4:00 pm

Nick Seaver

Re-placeing the City: Digital Navigation Technologies and the Experience of Urban Place

2:40 pm – 4:00 pm

Germaine R. Halegoua


ROUNDTABLE: Compromised Data? Research on Social Media Platforms

4:20 pm – 5:40 pm

Greg Elmer, Ganaele Langlois, Joanna Redden, Axel Bruns, Jean Burgess, Robert Gehl

FISHBOWL: Exploring “Internet Culture”: Discourses, Boundaries, and Implications

4:20 pm – 5:40 pm

Kate Miltner, Ryan M. Milner, Whitney Phillips, Megan Sapnar Ankerson




FISHBOWL: The Quantified Imaginary

9:00 am – 10:20 am

Lee Humphreys, Jean Burgess, Joseph Turow

Imaginary Inactivity and the Share Button

9:00 am – 10:20 am

Airi-Alina Allaste, Katrin Tiidenberg

ROUNDTABLE: ‘Black Box’ Data and ‘Flying Furball’ Networks: Challenges and Opportunities in Doing and Communicating Social Media Analytics

1:30 pm – 2:50 pm

Axel Bruns, Anders Olof Larsson, Katrin Weller

ROUNDTABLE: Ethics and Social Justice Meeting: Discussing AOIR Committees and Mission

3:10 pm – 4:30 pm

Annette N. Markham, Jenny Stromer-Galley, Catherine Knight Steele

Presentation: Between Platforms and Community: Moderators on Reddit

Here is SMC intern Nathan Matias’s presentation on the project he worked on this summer at the SMC. He has continued this research, so in case you have not read it, here is a more recent post on his work:

Followup: 10 Factors Predicting Participation in the Reddit Blackout. Building Statistical Models of Online Behavior through Qualitative Research

Below is the presentation he gave at MSR earlier this month:


(Part 2)

(Part 3)

(Part 4)

Co-creation and Algorithmic Self-Determination: A study of player feedback on game analytics in EVE Online

We are happy to share SMC intern Aleena Chia’s presentation of her summer project, titled “Co-creation and Algorithmic Self-Determination: A study of player feedback on game analytics in EVE Online.”

Aleena’s project summary and the videos of her presentation are below:

Digital games are always already information systems designed to respond to players’ inputs with meaningful feedback (Salen and Zimmerman 2004). These feedback loops constitute a form of algorithmic surveillance that has been repurposed by online game companies to gather information about player behavior for consumer research (O’Donnell 2014). Research on player behavior gathered from game clients constitutes a branch of consumer research known as game analytics (Seif et al. 2013).[1] In conjunction with established channels of customer feedback such as player forums, surveys, polls, and focus groups, game analytics informs companies’ adjustments and augmentations to their games (Kline et al. 2005). EVE Online is a Massively Multiplayer Online Game (MMOG) that uses these research methods in a distinct configuration. The game’s developers convene a democratically elected council of players tasked with filtering player interests from the forums to inform the developers’ (1) agenda setting and (2) contextualization of game analytics in the planning and implementation of adjustments and augmentations.

This study investigates the council’s agenda setting and contextualization functions as a form of co-creation that draws players into processes of game development, as interlocutors in consumer research. This contrasts with forms of co-creation that emphasize consumers’ contributions to the production and circulation of media content and experiences (Banks 2013). By qualitatively analyzing meeting minutes between EVE Online’s player council and developers over seven years, this study suggests that co-creative consumer research draws from imaginaries of player governance caught between the twin desires of corporate efficiency and democratic efficacy. These desires are darned together through a quantitative public sphere (Peters 2001) that is enabled and eclipsed by game analytics. In other words, algorithmic techniques facilitate collective self-knowledge that players seek for co-creative deliberation; these same techniques also short-circuit deliberation through claims of neutrality, immediacy, and efficiency.

The significance of this study lies in its analysis of a consumer public’s (Arvidsson 2013) ambivalent struggle for algorithmic self-determination: the determination by users, through deliberative means, of how their aggregated acts should be translated by algorithms into collective will. This is not primarily a struggle of consumers against corporations; nor of political principles against capitalist imperatives; nor of aggregated numbers against individual voices. It is a struggle within communicative democracy for efficiency and efficacy (Anderson 2011). It is also a struggle for communicative democracy within corporate enclosures. These struggles grind on productive contradictions that fuel the co-creative enterprise. However, while the founding vision of co-creation gestured towards a win-win state, this analysis concludes that algorithmic self-determination prioritizes efficacy over efficiency, process over product. These commitments are best served by media companies that are oriented towards user retention rather than recruitment and business sustainability rather than growth, and that are flexible enough to slow down their co-creative processes.

[1] Seif et al. (2013) maintain that player behavior data is an important component of game analytics, which includes the statistical analysis, predictive modeling, optimization, and forecasting of all forms of data for decision making in game development. Other data include revenue, technical performance, and organizational process metrics.

(Video 1)

(Video 2)

(Video 3)

(Video 4)

A Research Agenda for Accountable Algorithms

What should people who are interested in accountability and algorithms be thinking about? Here is one answer: my eleven-minute remarks from a recent event at NYU are now online. I’ve edited the video to intersperse my slides.

This talk was partly motivated by the ethics work being done in the machine learning community. That is very exciting and interesting work and I love, love, love it. My remarks are an attempt to think through the other things we might also need to do. Let me know how to replace the “??” in my slides with something more meaningful!

Preview: My remarks contain a minor attempt at a Michael Jackson joke.



A number of fantastic Social Media Collective people were at this conference; you can hear Kate Crawford in the opening remarks. For more videos from the conference, see:

Algorithms and Accountability

Thanks to Joris van Hoboken, Helen Nissenbaum and Elana Zeide for organizing such a fab event.

If you bought this 11-minute presentation, you might also buy: Auditing Algorithms, a forthcoming workshop at Oxford.



(This was cross-posted to multicast.)

Show-and-Tell: Algorithmic Culture

Last week I tried to get a group of random sophomores to care about algorithmic culture. I argued that software algorithms are transforming communication and knowledge. The jury is still out on my success at that, but in this post I’ll continue the theme by reviewing the interactive examples I used to make my point. I’m sharing them because they are fun to try. I’m also hoping the excellent readers of this blog can think of a few more.

I’ll call my three examples “puppy dog hate,” “top stories fail,” and “your DoubleClick cookie filling.”  They should highlight the ways in which algorithms online are selecting content for your attention. And ideally they will be good fodder for discussion. Let’s begin:

Three Ways to Demonstrate Algorithmic Culture

(1.) puppy dog hate (Google Instant)

You’ll want to read the instructions fully before trying this. Go to Google and type “puppy”, then [space], then “dog”, then [space], but don’t hit [Enter]. That means you should have typed “puppy dog ” (with a trailing space). Results should appear without the need to press [Enter]. I got a page of instant results.

Now repeat the above instructions, but instead of “puppy” use the word “bitch” (so: “bitch dog ”). Right now you’ll get nothing. I got nothing. No matter how many words you type, if one of the words is “bitch” you’ll get no instant results.

What’s happening? Google Instant is the Google service that displays results while you are still typing your query. In the algorithm for Google Instant, it appears that your query is checked against a list of forbidden words. If the query contains one of the forbidden words (like “bitch”), no “instant” results will be shown, but you can still search Google the old-fashioned way by pressing [Enter].

This is an interesting example because it is incredibly mild censorship, and that is typical of algorithmic sorting on the Internet. Things aren’t made impossible; some things are just a little harder than others. We can discuss whether or not this actually matters to anyone. After all, you could still search for anything you wanted to, but some searches are made slightly more time-consuming because you have to press [Enter] and you do not receive real-time feedback as you construct your search query.

It’s also a good example of how problematic algorithmic censorship can be. The hackers over at 2600 reverse engineered Google Instant’s blacklist (NSFW), and it makes absolutely no sense. The blocked words I tried (like “bitch”) produce perfectly inoffensive search results (sometimes because of other censorship algorithms, like Google SafeSearch). It is not clear to me why they should be blocked. For instance, terms for some parts of the female anatomy are blocked while terms for other parts are not.

Some of the blocking is just silly. For instance, “hate” is blocked. This means you can make the Google Instant results disappear by adding “hate” to the end of an otherwise acceptable query. For example, typing “puppy dog hate ” makes the search results I got earlier disappear as soon as I type the trailing space. (Remember not to press [Enter].)

This is such a simple implementation that it barely qualifies as an algorithm. It also differs from my other examples because it appears that an actual human compiled this list of blocked words. That might be useful to highlight because we typically think that companies like Google do everything with complicated math rather than site-by-site or word-by-word rules. They have claimed as much, but this example shows that this crude sort of blacklist censorship still goes on.
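To show just how simple such a rule can be, here is a minimal sketch of a word blacklist check. It assumes the filter matches whole words against a hand-compiled list. The two-word `BLOCKED_WORDS` set is a hypothetical stand-in (it is not Google’s actual list), and `should_show_instant_results` is an illustrative name rather than any real Google function.

```python
# Hypothetical stand-in for a hand-compiled blocklist (NOT Google's real list).
BLOCKED_WORDS = {"bitch", "hate"}


def should_show_instant_results(query: str) -> bool:
    """Return False if any whole word in the query appears on the blocklist.

    This decides only whether *instant* results appear. In the behavior
    described above, pressing [Enter] still runs an ordinary search.
    """
    words = query.lower().split()
    return not any(word in BLOCKED_WORDS for word in words)


if __name__ == "__main__":
    for q in ["puppy dog ", "bitch dog ", "puppy dog hate "]:
        print(repr(q), "->", should_show_instant_results(q))
```

A rule like this has no notion of context. That is one plausible way to get the odd inconsistencies described above, where an inoffensive query vanishes because a single listed word appears in it.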

Google does censor actual search results (what you get after pressing [Enter]) in a variety of ways but that is a topic for another time. This exercise with Google Instant at least gets us started thinking about algorithms, whose interests they are serving, and whether or not they are doing their job well.

(2.) Top Stories Fail (Facebook)

In this example, you’ll need a Facebook account. Go to Facebook and look for the tiny little toggle that appears under the text “News Feed.” This allows you to switch between two different sorting algorithms: Facebook’s proprietary EdgeRank algorithm (this is the default) and “most recent.” (On my interface this toggle is in the upper left, but Facebook has multiple user interfaces at any given time, and for some people it appears in the center of the page at the top.)

Switch this toggle back and forth and look at how your feed changes.

What’s happening? Okay, we know that among 18-to-29-year-old Facebook users the median number of friends is now 300. Even given that most people are not over-sharers, some simple arithmetic makes it clear that some of the things posted to Facebook may never be seen by anyone. A status update is certainly unlikely to be seen by anywhere near your entire friend network. Facebook’s “Top Stories” (EdgeRank) algorithm is the solution to the oversupply of status updates and the undersupply of attention to them; it determines what appears on your news feed and how it is sorted.
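The “simple arithmetic” above can be made concrete with a rough calculation. Only the 300-friend median comes from the paragraph. The posting rate and the number of posts a reader looks at per day are made-up assumptions for illustration, and `fraction_seen` is a hypothetical helper.

```python
# Back-of-the-envelope: how much of a news feed can one reader actually see?
FRIENDS = 300                 # median friend count cited above
POSTS_PER_FRIEND_PER_DAY = 1  # hypothetical assumption, not measured data
POSTS_READ_PER_DAY = 50       # hypothetical assumption, not measured data


def fraction_seen(friends: int, posts_per_friend: float, posts_read: int) -> float:
    """Share of the day's posts one reader could see if attention is the only limit."""
    supply = friends * posts_per_friend
    return min(1.0, posts_read / supply)


if __name__ == "__main__":
    share = fraction_seen(FRIENDS, POSTS_PER_FRIEND_PER_DAY, POSTS_READ_PER_DAY)
    print(f"{FRIENDS * POSTS_PER_FRIEND_PER_DAY} posts supplied, {share:.0%} can be seen")
```

Even at these modest assumed rates, a reader sees only about one post in six. Something has to decide which sixth that is.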

We know that Facebook’s “Top Stories” sorting algorithm uses a heavy hand. It is quite likely that you have people in your friend network who post to Facebook A LOT but whose posts Facebook has decided to filter out entirely. These might be called your “silenced Facebook friends.” Sometimes when people do this toggling-the-algorithm exercise they exclaim: “Oh, I forgot that so-and-so was even on Facebook.”

Since we don’t know the exact details of EdgeRank, it isn’t clear exactly how Facebook decides which of your friends you should hear from and which should be ignored. Even though the algorithm might be well constructed, it’s interesting that when I’ve done this toggling exercise with a large group, a significant number of people say that Facebook’s algorithm produces a much more interesting list of posts than “Most Recent,” while a significant number say the opposite: that Facebook’s algorithm makes their news feed worse. (Personally, I find “Most Recent” produces a far more interesting news feed than “Top Stories.”)

It is an interesting intellectual exercise to try to reverse-engineer Facebook’s EdgeRank on your own by doing this toggling. Why is so-and-so hidden from you? What is it they are doing that Facebook thinks you wouldn’t like? For example, I think that EdgeRank doesn’t work well for me because I select my friends carefully and then don’t provide much of the feedback that counts toward EdgeRank. So my initial decision about whom to friend works better as a sort without further filtering (“most recent”) than Facebook’s decision about what to hide. (In contrast, some people I spoke with will friend anyone, and they do a lot more “liking” than I do.)
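The toggle can be thought of as two different sort keys applied to the same pool of posts. Around 2010, Facebook described EdgeRank publicly in broad terms: for each interaction (an “edge”), multiply an affinity score by a weight for the edge type and by a time decay, then sum. The sketch below assumes that shape and simplifies it to one edge per post. The `KIND_WEIGHT` values, the decay function, the `cutoff` threshold, and the sample posts are all invented for illustration. They are not Facebook’s real values.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    kind: str          # e.g. "status", "photo", "link"
    age_hours: float   # how long ago the post was made
    affinity: float    # hypothetical: how much the viewer interacts with the author


# Hypothetical edge-type weights; Facebook never published its real values.
KIND_WEIGHT = {"photo": 1.5, "link": 1.0, "status": 0.8}


def edge_score(post: Post) -> float:
    """Affinity x weight x time decay: the broad shape publicly attributed to EdgeRank."""
    decay = 1.0 / (1.0 + post.age_hours)  # assumed decay; the real function is unknown
    return post.affinity * KIND_WEIGHT.get(post.kind, 1.0) * decay


def most_recent(posts):
    """The 'Most Recent' setting: newest first, nothing hidden."""
    return sorted(posts, key=lambda p: p.age_hours)


def top_stories(posts, cutoff=0.05):
    """A 'Top Stories'-style feed: highest score first, low scorers dropped entirely."""
    ranked = sorted(posts, key=edge_score, reverse=True)
    return [p for p in ranked if edge_score(p) >= cutoff]


if __name__ == "__main__":
    feed = [
        Post("close friend", "photo", age_hours=5, affinity=0.9),
        Post("prolific acquaintance", "status", age_hours=1, affinity=0.05),
        Post("prolific acquaintance", "status", age_hours=2, affinity=0.05),
        Post("old classmate", "link", age_hours=3, affinity=0.4),
    ]
    print([p.author for p in most_recent(feed)])
    print([p.author for p in top_stories(feed)])
```

In this toy feed, the prolific acquaintance leads the “Most Recent” list but vanishes from the ranked list altogether. That is a small-scale version of the “silenced Facebook friends” effect. Under these assumptions, a viewer who rarely likes or comments gives the scorer little affinity signal to work with.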

What does it mean that your relationship to your friends is mediated by this secret algorithm? A minor note: If you switch to “most recent” some people have reported that after a while Facebook will switch you back to Facebook’s “Top Stories” algorithm without asking.

There are deeper things to say about Facebook, but this is enough to start with. Onward.

(3.) Your DoubleClick Cookie Filling (DoubleClick)

This example will only work if you browse the Web regularly from the same Web browser on the same computer and you have cookies turned on. (That describes most people.) Go to the Google Ads settings page (the URL is a mess, so here’s a shortcut):

Look at the right column, headed “Google Ads Across The Web,” then scroll down and look for the section marked “Interests.” The other parts may be interesting too, such as Google’s estimate of your Gender, Age, and the language you speak, all of which may or may not be correct.

If you have “interests” listed, click on “Edit” to see a list of topics.

What’s Happening? Google is the largest advertising clearinghouse on the Web. (It agreed to buy DoubleClick in 2007 for over $3 billion.) When you visit a Web site that runs Google Ads (which is likely quite common), your visit is noted, and a pattern of all of your Web site visits is then compiled and aggregated with other personal information that Google may know about you.

What a big departure from some old media! In comparison, in most states it is illegal to gather a list of books you’ve read at the library because this would reveal too much information about you. Yet for Web sites this data collection is the norm.

This settings page won’t reveal Google’s ad placement algorithm, but it shows you part of the result: a list of the categories that the algorithm is currently using to choose advertising content to display to you. Your attention will be sold to advertisers in these categories and you will see ads that match these categories.

This list is quite volatile, which reflects the way Google hopes to connect advertisers with people who are interested in a particular topic RIGHT NOW. Unlike demographics that are presumed to change slowly (age) or not at all (gender), Google appears to base much of its algorithm on your recent browsing history. That means if you browse the Web differently you can change this list fairly quickly (within a matter of days, at least).
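The volatility described above is roughly what you would expect if interest categories are scored with a recency weighting. Here is a minimal sketch that assumes each visit to a site tagged with a category adds weight, and that this weight decays exponentially with age. The 3-day half-life, the listing threshold, and the category tags in the test are hypothetical. Google’s actual method is not public.

```python
import math

HALF_LIFE_DAYS = 3.0  # hypothetical: a visit's weight halves every 3 days
THRESHOLD = 1.0       # hypothetical: minimum score to be listed as an "interest"


def interest_scores(visits, today):
    """visits: list of (day, category) pairs drawn from browsing history."""
    scores = {}
    for day, category in visits:
        if day > today:
            continue  # ignore visits that haven't happened yet
        age = today - day
        weight = math.pow(0.5, age / HALF_LIFE_DAYS)
        scores[category] = scores.get(category, 0.0) + weight
    return scores


def listed_interests(visits, today):
    """Categories whose recency-weighted score clears the threshold, alphabetized."""
    scores = interest_scores(visits, today)
    return sorted(c for c, s in scores.items() if s >= THRESHOLD)
```

Under these assumptions, a burst of visits to one topic dominates the list for a few days and then fades. A short burst of different browsing can replace it, which matches the observation that the list changes within days.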

Many people find the list uncannily accurate, while some are surprised at how inaccurate it is. Usually it is a mixture. Note that some categories are very specific (“Currency Exchange”), while others are very broad (“Humor”). Right now it thinks I am interested in 27 things; some of them are:

  • Standardized & Admissions Tests (Yes.)
  • Roleplaying Games (Yes.)
  • Dishwashers (No.)
  • Dresses (No.)

You can also type in your own interests to save Google the trouble of profiling you.

Again, this is an interesting algorithm to speculate about. I’ve been checking this for a few years, and I persistently get “Hygiene & Toiletries.” I am insulted by this. It’s not that I’m uninterested in hygiene, but I think I am no more interested in hygiene than the average person. I don’t visit any Web sites about hygiene or toiletries. So I’d guess this means… what exactly? I must visit Web sites that are visited by other people who visit sites about hygiene and toiletries. Not a group I really want to be a part of, to be honest.
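The guess above can be made concrete. A profile could pick up a category you never browse directly by inferring it from sites whose audiences overlap with the sites you do visit. The sketch below uses a simple co-visitation overlap, measured as the Jaccard similarity of visitor sets. The sites, visitor IDs, categories, and 0.5 threshold in the test are all hypothetical. Nothing here is known to be how Google actually assigns categories.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of visitor IDs: 0 = disjoint, 1 = identical."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def inferred_categories(my_sites, visitors_by_site, category_by_site, threshold=0.5):
    """Categories of sites I never visit whose audiences overlap strongly with sites I do."""
    inferred = set()
    for mine in my_sites:
        for other, category in category_by_site.items():
            if other in my_sites:
                continue  # only infer from sites I haven't visited myself
            overlap = jaccard(visitors_by_site.get(mine, set()),
                              visitors_by_site.get(other, set()))
            if overlap >= threshold:
                inferred.add(category)
    return inferred
```

In this model you would be tagged with a category simply because people who share your reading habits also visit sites in that category. That is exactly the kind of group membership the author would rather not have.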

These were three examples of algorithm-ish activities that I’ve used. Any other ideas? I was thinking of trying something with an item-to-item recommender system, but I could not come up with a great example. I tried anonymized vs. normal Web searching to highlight location-specific results, but I could not think of a search term that did a great job showing a contrast. I also tried personalized Twitter trends vs. location-based Twitter trends, but the differences were quite subtle. Maybe you can do better.

In my next post I’ll write about how the students reacted to all this.

(This was also cross-posted to multicast.)

Legal Portraits of Web Users

This summer I became very interested in what I think I will be calling “legal portraits of digital subjects,” or something similar. I came to this while doing a study on MOOCs with the SMC. The title of the project is “Students as End Users in the MOOC Ecology” (the talk is available online). In the project I am looking at what the Big 3 MOOC companies say publicly about the “student” and “learner” role and comparing it to how the same subject is legally constituted, in order to understand the cultural implications of turning students into “end users.”

As I was working through this project and thinking about implications outside of MOOCs and higher ed, I realized these legal portraits are constantly being painted in digital environments. As users of the web, the internet, and digital tools, we are constantly accepting various clickwrap and browse-wrap agreements without thinking twice about it, because doing so has become a standard cultural practice.

In writing this post I’ve already entered into numerous binding legal agreements. Here are some of the institutions whose terms I am to follow:

  1. Internet Service Provider

  2. Web Browser

  3. Document Hosting Service (I wrote this in the cloud somewhere else first)

  4. Blog Hosting Company

  5. Blog Platform

  6. Various Companies I’ve Accepted Cookies From

  7. Social Media Sites

I’ve gone through and read some of the Terms (some of them I cannot find). I’ve allowed for the licensing and reproduction of this work in multiple places without even thinking twice about it. We talk a lot about privacy concerns. We know that by producing things like blog posts or status updates we are agreeing to be surveilled to various degrees. I’d love to start a broader conversation, though, about the effects (not just on privacy) of agreeing to a multitude of Terms simply by logging on and opening a browser.