New anthology on media technologies, bringing together STS and Communication perspectives

I’m thrilled to announce that our anthology, Media Technologies: Essays on Communication, Materiality, and Society, which I edited with Pablo Boczkowski and Kirsten Foot, is now officially available from MIT Press. Contributors include Geoffrey Bowker, Finn Brunton, Gabriella Coleman, Gregory Downey, Steven Jackson, Christopher Kelty, Leah Lievrouw, Sonia Livingstone, Ignacio Siles, Jonathan Sterne, Lucy Suchman, and Fred Turner. We’ve secured permission to share the introduction with you. A blurb:

In recent years, scholarship around media technologies has finally shed the presumption that technologies are separate from and powerfully determining of social life, seeing them instead as produced by and embedded in distinct social, cultural, and political practices – and as socially significant because of that. This has been helped along by a productive intersection between work in science and technology studies (STS) interested in information technologies as complex sociomaterial phenomena, and work in communication and media studies attuned to the symbolic and public dimensions of these tools.

In this volume, scholars from both fields come together to provide some conceptual paths forward for future scholarship. The collection comprises two sets of essays and commentaries: the first addresses the relationship between materiality and mediation, considering such topics as the lived realities of network infrastructure. The second highlights media technologies as fragile and malleable, held together through the minute, unobserved work of many, including efforts to keep these technologies alive.

Please feel free to circulate this introduction to others, and write back to us with your thoughts, criticisms, and ideas. We hope this volume helps anchor the exciting conversations we see happening in the field, and serves as a launchpad for future scholarship.

ToC and Chapter 1 – Introduction (Media Technologies)

Reddit, Mathematically the Anti-Facebook (+ other thoughts on algorithmic culture)

(or, Are We Social Insects?)

I worried that my last blog post was too short and intellectually ineffectual. But given the positive feedback I’ve received, my true calling may be to write top ten lists of other people’s ideas, based on conferences I attend. So here is another list like that.

These are my notes from “Algorithmic Culture,” an event in the University of Michigan’s Digital Currents program. It featured a lecture by the amazing Ted Striphas. These notes also reflect the discussion after the talk, which included Megan Sapnar Ankerson, Mark Ackerman, John Cheney-Lippold, and others whose names I didn’t write down.

Ted has already made his work on historicizing the emergence of an “algorithmic culture” (Alex Galloway‘s term) widely available, so my role here is really just to point at it and say: “Look!” (Then applaud.)

If you’re not familiar with this general topic area (“algorithmic culture”) see Tarleton Gillespie’s recent introduction The Relevance of Algorithms and then maybe my own writing posse’s Re-Centering the Algorithm. OK here we go:

Eight Questions About Algorithms and Culture

  1. Are algorithms centralizing? Algorithms, born from ideas of decentralized control and cybernetics, were once seen as basically anti-hierarchical. Fifty years ago we searched for algorithms in nature and found them decentralized — today engineers write them and we find them centralizing.
  2. OR, are algorithms fundamentally democratic? Even if Google and Facebook have centralized the logic, they claim “democracy!” because we provide the data. YouTube has no need of kings. The LOLcats and fail videos are there by our collective will.
  3. Many of today’s ideas about algorithms and culture can be traced to earlier ideas about social insects. Entomologists once argued that termites “failed to evolve” because their algorithms, rooted in biology, were too inflexible. How do our algorithms work? Are they too inflexible? (And does this mean we are social insects?)
  4. The specific word “algorithm” is a recent phenomenon, but the idea behind it is not new. (Consider: plan, recipe, procedure, script, program, function, …) But do we think about these ideas differently now? If so, maybe it is who looks at them and where they look. In early algorithmic thinking people were the logic and housed the procedure. Now computers house the procedure and people are the operands.
  5. Can “algorithmic culture” be countercultural? Fred Turner and John Markoff have traced the links between the counterculture and computing. Striphas argued that counterculture-like influences on what would become modern computing came much earlier than the 60s: consider the influence of WWII and The Holocaust. For example, Talcott Parsons saw culture through the lens of anti-authoritarianism. He also saw culture as the opposite of state power. Is culture fundamentally anti-state? This also leads me to ask: Is everything always actually about Hitler in the end?
  6. Today, the computer science definition of “algorithm” is similar to anthropologist Clifford Geertz’s definition of culture in the 1970s — that is, a recipe, plan, etc. Why is this? Is this significant?
  7. Is Reddit the conceptual anti-Facebook? Reddit publicly discloses the algorithm that it uses to sort itself (sketched just after this list). There have been calls for Facebook algorithm transparency on normative grounds. What are the consequences of Reddit’s disclosure, if any? Since Reddit’s algorithm is not driven by Facebook’s business model, does that mean the two platforms’ sorting algorithms are mathematically (or, more properly, procedurally) opposed?
  8. Are algorithms fundamentally about homeostasis? (That’s the idea, prevalent in cybernetics and 1950s social science, that the systems being described are stable.) In other words, when algorithms are used today, is there an implicit drive toward stability, equilibrium, or some similar implied goal or standard of beauty for a system?
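
Since question 7 leans on Reddit’s disclosure, here is a minimal sketch of the “hot” ranking as it appeared in Reddit’s open-sourced code around this period (simplified here, and possibly changed since):

    from math import log10

    def hot(ups: int, downs: int, epoch_seconds: float) -> float:
        """Reddit's disclosed 'hot' score: vote balance plus a steady time bonus."""
        score = ups - downs
        order = log10(max(abs(score), 1))              # votes count logarithmically
        sign = 1 if score > 0 else (-1 if score < 0 else 0)
        # 1134028003 is Reddit's chosen epoch (December 2005). Dividing by 45000
        # means a post needs roughly 10x the net votes to outrank one submitted
        # about 12.5 hours later.
        seconds = epoch_seconds - 1134028003
        return round(sign * order + seconds / 45000, 7)

Nothing in this formula references ad inventory or a business model, which is part of what makes the contrast with Facebook’s undisclosed, commercially driven sorting interesting.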

Whew, I’m done. What a great event!

I’m skeptical about that last point (algorithms = homeostasis) but the question reminds me of “The Use and Abuse of Vegetational Concepts,” part 2 of the 2011 BBC documentary/insane-music-video by Adam Curtis titled All Watched Over by Machines of Loving Grace. It is a favorite of mine. Although I think many of the implied claims are not true, it’s worth watching for the soundtrack and jump cuts alone.

It’s all about cybernetics and homeostasis. I’ll conclude with it… “THIS IS A STORY ABOUT THE RISE OF THE MACHINES”:

All Watched Over By Machines of Loving Grace 2 from SACPOP on Vimeo.

P.S.

Some of us also had an interesting side conversation about what job would be the “least algorithmic.” Presumably something that was not repeatable — it differs each time it is performed. Some form of performance art? This conversation led us to think that everything is actually algorithmic.

Tumblr, NSFW porn blogging, and the challenge of checkpoints

When Yahoo CEO Marissa Mayer promised, after Yahoo’s high-profile purchase of Tumblr, “not to screw it up,” this is probably not what she had in mind. Devoted users of Tumblr have been watching closely, worried that the cool, web 2.0 image blogging tool would be tamed by the nearly two-decade-old search giant. One population of Tumblr users, in particular, worried a great deal: those who used Tumblr to collect and share their favorite porn. This is a distinctly large part of the Tumblr crowd: according to one analysis, somewhere near or above 10% of Tumblr is “adult fare.”

Now that group is angry. And Tumblr’s new policies, the ones that made them so angry, are a bit of a mess. Two paragraphs from now, I’m going to say that the real story is not the Tumblr/Yahoo incident, or how it was handled, or even why it’s happening. But first, the quick run-down, which is confusing if you’re not a regular Tumblr user. Tumblr had a self-rating system: blogs with “occasional” nudity should self-rate as “NSFW,” while blogs with “substantial” nudity should rate themselves as “adult.” About two months ago, some Tumblr users noticed that blogs rated “adult” were no longer being listed with the major search engines. Then in June, Tumblr began taking both “NSFW” and “adult” blogs out of their internal search results — meaning, if you search in Tumblr for posts tagged with a particular word, sexual or otherwise, the dirty stuff won’t come up. Unless the searcher already follows your blog, in which case the “NSFW” posts will appear, but not the “adult” ones. Ack; here is how Tumblr tried to explain it:

What this meant is that the existing followers of an “NSFW” blog could largely still see it, but it would be very difficult for anyone new to find it. David Karp, founder and CEO of Tumblr, dodged questions about it on the Colbert Report, saying only that Tumblr doesn’t want to be responsible for drawing the lines between artistic nudity, casual nudity, and hardcore porn.

Then a new outrage emerged when some users discovered that, in the mobile version of Tumblr, some tag searches turned up no results, dirty or otherwise — and not just for obvious porn terms, like “porn,” but also for broader terms, like “gay.” Tumblr issued a quasi-explanation on their blog, which some commentators and users found frustratingly vague and unapologetic.

Ok. The real story is not the Tumblr/Yahoo incident, or how it was handled, or even why it’s happening. Certainly, Tumblr could have been more transparent about the details of their original policy, or the move in May or earlier to de-list adult Tumblr blogs in major search engines, or the decision to block certain tag results. Certainly, there’ve been some delicate conversations going on at Yahoo/Tumblr headquarters, for some time now, on how to “let Tumblr be Tumblr” (Mayer’s words) and also deal with all this NSFW blogging “even though it may not be as brand safe as what’s on our site” (also Mayer). Tumblr puts ads in its Dashboard, where only logged-in users see them, so arguably the ads are never “with” the porn — but maybe Yahoo is looking to change that, so that the “two companies will also work together to create advertising opportunities that are seamless and enhance the user experience.”

What’s ironic is that, I suspect, Tumblr and Yahoo are actually trying to find ways to remain permissive when it comes to NSFW content. They are certainly (so far) more permissive than some of their competitors, including Instagram, Blogger, Vine, and Pinterest, all of whom have moved in the last year to remove adult content, make it systematically less visible to their users, or prevent users from pairing advertising with it. The problem here is Tumblr’s tactics.

Media companies, be they broadcast or social, have fundamentally two ways to handle content that some but not all of their users find inappropriate.

First, they can remove some of it, either by editorial fiat or at the behest of the community. This means writing up policies that draw those tricky lines in the sand (no nudity? what kind of nudity? what was meant by the nudity?), and then either taking on the mantle (and sometimes the flak) of making those judgments themselves, or having to decide which users to listen to on which occasions for which reasons.

The second, and this is what Tumblr is trying, is what I’ll call the “checkpoint” approach. It’s by no means exclusive to new media: putting the X-rated movies in the back room at the video store, putting the magazines on the shelf behind the counter, wrapped in brown paper, scheduling the softcore stuff on Cinemax after bedtime, or scrambling the adult cable channel: all depend on the same logic. Somehow the provider needs to keep some content from some people and deliver it to others. (All the while, of course, they need to maintain their reputation as defender of free expression, and not appear to be “full of porn,” and keep their advertisers happy. Tricky.)

To run such a checkpoint requires (1) knowing something about the content, (2) knowing something about the people, and (3) having a defensible line between them.

First, the content. That difficult decision, about what is artistic nudity, what’s casual nudity, and what’s pornographic? It doesn’t go away, but the provider can shift the burden of making that decision to someone else — not just to get it off their shoulders, but sometimes to hand it to someone more capable of making it. Adult movie producers or magazine publishers can self-rate their content as pornographic. An MPAA-sponsored board can rate films. There are problems, of course: either the “who are these people?” problem, as in the mysterious MPAA ratings board, or the “these people are self-interested” problem, as when TV production houses rate their own programs. Still, this self-interest can often be congruent with the interests of the provider: X-rated movie producers know that their options may be the back room or not at all, and gain little in pretending that they’re something they’re not.

Next, the people. It may seem like a simple thing, just keeping the dirty stuff on the top shelf and carding people who want to buy it. Any bodega shopkeep can manage to do it. But it is simple only because it depends on a massive knowledge architecture, the driver’s license, that it didn’t have to generate itself. This is a government sponsored, institutional mechanism that, in part, happens to be engaged in age verification. It requires a massive infrastructure for record keeping, offices throughout the country, staff, bureaucracy, printing services, government authorization, and legal consequences for cases of fraud. All that so that someone can show a card and prove they’re of a certain age. (That kind of certified, high-quality data is otherwise hard to come by, as we’ll see in a moment.)

Finally, a defensible line. The bodega has two: the upper shelf and the cash register. The kids can’t reach, and even the tall ones can’t slip away uncarded, unless they’re also interested in theft. Cable services use encryption: the signal is scrambled unless the cable company authorizes it to be unscrambled. This line is in fact not simple to defend: the descrambler used to be in the box itself, which was in the home and, with the right tools and expertise, openable by those who might want to solder the right tab and get that channel unscrambled. This meant there had to be laws against tampering, another external apparatus necessary to make this tactic stick.

Tumblr? Well. All of this changes a bit when we bring it into the world of digital, networked, and social media. The challenges are much the same, but once we notice that the necessary components of the checkpoint are now data, we can see why it begins to take the shape that it does.

The content? Tumblr asked its users to self-rate, marking their blogs as “NSFW” or “adult.” Smart, given that bloggers sharing porn may share some of Tumblr’s interest in putting it behind the checkpoint: many would rather flag their site as pornographic and get to stay on Tumblr than be forbidden to put it up at all. Even flagged, Tumblr provides them what they need: the platform on which to collect content, a way to gain and keep interested viewers. The categories are a little ambiguous — where is the line between “occasional” and “substantial” nudity to be drawn? Why are the criteria only about amount, rather than degree (hard core vs soft core), category (posed nudity vs sexual act), or intent (artistic vs unseemly)? But then again, these categories are always ambiguous, and must always privilege some criteria over others.

The people? Here it gets trickier. Tumblr is not imposing an age barrier; it’s imposing a checkpoint based on desire, dividing those who want adult content from those who don’t. This is not the kind of data that’s kept on a card in your wallet, backed by the government, subject to laws of perjury. Instead, Tumblr has two ways to try to know what a user wants: their search settings, and what they search for. If users have managed to correctly classify themselves into “Safe Mode,” indicating in the settings that they do not want to see anything flagged as adult, and people posting content have correctly marked their content as adult or not, this should be an easy algorithmic equation: a “safe” searcher is never shown “NSFW” content. The only problems would be user error: searchers who do not set their search settings correctly, and posters who do not flag their adult content correctly. Reasonable problems, and the kind of leakage that any system of regulation inevitably faces. Flagging at the blog level (as opposed to flagging each post as adult or not) is a bit of a dull instrument: all posts from my “NSFW” blog are being withheld from safe searchers, even the ones that have no questionable content — despite the fact that by Tumblr’s own definition an “NSFW” blog has only “occasional” nudity. Still, getting people to rate every post is a major barrier, few will do so diligently, and it doesn’t fit into simple “web button” interfaces.

Defending the dividing line? Since the content is digital, and the information about content and users is data, it should not be surprising that the line here is algorithmic. Unlike the top shelf or the back room, the adult content on Tumblr lives amidst the rest of the archive. And there’s no cash register, which means that there’s no unavoidable point at which use can be checked. There is the login, which explains why non-logged-in users are treated as only wanting “safe” content. But, theoretically, an “algorithmic checkpoint” should work based on search settings and blog ratings. As a search happens, compare the searcher’s setting with the content’s rating, and don’t deliver the dirty to the safe.
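
To make the logic concrete, here is a minimal sketch of that algorithmic checkpoint. The data structures and function names are hypothetical; the ratings and settings simply mirror the categories described above, not Tumblr’s actual code:

    from dataclasses import dataclass

    @dataclass
    class Blog:
        name: str
        rating: str      # self-assigned at the blog level: "safe", "nsfw", or "adult"

    @dataclass
    class Searcher:
        safe_mode: bool  # the "Safe Mode" search setting
        follows: set     # names of blogs this searcher already follows

    def visible_in_search(blog: Blog, searcher: Searcher) -> bool:
        """Should posts from this blog appear in this searcher's internal search results?"""
        if blog.rating == "safe":
            return True
        if searcher.safe_mode:
            return False  # never deliver the dirty to the safe
        # Under the June policy described above, flagged blogs are withheld from
        # search even for willing searchers unless the searcher already follows
        # the blog, and even then only "NSFW" blogs appear, not "adult" ones.
        return blog.name in searcher.follows and blog.rating == "nsfw"

The simpler, “theoretical” checkpoint would stop after the safe-mode test; the follower condition is what the June change added.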

But here’s where Tumblr took two additional steps, the ones that I think raise the biggest problem for the checkpoint approach in the digital context.

Tumblr wanted to extend the checkpoint past the customer who walks into the store and brings adult content to the cash register, out to the person walking by the shop window. And those passersby aren’t always logged in; they come to Tumblr in any number of ways. Because here’s the rub with the checkpoint approach: it does, inevitably, remind the population of possible users that you do allow the dirty stuff. The new customer who walks into the video store and sees that there is a back room, even if they never go in, may reject your establishment for even offering it. Can the checkpoint be extended, to decide whether to even reveal to someone that there’s porn available inside? If not in the physical world, maybe in the digital?

When Tumblr delisted its adult blogs from the major search engines, they wanted to keep Google users from seeing that Tumblr has porn. This, of course, runs counter to the fundamental promise of Tumblr, as a publishing platform, that Tumblr users (NSFW and otherwise) count on. And users fumed: “Removal from search in every way possible is the closest thing Tumblr could do to deleting the blogs altogether, without actually removing 10% of its user base.” Here is where we may see the fundamental tension at the Yahoo/Tumblr partnership: they may want to allow porn, but do they want to be known for allowing porn?

Tumblr also apparently wanted to extend the checkpoint in the mobile environment — or perhaps was required to, by Apple. Many services, especially those spurred or required by Apple to do so, aim to prevent the “accidental porn” situation: if I’m searching for something innocuous, can they prevent a blast of unexpected porn in response to my query? To some degree, the “NSFW” rating and the “safe” setting should handle this, but of course content that a blogger failed (or refused) to flag still slips through. So Tumblr (and other sites) institute a second checkpoint: if the search term might bring back adult content, block all the results for that term. In Tumblr, this is based on tags: bloggers add tags that describe what they’ve posted, and search queries seek matches in those tags.
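
Reduced to its essentials, that second, term-level checkpoint looks something like the sketch below. The blocklist and function are hypothetical illustrations; the two terms are simply the ones reported blocked in Tumblr’s mobile search above:

    # Hypothetical illustration of the term-level checkpoint: if a query term
    # might bring back adult content, return nothing at all for that term.
    BLOCKED_TERMS = {"porn", "gay"}

    def mobile_tag_search(query: str, posts_by_tag: dict) -> list:
        """Look up posts by tag, unless the query term itself is on the blocklist."""
        term = query.strip().lower()
        if term in BLOCKED_TERMS:
            return []             # block all results for the term, dirty or otherwise
        return posts_by_tag.get(term, [])

The coarseness is the point: the decision is made on the query term alone, before any individual post, flag, or intention is ever consulted.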

When you try to choreograph users based on search terms and tags, you’ve doubled your problem. This is not clean, assured data like a self-rating of adult content or the age on a driver’s license. You’re ascertaining what the producer meant when they tagged a post with a certain term, and what the searcher meant when they used the same term as a search query. If I search for the word “gay,” I may be looking for a gay couple celebrating the recent DOMA decision on the steps of the Supreme Court — or “celebrating” bent over the arm of the couch. Very hard for Tumblr to know which I wanted, until I click or complain.

Sometimes these terms line up quite well, either by accident or on purpose: for instance, when users of Instagram indicated pornographic images by tagging them “pornstagram,” a made-up word that would likely mean nothing else. (This search term no longer returns any results on Instagram, although — whoa! — it does on Tumblr.) But in just as many cases, when you use the word gay to indicate a photo of your two best friends in a loving embrace, and I use the word gay in my search query to find X-rated pornography, it becomes extremely difficult for the search algorithm to understand what to do about all of those meanings converging on a single word.

Blocking all results to the query “gay,” or “sex,” or even “porn” may seem, from one vantage point (Yahoo’s?), to solve the NSFW problem. Tumblr is not alone in this regard: Vine and Instagram return no results for the search term “sex,” though that does not mean that no one’s using it as a tag. And while Instagram returns millions of results for “gay,” Vine, like Tumblr, returns none. Pinterest goes further, using the search for “porn” as a teaching moment: it pops up a reminder that nudity is not permitted on the site, then returns results which, because of the policy, are not pornographic. By blocking search terms/tags, no porn accidentally makes it to the mobile platform or to the eyes of its gentle user. But this approach fails miserably at getting adult content to those who want it, and more importantly, in Tumblr’s case, it relegates a broadly used and politically vital term like “gay” to the smut pile.

Tumblr’s semi-apology has begun to make amends. The two categories, “NSFW” and “adult,” are now just “NSFW,” and the blogs marked as such are now available in Tumblr’s internal search and in the major search engines. Tumblr has promised to work on a more intelligent filtering system. But any checkpoint that depends on data that’s expressive rather than systemic — what we say, as opposed to what we say we are — is going to step clumsily both on the sharing of adult content and on the ability to talk about subjects that have some sexual connotations, and could architect the spirit and promise out of Tumblr’s publishing platform.

This was originally posted at Culture Digitally.

Can an algorithm be wrong? Twitter Trends, the specter of censorship, and our faith in the algorithms around us

The interesting question is not whether Twitter is censoring its Trends list. The interesting question is: what do we think the Trends list is, what do we think it represents and how it works, that we can presume to hold it accountable when we think it is “wrong”? What are these algorithms, and what do we want them to be?

(Cross posted from Culture Digitally.)

It’s not the first time it has been asked. Gilad Lotan at SocialFlow (and erstwhile Microsoft UX designer), spurred by questions raised by participants and supporters of the Occupy Wall Street protests, asks the question: is Twitter censoring its Trends list to exclude #occupywallstreet and #occupyboston? While the protest movement gains traction and media coverage, and participants, observers and critics turn to Twitter to discuss it, why are these widely-known hashtags not Trending? Why are they not Trending in the very cities where protests have occurred, including New York?

The presumption, though Gilad carefully debunks it, is that Twitter is, for some reason, either removing #occupywallstreet from Trends, or has designed an algorithm to prefer banal topics like Kim Kardashian’s wedding over important, contentious political debates. Similar charges emerged around the absence of #wikileaks from Twitter’s Trends when the trove of diplomatic cables was released in December of last year, as well as around the #demo2010 student protests in the UK, the controversial execution of #TroyDavis in the state of Georgia, the Gaza #flotilla, even the death of #SteveJobs. Why, when these important points of discussion seem to spike, do they not Trend?

Despite an unshakeable undercurrent of paranoid skepticism, in the analyses and especially in the comment threads that trail off from them, most of those who have looked at the issue are reassured that Twitter is not in fact censoring these topics. Their absence on the Trends listings is a product of the particular dynamics of the algorithm that determines Trends, and the misunderstanding most users have about what exactly the Trends algorithm is designed to identify. I do not disagree with this assessment, and have no particular interest in reopening these questions. Along with Gilad’s thorough analysis, Angus Johnston has a series of posts (1, 2, 3, and 4) debunking the charge of censorship around #wikileaks. Trends has been designed (and re-designed) by Twitter not to simply measure popularity, i.e. the sheer quantity of posts using a certain word or hashtag. Instead, Twitter designed the Trends algorithm to capture topics that are enjoying a surge in popularity, rising distinctly above the normal level of chatter. To do this, their algorithm is designed to take into account not just the number of tweets, but factors such as: is the term accelerating in its use? Has it trended before? Is it being used across several networks of people, as opposed to a single, densely-interconnected cluster of users? Are the tweets different, or are they largely re-tweets of the same post? As Twitter representatives have said, they don’t want simply the most tweeted word (in which case the Trend list might read like a grammar assignment about pronouns and indefinite articles) or the topics that are always popular and seem destined to remain so (apparently this means Justin Bieber).
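
None of the weights or thresholds are public, but purely as an illustration of the kind of calculation being described, a “trend-ness” score might combine those factors something like this (the formula, names, and numbers are invented for illustration, not Twitter’s):

    def trend_score(tweets_this_hour: int, tweets_last_hour: int,
                    distinct_clusters: int, retweet_fraction: float,
                    has_trended_before: bool) -> float:
        """Toy 'trend-ness' score: reward surges, breadth, and originality."""
        # Acceleration, not raw volume: a steadily popular topic scores near zero.
        acceleration = (tweets_this_hour - tweets_last_hour) / max(tweets_last_hour, 1)
        score = max(acceleration, 0.0)
        # Spread across several networks of people beats one dense cluster.
        score *= min(distinct_clusters, 10) / 10
        # Discount topics that are mostly retweets of the same post.
        score *= 1.0 - retweet_fraction
        # Hold previously trending terms to a higher bar.
        if has_trended_before:
            score *= 0.5
        return score

On a score like this, a hashtag whose use grows slowly and steadily, or whose tweets are mostly retweets within one tight community, can be enormously popular and still never surface, which is exactly the behavior the censorship charges mistook for suppression.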

The charge of censorship is, on the face of it, counterintuitive. Twitter has, over the last few years, enjoyed and agreed with claims that it has played a catalytic role in recent political and civil unrest, particularly in the Arab world, wearing its political importance as a red badge of courage (see Shepherd and Busch). To censor these hot button political topics from Trends would work against its current self-proclaimed purposes and, more importantly, its marketing tactics. And, as Johnston noted, the tweets themselves are available, many highly charged – so why, and for what ends, remove #wikileaks or #occupywallstreet from the Trends list, yet let the actual discussion of these topics run free?

On the other hand, the vigor and persistence of the charge of censorship is not surprising at all. Advocates of these political efforts want desperately for their topic to gain visibility. Those involved in the discussion likely have an exaggerated sense of how important and widely-discussed it is. And, especially with #wikileaks and #occupywallstreet, the possibility that Twitter may be censoring their efforts would fit their supporters’ ideological worldview: Twitter might be working against Wikileaks just as Amazon, Paypal, and Mastercard were; or in the case of #occupywallstreet, while the Twitter network supports the voice of the people, Twitter the corporation of course must have allegiances firmly intertwined with the fatcats of Wall Street.

But the debate about tools like Twitter Trends is, I believe, a debate we will be having more and more often. As more and more of our online public discourse takes place on a select set of private content platforms and communication networks, and these providers turn to complex algorithms to manage, curate, and organize these massive collections, there is an important tension emerging between what we expect these algorithms to be, and what they in fact are. Not only must we recognize that these algorithms are not neutral, and that they encode political choices, and that they frame information in a particular way. We must also understand what it means that we are coming to rely on these algorithms, that we want them to be neutral, we want them to be reliable, we want them to be the effective ways in which we come to know what is most important.

Twitter Trends is only the most visible of these tools. The search engine itself, whether Google or the search bar on your favorite content site (often the same engine, under the hood), promises to provide a logical set of results in response to a query, but is in fact the product of an algorithm designed to take a range of criteria into account so as to serve up results that satisfy not just the user, but also the aims of the provider, their vision of relevance or newsworthiness or public import, and the particular demands of their business model. As James Grimmelmann observed, “Search engines pride themselves on being automated, except when they aren’t.” When Amazon, or YouTube, or Facebook offers to algorithmically and in real time report on what is “most popular” or “liked” or “most viewed” or “best selling” or “most commented” or “highest rated,” it is curating a list whose legitimacy is based on the presumption that it has not been curated. And we want them to feel that way, even to the point that we are unwilling to ask about the choices and implications of the algorithms we use every day.

Peel back the algorithms, and this becomes quite apparent. Yes, a casual visit to Twitter’s home page may present Trends as an unproblematic list of terms that might appear to be a simple calculation. But in its explanations of how Trends works – in its policies and help pages, in its company blog, in tweets, in response to press queries, even in the comment threads of the censorship discussions – Twitter lays bare the variety of weighted factors Trends takes into account, and cops to the occasional and unfortunate consequences of these algorithms. Wikileaks may not have trended when people expected it to because it had before; because the discussion of #wikileaks grew too slowly and consistently over time to have spiked enough to draw the algorithm’s attention; because the bulk of messages were retweets; or because the users tweeting about Wikileaks were already densely interconnected. When Twitter changed their algorithm significantly in May 2010 (though, undoubtedly, it has been tweaked in less noticeable ways before and after), they announced the change in their blog, explained why it was made – and even apologized directly to Justin Bieber, whose position in the Trends list would be diminished by the change. In response to charges of censorship, they have explained why they believe Trends should privilege terms that spike, terms that exceed single clusters of interconnected users, new content over retweets, new terms over already trending ones. Critics gather anecdotal evidence and conduct thorough statistical analyses, using available online tools that track the raw popularity of words in a vastly more exhaustive and catholic way than Twitter does, or at least is willing to make available to its users. The algorithms that define what is “trending” or what is “hot” or what is “most popular” are not simple measures; they are carefully designed to capture something the site providers want to capture, and to weed out the inevitable “mistakes” a simple calculation would make.

At the same time, Twitter most certainly does curate its Trends lists. It engages in traditional censorship: for example, a Twitter engineer acknowledges here that Trends excludes profanity, something that’s obvious from the relatively circuitous path that prurient attempts to push dirty words onto the Trends list must take. Twitter will remove tweets that constitute specific threats of violence, copyright or trademark violations, impersonation of others, revelations of others’ private information, or spam. (Twitter has even been criticized (1, 2) for not removing some terms from Trends, as in this user’s complaint that #reasonstobeatyourgirlfriend was permitted to appear.) Twitter also engages in softer forms of governance, by designing the algorithm so as to privilege some kinds of content and exclude others, and some users and not others. Twitter offers rules, guidelines, and suggestions for proper tweeting, in the hopes of gently moving users towards the kinds of topics that suit their site and away from the kinds of content that, were it to trend, might reflect badly on the site. For some of their rules for proper profile content, tweet content, and hashtag use, the punishment imposed on violators is that their tweets will not factor into search or Trends – thereby culling the Trends lists by culling what content is even in consideration for it. Twitter includes terms in its Trends from promotional partners, terms that were not spiking in popularity otherwise. This list, automatically calculated on the fly, is yet also the result of careful curation to decide what it should represent, what counts as “trend-ness.”

Ironically, terms like #wikileaks and #occupywallstreet are exactly the kinds of terms that, from a reasonable perspective, Twitter should want to show up as Trends. If we take the reasonable position that Twitter is benefiting from its role in the democratic uprisings of recent years, and that it is pitching itself as a vital tool for important political discussion, and that it wants to highlight terms that will support that vision and draw users to topics that strike them as relevant, #occupywallstreet seems to fit the bill. So despite Twitter carefully designing its algorithm away from the perennials of Bieber and the weeds of common language, the algorithm still cannot always successfully pluck out the vital public discussion it might want. In this, Twitter is in agreement with its critics; perhaps #wikileaks should have trended after the diplomatic cables were released. These algorithms are not perfect; they are still cudgels, where one might want scalpels. The Trends list can often look, in fact, like a study in insignificance. Not only are the interests of a few often precisely irrelevant to the rest of us, but much of what we talk about on Twitter every day is in fact quite everyday, despite their most heroic claims of political import. But many Twitter users take it to be not just a measure of visibility but a means of visibility – whether or not the appearance of a term or #hashtag increases audience, which is not in fact clear. Trends offers to propel a topic towards greater attention, and offers proof of the attention already being paid. Or seems to.

Of course, Twitter has in its hands the biggest resource by which to improve its tool, a massive and interested user base. One could imagine “crowdsourcing” this problem, asking users to rate the quality of the Trends lists, and assessing these responses over time and a huge number of data points. But they face a dilemma: revealing the workings of their algorithm, even enough to respond to charges of censorship and manipulation, much less to share the task of improving it, risks helping those who would game the system. Everyone from spammers to political activists to 4chan tricksters to narcissists might want to “optimize” their tweets and hashtags so as to show up in the Trends. So the mechanism underneath this tool, that is meant to present a (quasi) democratic assessment of what the public finds important right now, cannot reveal its own “secret sauce.”

Which in some ways leaves us, and Twitter, in an unresolvable quandary. The algorithmic gloss of our aggregate social data practices can always be read/misread as censorship, if the results do not match what someone expects. If #occupywallstreet is not trending, does that mean (a) it is being purposefully censored? (b) it is very popular but consistently so, not a spike? (c) it is actually less popular than one might think? Broad scrapes of huge data, like Twitter Trends, are in some ways meant to show us what we know to be true, and to show us what we are unable to perceive as true because of our limited scope. And we can never really tell which it is showing us, or failing to show us. We remain trapped in an algorithmic regress, and not even Twitter can help, as it can’t risk revealing the criteria it used.

But what is most important here is not the consequences of algorithms; it is our emerging and powerful faith in them. Trends measures “trends,” a phenomenon Twitter gets to define and build into its algorithm. But we are invited to treat Trends as a reasonable measure of popularity and importance, a “trend” in our understanding of the term. And we want it to be so. We want Trends to be an impartial arbiter of what’s relevant… and we want our pet topic, the one it seems certain that “everyone” is (or should be) talking about, to be duly noted by this objective measure specifically designed to do so. We want Twitter to be “right” about what is important… and sometimes we kinda want them to be wrong, deliberately wrong – because that will also fit our worldview: that when the facts are misrepresented, it’s because someone did so deliberately, not because facts are in many ways the product of how they’re manufactured.

We don’t have a sufficient vocabulary for assessing the algorithmic intervention of a tool like Trends. We’re not good at comprehending the complexity required to make a tool like Trends – one that seems to effortlessly identify what’s going on, that isn’t swamped by the mundane or the irrelevant. We don’t have a language for the unexpected associations algorithms make, beyond the intention (or even comprehension) of their designers. We don’t have a clear sense of how to talk about the politics of this algorithm. If Trends, as designed, does leave #occupywallstreet off the list, even when its use is surging and even when some people think it should be there: is that the algorithm correctly assessing what is happening? Is it looking for the wrong things? Has it been turned from its proper ends by interested parties? Too often, maybe in nearly every instance in which we use these platforms, we fail to ask these questions. We equate the “hot” list with our understanding of what is popular, the “trends” list with what matters. Most importantly, we may be unwilling or unable to recognize our growing dependence on these algorithmic tools, as our means of navigating the huge corpora of data that we must, because we want so badly for these tools to perform a simple, neutral calculus, without blurry edges, without human intervention, without having to be tweaked to get it “right,” without being shaped by the interests of their providers.

Guilt Through Algorithmic Association

You’re a 16-year-old Muslim kid in America. Say your name is Mohammad Abdullah. Your schoolmates are convinced that you’re a terrorist. They keep typing Google queries like “is Mohammad Abdullah a terrorist?” and “Mohammad Abdullah al Qaeda.” Google’s search engine learns. All of a sudden, auto-complete starts suggesting terms like “Al Qaeda” as the next term in relation to your name. You know that colleges are looking up your name and you’re afraid of the impression that they might get based on that auto-complete. You are already getting hostile comments in your hometown, a decidedly anti-Muslim environment. You know that you have nothing to do with Al Qaeda, but Google gives the impression that you do. And people are drawing that conclusion. You write to Google but nothing comes of it. What do you do?

This is guilt through algorithmic association. And while this example is not a real case, I keep hearing about real cases. Cases where people are algorithmically associated with practices, organizations, and concepts that paint them in a problematic light even though there’s nothing on the web that associates them with that term. Cases where people are getting accused of affiliations that get produced by Google’s auto-complete. Reputation hits that stem from what people _search for_, not what they _write_.

It’s one thing to be slandered by another person on a website, on a blog, in comments. It’s another to have your reputation slandered by computer algorithms. The algorithmic associations do reveal the attitudes and practices of people, but those people are invisible; all that’s visible is the product of the algorithm, without any context of how or why the search engine conveyed that information. What becomes visible is the data point of the algorithmic association. But what gets interpreted is the “fact” implied by said data point, and that gives an impression of guilt. The damage comes from creating the algorithmic association. It gets magnified by conveying it.
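
To see how little it takes, here is a toy sketch of a query-log-driven autocomplete: suggestions are simply the most frequent completions other people have typed after the same prefix, with no judgment about whether the implied claim is true. This is a simplified illustration, not Google’s actual system, and the example queries are invented:

    from collections import Counter

    # Invented query log echoing the hypothetical example above.
    query_log = [
        "mohammad abdullah al qaeda",
        "mohammad abdullah al qaeda",
        "mohammad abdullah soccer team",
    ]

    def autocomplete(prefix: str, log: list, k: int = 3) -> list:
        """Suggest the k most frequent completions of `prefix` seen in the log."""
        completions = Counter(
            q[len(prefix):].strip()
            for q in log
            if q.startswith(prefix) and len(q) > len(prefix)
        )
        return [suffix for suffix, _ in completions.most_common(k)]

    # autocomplete("mohammad abdullah", query_log) -> ['al qaeda', 'soccer team']
    # Repetition alone is enough to surface the association; nothing on the web
    # needs to connect the name and the term.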

  1. What are the consequences of guilt through algorithmic association?
  2. What are the correction mechanisms?
  3. Who is accountable?
  4. What can or should be done?

Note: The image used here is Photoshopped. I did not use real examples so as to protect the reputations of people who told me their story.

Update: Guilt through algorithmic association is not constrained to Google. This is an issue for any and all systems that learn from people and convey collective “intelligence” back to users. All of the examples that people gave me involved Google because Google is the dominant search engine. I’m not blaming Google. Rather, I think that this is a serious issue for all of us in the tech industry to consider. And the questions that I’m asking are genuine questions, not rhetorical ones.