Algorithms, clickworkers, and the befuddled fury around Facebook Trends

The controversy about the human curators behind Facebook Trends has grown since the allegations made last week by Gizmodo. Besides being a major headache for Facebook, it has helped prod a growing discussion about the power of Facebook to shape the information we see and what we take to be most important. But we continue to fail to find the right words to describe what algorithmic systems are, who generates them, and what they should do for users and for the public. We need to get this clear.

Here’s the case so far: Gizmodo says that Facebook hired human curators to decide which topics, identified by algorithms, would be listed as trending, and how they should be named and summarized; one former curator alleged that his fellow curators often overlooked or suppressed conservative topics. This came close on the heels of a report a few weeks back that Facebook employees had asked internally if the company had a responsibility to slow Donald Trump’s momentum. Angry critics have noted that Zuckerberg, Search VP Tom Stocky, and other FB execs are liberals. Facebook has vigorously disputed the allegation: saying that they have guidelines in place to ensure consistency and neutrality, asserting that there’s no evidence that it happened, distributing their guidelines for how Trending topics are selected and summarized (after they were leaked), inviting conservative leaders in for a discussion, and pointing out their conservative bona fides. The Senate’s Commerce Committee, chaired by Republican Senator John Thune, issued a letter demanding answers from Facebook about it. Some wonder if the charges may have been overstated. Other Facebook news curators have spoken up, some to downplay the allegations and defend the process that was in place, others to highlight the sexist and toxic work environment they endured.

Commentators have used the controversy to express a range of broader concerns about Facebook’s power and prominence. Some argue it is unprecedented: “When a digital media network has one billion people connected to entertainment companies, news publications, brands, and each other, the right historical analogy isn’t television, or telephones, or radio, or newspapers. The right historical analogy doesn’t exist.” Others have made the case that Facebook is now as powerful as the media corporations, which have been regulated for their influence; that their power over news organizations and how they publish is growing; that they could potentially and knowingly engage in political manipulation; that they are not transparent about their choices; that they have become an information monopoly.

This is an important public reckoning about Facebook, and about social media platforms more generally, and it should continue. We clearly don’t yet have the language to capture the kind of power we think Facebook now holds. But it would be great if, along the way, we could finally mothball some foundational and deeply misleading assumptions about Facebook and social media platforms, assumptions that have clouded our understanding of their role and responsibility. Starting with the big one:

Algorithms are not neutral. Algorithms do not function apart from people.

 

We prefer the idea that algorithms run on their own, free of the messy bias, subjectivity, and political aims of people. It’s a seductive and persistent myth, one Facebook has enjoyed and propagated. But it’s simply false.

I’ve already commented on this, and many of those who study the social implications of information technology have made this point abundantly clear (including Pasquale, Crawford, Ananny, Tufekci, boyd, Seaver, McKelvey, Sandvig, Bucher, and nearly every essay on this list). But it persists: in statements made by Facebook, in the explanations offered by journalists, even in the words of Facebook’s critics.

If you still think algorithms are neutral because they’re not people, here’s a list, not even an exhaustive one, of the human decisions that have to be made to produce something like Facebook’s Trending Topics (which, keep in mind, pales in scope and importance next to Facebook’s larger algorithmic endeavor, the “news feed” listing your friends’ activity). Some are made by the engineers designing the algorithm, others are made by curators who turn the output of the algorithm into something presentable. If your eyes start to glaze over, that’s the point; read any three points and then move on, they’re enough to dispel the myth. Ready?

(determining what activity might potentially be seen as a trend)
– what data should be counted in this initial calculation of what’s being talked about (all Facebook users, or a subset? English language only? private posts too, or just public ones?)
– what time frame should be used in this calculation — both for the amount of activity happening “now” (one minute, one hour, one day?) and to get a baseline measure of what’s typical (a week ago? a different day at the same time, or a different time on the same day? one point of comparison or several?)
– should Facebook emphasize novelty? longevity? recurrence? (e.g., if it has trended before, should it be easier or harder for it to trend again?)
– how much of a drop in activity is sufficient for a trending topic to die out?
– which posts actually represent a single topic (e.g., when do two hashtags refer to the same topic?)
– what other signals should be taken into account? what do they mean? (should Facebook measure posts only, or take into account likes? how heavily should they be weighed?)
– should certain contributors enjoy some privileged position in the count? (corporate partners, advertisers, high-value users? pay-for-play?)

(from all possible trends, choosing which should be displayed)
– should some topics be dropped, like obscenity or hate speech?
– if so, who decides what counts as obscene or hateful enough to leave off?
– what topics should be left off because they’re too generic? (Facebook noted that it didn’t include “junk topics” that do not correlate to a real world event. What counts as junk, case by case?)

(designing how trends are displayed to the users)
– who should do this work? what expertise should they have? who hires them?
– how should a trend be presented? (word? title? summary?)
– what should clicking on a trend headline lead to? (some form of activity on Facebook? some collection of relevant posts? an article off the platform, and if so, which one?)
– should trends be presented in a single list, or broken into categories? if so, can the same topic appear in more than one category?
– what are the boundaries of those categories (i.e. what is or isn’t “politics”?)
– should trends be grouped regionally or not? if so, what are the boundaries of each region?
– should trends lists be personalized, or not? If so, what criteria about the user are used to make that decision?

(what to do if the list is deemed to be broken or problematic in particular ways)
– who looks at this project to assess how it’s doing? how often, and with what power to change it?
– what counts as the list being broken, or off the mark, or failing to meet the needs of users or of Facebook?
– what is the list being judged against, to know when it’s off? (as tested against other measures of Facebook activity? as compared to Twitter? to major news sites?)
– should they re-balance a Trends list that appears unbalanced, or leave it? (e.g. what if the items in the list at this moment are all sports, or all celebrity scandal, or all sound “liberal”?)
– should they inject topics that aren’t trending, but seem timely and important?
– if so, according to what criteria? (news organizations? which ones? how many? US v. international? partisan vs not? online vs off?)
– should topics about Facebook itself be included?

These are all human choices. Sometimes they’re made in the design of the algorithm, sometimes around it. The result we see, a changing list of topics, is not the output of “an algorithm” by itself, but of an effort that combines human activity and computational analysis to produce it.
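
To make the point concrete, here is a minimal sketch in Python of what one small piece of trend detection might look like. It is purely hypothetical and not Facebook’s code; every constant and fallback in it (the weight given to likes, the surge threshold, what to assume about topics with no history) is one of the human choices listed above, frozen into software.

    # Hypothetical trend-detection step. Every constant below is a design decision
    # made by a person, not a fact discovered by the machine.
    LIKE_WEIGHT = 0.5   # do likes count, and how much, relative to posts?
    SURGE_RATIO = 3.0   # how big a jump over the baseline counts as a surge?

    def activity_score(posts, likes):
        # Collapsing two signals into one number is itself a judgment call.
        return posts + LIKE_WEIGHT * likes

    def detect_surges(current, baseline):
        # current and baseline map each topic to a (post_count, like_count) pair.
        surging = []
        for topic, (posts, likes) in current.items():
            now = activity_score(posts, likes)
            # Topics with no history get a tiny assumed baseline: yet another choice.
            typical = max(activity_score(*baseline.get(topic, (1, 0))), 1.0)
            if now / typical >= SURGE_RATIO:
                surging.append((topic, now / typical))
        return sorted(surging, key=lambda pair: pair[1], reverse=True)

    detect_surges(
        current={"niagara falls": (900, 2000), "the oscars": (50000, 90000)},
        baseline={"niagara falls": (100, 300), "the oscars": (40000, 70000)},
    )
    # -> [("niagara falls", 7.6)]: the smaller topic surges; the much bigger one does not.

Change any of those constants and a different list of “trends” falls out; nothing in the data dictates which version is the right one.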

So algorithms are in fact full of people and the decisions they make. When we let ourselves believe that they’re not, we let everyone — Zuckerberg, his software engineers, regulators, and the rest of us — off the hook for actually thinking through how they should work, leaving us all unprepared when they end up in the tall grass of public contention. “Any algorithm that has to make choices has criteria that are specified by its designers. And those criteria are expressions of human values. Engineers may think they are ‘neutral’, but long experience has shown us they are babes in the woods of politics, economics and ideology.” Calls for more public accountability, like this one from my colleague danah boyd, can only proceed once we completely jettison the idea that algorithms are neutral — and replace it with a different language that can assess the work that people and systems do together.

The problem is not algorithms, it’s that Facebook is trying to clickwork the news.

 

It is certainly in Facebook’s interest to obscure all the people involved, so users can keep believing that a computer program is fairly and faithfully hard at work. Dismantling this myth raises the kind of hard questions Facebook is now fielding. But once we jettison it, what’s left? It’s easy to despair: with so many human decisions involved, how could we ever get a fair and impartial measure of what matters? And forget the handful of people who designed the algorithm and the handful of people who select and summarize from it: Trends are themselves a measure of the activity of Facebook users. These trending topics aren’t produced by dozens of people but by millions. Their judgment of what’s worth talking about, in each case and in the aggregate, may be so distressingly incomplete, biased, skewed, and vulnerable to manipulation that it’s absurd to pretend it can tell us anything at all.

But political bias doesn’t come from the mere presence of people. It comes from how those people are organized to do what they’re asked to do. Along with our assumption that algorithms are neutral is a matching and equally misleading assumption that people are always and irretrievably biased. But human endeavors are organized affairs, and they can be organized to work against bias. Journalism is full of people too, making all sorts of decisions that are just as opaque, limited, and self-interested. What we hope keeps journalism from slipping into bias and error is a set of well-established professional norms and thoughtful oversight.

The real problem here is not the liberal leanings of Facebook’s news curators. If conservative news topics were overlooked, it’s only a symptom of the underlying problem. Facebook wanted to take surges of activity that its algorithms could identify and turn them into news-like headlines. But it treated this as an information processing problem, not an editorial one. They’re “clickworking” the news.

Clickwork begins with the recognition that computers are good at some kinds of tasks, and humans at others. The answer, it suggests, is to break the task at hand into components and parcel them out to each accordingly. For Facebook’s trending topics, the algorithm is good at scanning an immense amount of data and identifying surges of activity, but not at giving those surges a name and a coherent description. That is handled by people — in industry parlance, this is the “human computation” part. These identified surges of activity are delivered to a team of curators, each one tasked with following a set of procedures to identify and summarize them. The work is segmented into simple and repetitive tasks, and governed by a set of procedures such that, even though different people are doing it, their output will look the same. In effect, the humans are given tasks that only humans can do, but they are not invited to do them in a human way: they are “programmed” by the modularized work flow and the detailed procedures so that they do the work like computers would. As Lilly Irani put it, clickwork “reorganizes digital workers to fit them both materially and symbolically within existing cultures of new media work.”
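
One way to picture that arrangement is below: a purely illustrative sketch, not Facebook’s actual tooling, in which the algorithm’s detected surges become standardized micro-tasks whose steps loosely echo the leaked guidelines quoted in the next paragraph.

    # Hypothetical clickwork pipeline: the algorithm finds surges, people package them,
    # and the procedure is written so that any curator produces the same-looking output.
    CURATION_CHECKLIST = [
        "confirm the surge corresponds to a real-world event (drop 'junk topics')",
        "write a short description in up style, without copying another outlet's headline",
        "pick the best-fitting keyword for the topic from a fixed dropdown",
        "mark importance, e.g. 'National Story' if major news sites are leading with it",
    ]

    def make_task(surge, quota_minutes=3):
        # quota_minutes is invented here; the point is that speed, not judgment, is managed.
        return {"topic": surge["topic"], "steps": CURATION_CHECKLIST, "minutes": quota_minutes}

    def assign(surges, curators):
        # Round-robin assignment: interchangeable workers, interchangeable tasks.
        return [(curators[i % len(curators)], make_task(s)) for i, s in enumerate(surges)]

The checklist does the judging; the person executes it, quickly.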

This is apparent in the guidelines that Facebook gives to their Trends curators. The documents, leaked to The Guardian and then released by Facebook, did not reveal some bombshell about political manipulation, nor did they do much to demonstrate careful guidance on the part of Facebook around the issue of political bias. What’s most striking is that they are mind-numbingly banal: “Write the description in up style, capitalizing the first letter of all major words…” “Do not copy another outlet’s headline…” “Avoid all spoilers for descriptions of scripted shows…” “After identifying the correct angle for a topic, click into the dropdown menu underneath the Unique Keyword field and select the Unique Keyword that best fits the topic…” “Mark a topic as ‘National Story’ importance if it is among the 1-3 top stories of the day. We measure this by checking if it is leading at least 5 of the following 10 news websites…” “Sports games: rewrite the topic name to include both teams…” This is not the newsroom, it’s the secretarial pool.

Moreover, these workers were kept separate from the rest of the full-time employees, and worked under quotas for how many trends to identify and summarize, quotas that were increased as the project went on. As one curator noted, “The team prioritizes scale over editorial quality and forces contractors to work under stressful conditions of meeting aggressive numbers coupled with poor scheduling and miscommunication. If a curator is underperforming, they’ll receive an email from a supervisor comparing their numbers to another curator.” All were hourly contractors, kept under non-disclosure agreements and asked not to mention that they worked for Facebook. “‘It was degrading as a human being,’ said another. ‘We weren’t treated as individuals. We were treated in this robot way.’” A new piece in The Guardian from one such news curator insists that it was also a toxic work environment, especially for women. These “data janitors” are rendered so invisible in the images of Silicon Valley and how tech works that, when we suddenly hear from one, we’re surprised.

Their work was organized to quickly produce capsule descriptions of bits of information that are styled the same — as if they were produced by an algorithm. (This lines up with other concerns about the use of algorithms and clickworkers to produce cheap journalism at scale, and about the increasing influence on news judgment of audience metrics about what’s popular.) It was not, however, organized to thoughtfully assemble a vital information resource that some users treat as the headlines of the day. It was not organized to help these news curators develop experience together on how to do this work well, or handle contentious topics, or reflect on the possible political biases in their choices. It was not likely to foster a sense of community and shared ambitions with Facebook, a lack that might lead frustrated and overworked news curators to indulge their own political preferences. And I suspect it was not likely to funnel any insights they had about trending topics back to the designers of the algorithms they depended on.

Trends are not the same as news, but Facebook kinda wants them to be.

 

Part of why charges of bias are so compelling is that we have a longstanding concern about the problem of bias in news. For more than a century we’ve fretted about the individual bias of reporters, the slant of news organizations, and the limits of objectivity [http://jou.sagepub.com/content/2/2/149.abstract]. But is a list of trending topics a form of news? Are the concerns we have about balance and bias in the news relevant for trends?

“Trends” is a great word, the best word to have emerged amidst the social media zeitgeist. In a cultural moment obsessed with quantification, “trends” is a powerfully and deliberately vague term: defended as the product of an algorithm, it never quite reveals what it measures. Commentators poke at Facebook for clearer explanations of how they choose trends, but “trends” could mean such a wide array of things, from the most activity, to the most rapidly rising, to a completely subjective judgment about what’s popular.
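
To see how much slack the word carries, here are two equally defensible ways a platform might score a “trend.” Both are hypothetical sketches rather than Facebook’s formula; they simply show that “the product of an algorithm” does not settle what a trend is.

    def trend_by_volume(counts_now):
        # "Trending" as sheer volume: whatever is being talked about most right now.
        return max(counts_now, key=counts_now.get)

    def trend_by_velocity(counts_now, counts_before):
        # "Trending" as acceleration: whatever has grown fastest relative to its baseline.
        def growth(topic):
            return counts_now[topic] / counts_before.get(topic, 1)  # smoothing: another choice
        return max(counts_now, key=growth)

    now = {"royal wedding": 90000, "obscure tax bill": 4000}
    before = {"royal wedding": 80000, "obscure tax bill": 200}

    trend_by_volume(now)            # "royal wedding"
    trend_by_velocity(now, before)  # "obscure tax bill"

Same data, two defensible algorithms, two different “trends.”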

But however they are measured and curated, Facebook’s Trends are, at their core, measures of activity on the site. So, at least in principle, they are not news; they are expressions of interest. Facebook users are talking about some things, a lot, for some reason. This has little to do with “news,” which implies an attention to events in the world and some judgment of importance. Of course, many things Facebook users talk about, though not all, are public events. And it seems reasonable to assume that talking about a topic represents some judgment of its importance, however minimal. Facebook takes these identifiable surges of activity as proxies for importance. Facebook users “surface” the news… approximately. The extra step of “injecting” stories drawn from the news that were for whatever reason not surging among Facebook users goes further, turning this proxy of the news into a simulation of it. Clearly this was an attempt to best Twitter, and it may also have played into Facebook’s effort to persuade news organizations to partner with them and take advantage of their platform as a means of distribution. But it also encouraged us to hold Trends accountable for news-like concerns, like liberal bias.

We could think about Trends differently, not as approximating the news but as taking the public’s pulse. If Trends were designed to strictly represent “what are Facebook users talking about a lot,” presumably there is some scientific value, or at least cultural interest, in knowing what (that many) people are actually talking about. If that were its understood value, we might still worry about the intervention of human curators and their political preferences, not because their choices would shape users’ political knowledge or attitudes, but because we’d want this scientific glimpse to be unvarnished by misrepresentation.

But that is not how Facebook has invited us to think about its Trending topics, and it couldn’t do so if it wanted to: its interest in Trending topics is neither as a form of news production nor as a pulse of the public, but as a means to keep users on the site and involved. The proof of this, and the detail that so often gets forgotten in these debates, is that the Trending Topics are personalized. Here’s Facebook’s own explanation: “Trending shows you a list of topics and hashtags that have recently spiked in popularity on Facebook. This list is personalized based on a number of factors, including Pages you’ve liked, your location and what’s trending across Facebook.” Knowing what has “spiked in popularity” is not the same as news; a list “personalized based on… Pages you’ve liked” is no longer a site-wide measure of popular activity; an injected topic is no longer just what an algorithm identified.
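
Facebook’s explanation names the factors but not the weights or the mechanics, so here is a deliberately toy version, with invented weights, just to show why personalization changes what the list is.

    # Site-wide "spike" scores (hypothetical numbers).
    sitewide = {"nba finals": 0.9, "tax reform": 0.7, "state fair": 0.4}

    def personalized_trends(user, top_n=2):
        # Invented blend of the factors Facebook names: overall popularity,
        # Pages you've liked, and your location.
        def score(topic):
            liked = 1.0 if topic in user["likes"] else 0.0
            local = 1.0 if topic in user["nearby"] else 0.0
            return 0.5 * sitewide[topic] + 0.3 * liked + 0.2 * local
        return sorted(sitewide, key=score, reverse=True)[:top_n]

    fan = {"likes": {"nba finals"}, "nearby": set()}
    neighbor = {"likes": {"state fair"}, "nearby": {"state fair"}}

    personalized_trends(fan)       # ["nba finals", "tax reform"]
    personalized_trends(neighbor)  # ["state fair", "nba finals"]

The same site-wide activity yields different lists for different users; whatever such a list is, it is no longer a report of what the site as a whole is doing.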

As I’ve said elsewhere, “trends” are not a barometer of popular activity but a hieroglyph, making provocative but oblique and fleeting claims about “us” but invariably open to interpretation. Today’s frustration with Facebook, focused for the moment on the role their news curators might have played in producing these Trends, is really a discomfort with the power Facebook seems to exert — a kind of power that’s hard to put a finger on, a kind of power that our traditional vocabulary fails to capture. But across the controversies that seem to flare again and again, a connecting thread is Facebook’s insistence on colonizing more and more components of social life (friendship, community, sharing, memory, journalism), and turning the production of shared meaning so vital to sociality into the processing of information so essential to their own aims.

Facebook Trending: It’s made of people!! (but we should have already known that)

Gizmodo has released two important articles (1, 2) about the people who were hired to manage Facebook’s “Trending” list. The first reveals not only how Trending topics are selected and packaged on Facebook, but also the peculiar working conditions this team experienced, the lack of guidance or oversight they were provided, and the directives they received to avoid news that addressed Facebook itself. The second makes a more pointed allegation: that along the way, conservative topics were routinely ignored, meaning the trending algorithm had identified user activity around a particular topic, but the team of curators chose not to publish it as a trend.

This is either a boffo revelation, or an unsurprising look at how the sausage always gets made, depending on your perspective. The promise of “trends” is a powerful one. Even as the public gets more and more familiar with the way social media platforms work with data, and even with more pointed scrutiny of trends in particular, it is still easy to think that “trends” means an algorithm is systematically and impartially uncovering genuine patterns of user activity. So, to discover that a handful of j-school graduates were tasked with surveying all the topics the algorithm identified, choosing just a handful of them, and dressing them up with names and summaries, feels like an unwelcome intrusion of human judgment into what we wish were analytic certainty. Who are these people? What incredible power they have to dictate what is and is not displayed, what is and is not presented as important! Wasn’t this supposed to be just a measure of what users were doing, of what the people found important? Downplaying conservative news is the most damning charge possible, since it has long been a commonplace accusation leveled at journalists. But the revelation is that there are people in the algorithm at all.

But the plain fact of information algorithms like the ones used to identify “trends” is that they do not work alone; they cannot work alone — in so many ways that we must simply discard the fantasy that they do, or ever will. In fact, algorithms do surprisingly little; they just do it really quickly and with a whole lot of data. Here’s some of what they can’t do:

Trending algorithms identify patterns in data, but they can’t make sense of it. The raw data is Facebook posts, likes, and hashtags. Looking at this data, there will certainly be surges of activity that can be identified and quantified: words that show up more than other words, posts that get more likes than other posts. But there is so much more to figure out:
(1) What is a topic? To decide how popular a topic is, Facebook must decide which posts are about that topic. When do two posts or two hashtags represent the same story, such that they should be counted together? An algorithm can only do so much to say whether a post about Beyonce and a post about Bey and a post about Lemonade and a post about QueenB and the hashtag BeyHive are all the same topic. And that’s an easy one, a superstar with a distinctive name, days after a major public event. Imagine trying to determine algorithmically if people are talking about European tax reform, enough to warrant calling it a trend.
(2) Topics are also composed of smaller topics, endlessly down to infinity. Is the Republican nomination process a trending topic, or the Indiana primary, or Trump’s win in Indiana, or Paul Ryan’s response to Trump’s win in Indiana? According to one algorithmic threshold these would be grouped together; by another they would be separate. The problem is not that an algorithm can’t tell. It’s that it can tell it both ways, indeed every way, equally well. So an algorithm could be programmed to decide, to impose a particular threshold for the granularity of topics (the toy sketch after point (3) below illustrates this). But would that choice make sense to readers, would it map onto their own sense of what’s important, and would it work for the next topic, and the next?
(3) How should a topic be named and described, in a way that Facebook users would appreciate or even understand? Computational attempts to summarize are notoriously clunky, and often produce the kind of phrasing and grammar that scream “a computer wrote this.”
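
Here is a toy illustration of points (1) and (2), with made-up posts and a deliberately crude matching rule. Real systems are far more sophisticated, but they face the same two judgment calls: what outside knowledge links posts into a topic, and how fine-grained a “topic” should be.

    posts = [
        "Lemonade is a masterpiece #BeyHive",
        "Bey did it again",
        "Trump wins the Indiana primary",
        "Paul Ryan responds to Trump's Indiana win",
    ]

    def tokens(post):
        # Crude normalization: lowercase, strip punctuation and a trailing "s".
        return {word.strip("#.!'s").lower() for word in post.split()}

    def same_topic(a, b, threshold):
        # Word overlap as a stand-in for similarity; the threshold is a human choice.
        overlap = len(tokens(a) & tokens(b))
        return overlap / min(len(tokens(a)), len(tokens(b))) >= threshold

    same_topic(posts[0], posts[1], threshold=0.3)  # False: nothing in the words links them
    same_topic(posts[2], posts[3], threshold=0.3)  # True: one "Indiana" story
    same_topic(posts[2], posts[3], threshold=0.7)  # False: now they are two separate topics

Someone has to supply the outside knowledge and pick the threshold; the data alone does neither.
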
What trending algorithms can identify isn’t always what a platform wants to identify. Facebook, unlike Twitter, chose to display trends that identify topics, rather than single hashtags. This was already a move weighted towards identifying “news” rather than topics. It already strikes an uneasy balance between the kind of information they have — billions of posts and likes surging through their system — and the kind they’d like to display — a list of the most relevant topics. And it already sets up an irreconcilable tension: what should they do when user activity is not a good measure of public importance? It is not surprising, then, that they’d try to focus on articles being circulated and commented on, and from the most reputable sources, as a way to lean on their curation and authority to pre-identify topics. Which opens up, as Gizmodo identifies, the tendency to discount some sources as non-reputable, which can have unintentionally partisan implications.
“Trending” is also being asked to do a lot of things for Facebook: capture the most relevant issues being discussed on Facebook, and conveniently map onto the most relevant topics in the worlds of news and entertainment, and keep users on the site longer, and keep up with Twitter, and keep advertisers happy. In many ways, a trending algorithm can be an enormous liability, if allowed to be: it could generate a list of dreadful or depressing topics; it could become a playground for trolls who want to fill it with nonsense and profanity; it could reveal how little people use Facebook to talk about matters of public importance; it could reveal how depressingly little people care about matters of public importance; and it could help amplify a story critical of Facebook itself. It would take a whole lot of bravado to set that loose on a system like Facebook and let it show what it shows, unmanaged. Clearly, Facebook has a lot at stake in producing a trending list that looks like an unvarnished report of what users are discussing but is also massaged into something that represents Facebook well.

So: people are in the algorithm because how could they not be? People produce the Facebook activity being measured, people design the algorithms and set their evaluative criteria, people decide what counts as a trend, people name and summarize them, and people look to game the algorithm with their next posts.

The thing is, these human judgments are all part of traditional news gathering as well. Choosing what to report in the news, how to describe it and feature it, and how to honor both the interests of the audience and the sense of importance, has always been a messy, subjective process, full of gaps in which error, bias, self-interest, and myopia can enter. The real concern here is not that there are similar gaps in Facebook’s process as well, or that Facebook hasn’t yet invented an algorithm that can close those gaps. The real worry is that Facebook is being so unbelievably cavalier about it.

Traditional news organizations face analogous problems, must make analogous choices, and can make analogous missteps. And they do. But two countervailing forces work against this and keep them more honest than not, more on target than not: a palpable and institutionalized commitment to news itself, and competition. I have no desire to glorify the current news landscape, which in many ways produces news that is dishearteningly less than what journalism should be. But there is at least a public, shared, institutionally rehearsed, and historical sense of purpose and mission, or at least there’s one available. Journalism schools teach their students not just how to determine and deliver the news, but why. They offer up professional guidelines and heroic narratives that position the journalist as a provider of political truths and public insight. They provide journalists with frames that help them identify the way news can suffer when it overlaps with public relations, spin, infotainment, and advertising. There are buffers in place to protect journalists from the pressures that can come from upper management, advertisers, or newsmakers themselves, because of a belief that independence is an important foundation for newsgathering. Journalists recognize that their choices have consequences, and they discuss those choices. And there are stakeholders charged with regularly checking these efforts for possible bias and self-interest: public editors and ombudspeople, newswatch organizations and public critics, all trying to keep the process honest. Most of all, there are competitors who would gleefully point out a news organization’s mistakes and failures, which gives editors and managers real incentive to work against the temptations to produce news that is self-serving, politically slanted, or commercially craven.

Facebook seemed to have thought of absolutely none of these. Based on the revelations in the two Gizmodo articles, it’s clear that they hired a shoestring team, lashed them to the algorithm, offered little guidance for what it meant to make curatorial choices, provided no ongoing oversight as the project progressed, imposed self-interested guidelines to protect the company, and kept the entire process inscrutable to the public, cloaked in the promise of an algorithm doing its algorithm thing.

The other worry here is that Facebook is engaged in a labor practice increasingly common in Silicon Valley: hiring information workers through third parties, under precarious conditions and without access to the institutional support or culture that full-time employees enjoy, and imposing time and output demands on them that can only shortchange a task that warrants more time, care, expertise, and support. This is the troubling truth about information workers in Silicon Valley and around the world, who find themselves “automated” by the gig economy — not just clickworkers on Mechanical Turk and drivers for Uber, but even workers “inside” the biggest and most established companies on the planet. It is also a dangerous tendency given the kind and scale of information projects that tech companies are willing to take on without having the infrastructure and personnel to adequately support them. It is not uncommon now for a company to debut a new feature or service, only weeks in development and supported only by its design team, with the assumption that it can quickly hire and train a team of independent, hourly workers. Not only does this put a huge onus on those workers, but it means that, if the service finds users and begins to scale up quickly, little preparation is in place, and the overworked team must quickly make ad hoc decisions about what are often tricky cases with real, public ramifications.

Trending algorithms are undeniably becoming part of the cultural landscape, and revelations like Gizmodo’s are useful steps toward shedding the easy notions of what they are and how they work, notions the platforms themselves have fostered. Social media platforms must come to fully realize that they are newsmakers and gatekeepers, whether they intend to be or not, whether they want to be or not. And while algorithms can chew on a lot of data, it is still a substantial, significant, and human process to turn that data into claims about importance that get fed back to millions of users. This is not a realization that they will ever reach on their own — which suggests to me that they need the two countervailing forces that journalism has: a structural commitment to the public, imposed if not inherent, and competition to force them to take such obligations seriously.

Addendum: TechCrunch is reporting that Facebook has responded to Gizmodo’s allegations, suggesting that it has “rigorous guidelines in place for the review team to ensure consistency and neutrality.” This makes sense. But while consistency and neutrality are fine as concepts, they’re vague and insufficient in practice. There could have been Trending curators at Facebook who deliberately tanked conservative topics and knew that doing so violated policy. But (and this has long been known in the sociology of news) the greater challenge in producing the news, whether generating it or just curating it, is how to deal with the judgments that happen while being consistent and neutral. Making the news always requires judgments, and judgments always incorporate premises for assessing the relevance, legitimacy, and coherence of a topic. Recognizing bias in our own choices or across an institution is extremely difficult, but knowing whether you have produced a biased representation of reality is nearly impossible, as there’s nothing to compare it to — even setting aside that Facebook is actually trying to do something even harder: produce a representation of the collective representations of reality of their users, and ensure that somehow it also represents reality as other reality-representers (be they CNN or Twitter users) have represented it. Were social media platforms willing to acknowledge that they constitute public life rather than hosting or reflecting it, they might look to those who produce news, educate journalists, and study news as a sociological phenomenon for help thinking through these challenges.

Addendum 2 (May 9): The Senate Committee on Commerce, Science, and Transportation has just filed an inquiry with Facebook, raising concerns about their Trending Topics based on the allegations in the Gizmodo report. The letter of inquiry is available here, and has been reported by Gizmodo and elsewhere. In the letter they ask Mark Zuckerberg and Facebook to respond to a series of questions about how Trending Topics works, what kind of guidelines and oversight they provided, and whether specific topics were sidelined or injected. Gizmodo and other sites are highlighting the fact that this Committee is run by a conservative and has a majority of members who are conservative. But the questions posed are thoughtful ones. What they make so clear is that we simply do not have a vocabulary with which to hold these services accountable. For instance, they ask “Have Facebook news curators in fact manipulated the content of the Trending Topics section, either by targeting news stories related to conservative views for exclusion or by injecting non-trending content?” Look at the verbs. “Manipulated” is tricky, as it’s not exactly clear what the unmanipulated Trending Topics even are. “Targeting” sounds like they excluded stories, when what Gizmodo reports is that some stories were not selected as trending, or not recognized as stories. If trending algorithms can only highlight possible topics surging in popularity, but Facebook and its news curators constitute that data into a list of topics, then language that takes trending to be a natural phenomenon, one that Facebook either accurately reveals or manipulates, can’t quite grip how this works and why it is so important. It is worth noting, though, that the inquiry pushes on how (and whether) Facebook is keeping records of what is selected: “Does Facebook maintain a record of curators’ decisions to inject a story into the Trending Topics section or target a story for removal? If such a record is not maintained, can such decisions be reconstructed or determined based on an analysis of the Trending Topics product? a. If so, how many stories have curators excluded that represented conservative viewpoints or topics of interest to conservatives? How many stories did curators inject that were not, in fact, trending? b. Please provide a list of all news stories removed from or injected into the Trending Topics section since January 2014.” This approach, I think, does emphasize to Facebook that these choices are significant, enough so that they should be treated as part of the public record and open to scrutiny by policymakers or the courts. This is a way of demanding that Facebook take its role in this regard more seriously.

#trendingistrending: when algorithms become culture

I wanted to share a new essay, “#Trendingistrending: When Algorithms Become Culture,” that I’ve just completed for a forthcoming Routledge anthology called Algorithmic Cultures: Essays on Meaning, Performance and New Technologies, edited by Robert Seyfert and Jonathan Roberge. My aim is to focus on the various “trending algorithms” that populate social media platforms, consider what they do as a set, and then connect them to a broader history of metrics used in popular media, to both assess audience tastes and portray them back to that audience, as a cultural claim in its own right and as a form of advertising.

The essay is meant to extend the idea of “calculated publics” I first discussed here and the concerns that animated this paper. But more broadly I hope it pushes us to think about algorithms not as external forces on the flow of popular culture, but increasingly as elements of popular culture themselves, something we discuss as culturally relevant, something we turn to face so as to participate in culture in particular ways. It also says a bit more about how we tend to think and talk about “algorithms” in this scholarly discussion, something I have more to say about here.

I hope it’s interesting, and I really welcome your feedback. I already see places where I’ve not done the issue justice: I should connect the argument more to discussions of financial metrics, like credit ratings, as another moment when institutions have reason to turn such measures back as meaningful claims. I found Jeremy Morris’s excellent essay on what he calls “infomediaries” (journal; academia.edu) late in my process, so while I do gesture to it, it could have informed my thinking even more. There are a dozen other things I wanted to say, and the essay is already a little overstuffed.

I do have some opportunity to make specific changes before it goes to press, so I’d love to hear any suggestions, if you’re inclined to read it.

“Critical algorithm studies” reading list

Nick Seaver and I have put together a list we wanted to share. It is an attempt to collect and categorize a growing critical literature on algorithms as social concerns. The work spans sociology, anthropology, science and technology studies, geography, communication, media studies, and legal studies, among others. Our aim was to catalog the emergence of “algorithms” as objects of interest for disciplines beyond mathematics, computer science, and software engineering.

This area is growing in size and popularity so quickly that new contributions are often popping up without reference to work from disciplinary neighbors. One aim of this list is to help nascent scholars of algorithms to identify broader conversations across disciplines, and to avoid reinventing the wheel or falling into analytic traps that other scholars have already identified. We also thought it would be useful, especially for those teaching these materials, to try to loosely categorize it. The organization of the list is meant merely as a first-pass, provisional sense-making effort. Within categories the entries are offered in chronological order, to help make sense of these rapid developments.

In light of this, we encourage you to see it as an unfinished document. There are 132 citations on the list, but we suspect there are many more to include. We very much welcome comments, including recommendations of other work to include, suggestions on how to reclassify a particular entry, or ideas for reorganizing the categories themselves. Please use the comment space at the bottom of the page itself; we will try to update the list in light of your suggestions.

https://socialmediacollective.org/reading-lists/critical-algorithm-studies/

or

http://bit.ly/crit-algo

A Research Agenda for Accountable Algorithms

What should people who are interested in accountability and algorithms be thinking about? Here is one answer: My eleven-minute remarks are now online from a recent event at NYU. I’ve edited them to intersperse my slides.

This talk was partly motivated by the ethics work being done in the machine learning community. That is very exciting and interesting work and I love, love, love it. My remarks are an attempt to think through the other things we might also need to do. Let me know how to replace the “??” in my slides with something more meaningful!

Preview: My remarks contain a minor attempt at a Michael Jackson joke.

 

 

A number of fantastic Social Media Collective people were at this conference — you can hear Kate Crawford in the opening remarks.  For more videos from the conference, see:

Algorithms and Accountability
http://www.law.nyu.edu/centers/ili/algorithmsconference

Thanks to Joris van Hoboken, Helen Nissenbaum and Elana Zeide for organizing such a fab event.

If you bought this 11-minute presentation you might also buy: Auditing Algorithms, a forthcoming workshop at Oxford.

http://auditingalgorithms.wordpress.com

 

 

(This was cross-posted to multicast.)

The Facebook “It’s Not Our Fault” Study

Today in Science, members of the Facebook data science team released a provocative study about adult Facebook users in the US “who volunteer their ideological affiliation in their profile.” The study “quantified the extent to which individuals encounter comparatively more or less diverse” hard news “while interacting via Facebook’s algorithmically ranked News Feed.”*

  • The research found that the user’s click rate on hard news is affected by the positioning of the content on the page by the filtering algorithm. The same link placed at the top of the feed is about 10-15% more likely to get a click than a link at position #40 (figure S5).
  • The Facebook news feed curation algorithm, “based on many factors,” removes hard news from diverse sources that you are less likely to agree with but it does not remove the hard news that you are likely to agree with (S7). They call news from a source you are less likely to agree with “cross-cutting.”*
  • The study then found that the algorithm filters out 1 in 20 cross-cutting hard news stories that a self-identified conservative sees (or 5%) and 1 in 13 cross-cutting hard news stories that a self-identified liberal sees (8%).
  • Finally, the research then showed that “individuals’ choices about what to consume” further limit their “exposure to cross-cutting content.” Conservatives will click on only a little less than 30% of cross-cutting hard news (this post originally said 17%), while liberals will click on a little more than 20% (originally 7%) (figure 3).

My interpretation in three sentences:

  1. We would expect that people who are given the choice of what news they want to read will select sources they tend to agree with–more choice leads to more selectivity and polarization in news sources.
  2. Increasing political polarization is normatively a bad thing.
  3. Selectivity and polarization are happening on Facebook, and the news feed curation algorithm acts to modestly accelerate selectivity and polarization.

I think this should not be hugely surprising. For example, what else would a good filter algorithm be doing other than filtering for what it thinks you will like?

But what’s really provocative about this research is the unusual framing. This may go down in history as the “it’s not our fault” study.

Facebook: It’s not our fault.

I carefully wrote the above based on my interpretation of the results. Now that I’ve got that off my chest, let me tell you about how the Facebook data science team interprets these results. To start, my assumption was that news polarization is bad.  But the end of the Facebook study says:

“we do not pass judgment on the normative value of cross-cutting exposure”

This is strange, because there is a wide consensus that exposure to diverse news sources is foundational to democracy. Scholarly research about social media has–almost universally–expressed concern about the dangers of increasing selectivity and polarization. But it may be that you do not want to say that polarization is bad when you have just found that your own product increases it. (Modestly.)

And the sources cited just after this quote sure do say that exposure to diverse news sources is important. But the Facebook authors write:

“though normative scholars often argue that exposure to a diverse ‘marketplace of ideas’ is key to a healthy democracy (25), a number of studies find that exposure to cross-cutting viewpoints is associated with lower levels of political participation (22, 26, 27).”

So the authors present reduced exposure to diverse news as a “could be good, could be bad” but that’s just not fair. It’s just “bad.” There is no gang of political scientists arguing against exposure to diverse news sources.**

The Facebook study says it is important because:

“our work suggests that individuals are exposed to more cross-cutting discourse in social media they would be under the digital reality envisioned by some

Why so defensive? If you look at what is cited here, this quote is saying that this study showed that Facebook is better than a speculative dystopian future.*** Yet the people referred to by this word “some” didn’t provide any sort of point estimates that were meant to allow specific comparisons. On the subject of comparisons, the study goes on to say that:

“we conclusively establish that…individual choices more than algorithms limit exposure to attitude-challenging content.”

“compared to algorithmic ranking, individuals’ choices about what to consume had a stronger effect”

Alarm bells are ringing for me. The tobacco industry might once have funded a study that says that smoking is less dangerous than coal mining, but here we have a study about coal miners smoking. Probably while they are in the coal mine. What I mean to say is that there is no scenario in which “user choices” vs. “the algorithm” can be traded off, because they happen together (Fig. 3 [top]). Users select from what the algorithm already filtered for them. It is a sequence.**** I think the proper statement about these two things is that they’re both bad — they both increase polarization and selectivity. As I said above, the algorithm appears to modestly increase the selectivity of users.
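
Here is a bare-bones sketch of why that comparison breaks down. The numbers and behaviors are invented stand-ins, not the study’s figures; the point is the order of operations: whatever “individual choice” does, it operates only on what the ranking step has already let through.

    import random
    random.seed(0)

    stories = [{"id": i, "cross_cutting": random.random() < 0.4} for i in range(1000)]

    def algorithm_filter(inventory):
        # Step 1: the feed ranking decides what is shown at all
        # (stand-in behavior: slightly less likely to show cross-cutting items).
        return [s for s in inventory
                if random.random() < (0.85 if s["cross_cutting"] else 0.95)]

    def user_choice(feed):
        # Step 2: the user clicks on some of what step 1 chose to show
        # (stand-in behavior: also less likely to click cross-cutting items).
        return [s for s in feed
                if random.random() < (0.2 if s["cross_cutting"] else 0.4)]

    clicked = user_choice(algorithm_filter(stories))
    # "User choice" never operates on the stories the algorithm withheld, so the two
    # steps are not independent quantities that can be ranked against each other.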

The only reason I can think of that the study is framed this way is as a kind of alibi. Facebook is saying: It’s not our fault! You do it too!

Are we the 4%?

In my summary at the top of this post, I wrote that the study was about people “who volunteer their ideological affiliation in their profile.” But the study also describes itself by saying:

“we utilize a large, comprehensive dataset from Facebook.”

“we examined how 10.1 million U.S. Facebook users interact”

These statements may be factually correct but I found them to be misleading. At first, I read this quickly and I took this to mean that out of the at least 200 million Americans who have used Facebook, the researchers selected a “large” sample that was representative of Facebook users, although this would not be representative of the US population. The “limitations” section discusses the demographics of “Facebook’s users,” as would be the normal thing to do if they were sampled. There is no information about the selection procedure in the article itself.

Instead, after reading down in the appendices, I realized that “comprehensive” refers to the survey research concept: “complete,” meaning that this was a non-probability, non-representative sample that included everyone on the Facebook platform. But out of hundreds of millions, we ended up with a study of 10.1m because users were excluded unless they met these four criteria:

  1. “18 or older”
  2. “log in at least 4/7 days per week”
  3. “have interacted with at least one link shared on Facebook that we classified as hard news”
  4. “self-report their ideological affiliation” in a way that was “interpretable”

That #4 is very significant. Who reports their ideological affiliation on their profile?

(Screenshot: Facebook’s profile field inviting users to “add your political views.”)

It turns out that only 9% of Facebook users do that. Of those that report an affiliation, only 46% reported an affiliation in a way that was “interpretable.” That means this is a study about the 4% of Facebook users unusual enough to want to tell people their political affiliation on the profile page. That is a rare behavior.
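
For the record, the arithmetic behind that figure: 0.09 × 0.46 ≈ 0.041, so roughly 4% of Facebook users, before the other three inclusion criteria are even applied.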

More important than the frequency, though, is the fact that this selection procedure confounds the findings. We would expect a small minority who publicly identify an interpretable political orientation to be very likely to behave quite differently than the average person with respect to consuming ideological political news. The research claims just don’t stand up against the selection procedure.

But the study is at pains to argue that (italics mine):

“we conclusively establish that on average in the context of Facebook, individual choices more than algorithms limit exposure to attitude-challenging content.”

The italicized portion is incorrect because the appendices explain that this is actually a study of a specific, unusual group of Facebook users. The study is designed in such a way that the selection for inclusion in the study is related to the results. (“Conclusively” therefore also feels out of place.)

Algorithmium: A Natural Element?

Last year there was a tremendous controversy about Facebook’s manipulation of the news feed for research. In the fracas it was revealed by one of the controversial study’s co-authors that based on the feedback received after the event, many people didn’t realize that the Facebook news feed was filtered at all. We also recently presented research with similar findings.

I mention this because when the study states it is about selection of content, who does the selection is important. There is no sense in this study that a user who chooses something is fundamentally different from the algorithm hiding something from them. While in fact the filtering algorithm is driven by user choices (among other things), users don’t understand the relationship that their choices have to the outcome.

(Image: “not sure if i hate facebook or everyone i know”)
In other words, the article’s strange comparison between “individuals’ choices” and “the algorithm” should be read as “things I choose to do” vs. the effect of “a process Facebook has designed without my knowledge or understanding.” Again, they can’t be compared in the way the article proposes because they aren’t equivalent.

I struggled with the framing of the article because the research talks about “the algorithm” as though it were an element of nature, or a naturally occurring process like convection or mitosis. There is also no sense that it changes over time or that it could be changed intentionally to support a different scenario.*****

Facebook is a private corporation with a terrible public relations problem. It is periodically rated one of the least popular companies in existence. It is currently facing serious government investigations into illegal practices in many countries, some of which stem from the manipulation of its news feed algorithm. In this context, I have to say that it doesn’t seem wise for these Facebook researchers to have spun these data so hard in this direction, which I would summarize as: the algorithm is less selective and less polarizing. Particularly when the research finding in their own study is actually that the Facebook algorithm is modestly more selective and more polarizing than living your life without it.

Update: (6pm Eastern)

Wow, if you think I was critical have a look at these. It turns out I am the moderate one.

Eszter Hargittai from Northwestern posted on Crooked Timber that we should “stop being mesmerized by large numbers and go back to taking the fundamentals of social science seriously.” And (my favorite): “I thought Science was a serious peer-reviewed publication.”

Nathan Jurgenson from Maryland and Snapchat wrote on Cyborgology (“in a fury“) that Facebook is intentionally “evading” its own role in the production of the news feed. “Facebook cannot take its own role in news seriously.” He accuses the authors of using the “Big-N trick” to intentionally distract from methodological shortcomings. He tweeted that “we need to discuss how very poor corporate big data research gets fast tracked into being published.”

Zeynep Tufekci from UNC wrote on Medium that “I cannot remember a worse apples to oranges comparison” and that the key take-away from the study is actually the ordering effects of the algorithm (which I did not address in this post). “Newsfeed placement is a profoundly powerful gatekeeper for click-through rates.”

Update: (5/10)

A comment helpfully pointed out that I used the wrong percentages in my fourth point when summarizing the piece. Fixed it, with changes marked.

Update: (5/15)

It’s now one week since the Science study. This post has now been cited/linked in The New York Times, Fortune, Time, Wired, Ars Technica, Fast Company, Engadget, and maybe even a few more. I am still getting emails. The conversation has fixated on the <4% sample, often saying something like: “So, Facebook said this was a study about cars, but it was actually only about blue cars.” That’s fine, but the other point in my post is about what is being claimed at all, no matter the sample.

I thought my “coal mine” metaphor about the algorithm would work but it has not always worked. So I’ve clamped my Webcam to my desk lamp and recorded a four-minute video to explain it again, this time with a drawing.******

If the coal mine metaphor failed me, what would be a better metaphor? I’m not sure. Suggestions?

 

 

Notes:

* Diversity in hard news, in their study, would be a self-identified liberal who receives a story from FoxNews.com, or a self-identified conservative who receives one from the HuffingtonPost.com, where the stories are about “national news, politics, [or] world affairs.” In more precise terms, for each user “cross-cutting content” was defined as stories that are more likely to be shared by partisans who do not have the same self-identified ideological affiliation that you do.

** I don’t want to make this even more nitpicky, so I’ll put this in a footnote. The paper’s citation of Mutz and Huckfeldt et al. to mean that “exposure to cross-cutting viewpoints is associated with lower levels of political participation” is just bizarre. I hope it is a typo. These authors don’t advocate against exposure to cross-cutting viewpoints.

*** Perhaps this could be a new Facebook motto used in advertising: “Facebook: Better than one speculative dystopian future!”

**** In fact, algorithm and user form a coupled system of at least two feedback loops. But that’s not helpful to measure “amount” in the way the study wants to, so I’ll just tuck it away down here.

***** Facebook is behind the algorithm but they are trying to peer-review research about it without disclosing how it works — which is a key part of the study. There is also no way to reproduce the research (or do a second study on a primary phenomenon under study, the algorithm) without access to the Facebook platform.

****** In this video, I intentionally conflate (1) the number of posts filtered and (2) the magnitude of the bias of the filtering. I did so because the difficulty with the comparison works the same way for both, and I was trying to make the example simpler. Thanks to Cedric Langbort for pointing out that “baseline error” is the clearest way of explaining this.

(This was cross-posted to multicast and Wired.)

The Google Algorithm as a Robotic Nose

Algorithms, in the view of author Christopher Steiner, are poised to take over everything. Algorithms embedded in software are now everywhere: Netflix recommendations, credit scores, driving directions, stock trading, Google search, Facebook’s news feed, the TSA’s process to decide who gets searched, the Home Depot prices you are quoted online, and so on. Just a few weeks ago, Ashkan Soltani, the new Chief Technologist of the FTC, said that “algorithmic transparency” is his central priority for the US government agency that is tasked with the administration of fairness and justice in trade. Commentators are worried that the rise of hidden algorithmic automation is leading to a problematic new “black box society.”

But given that we want to achieve these “transparent” algorithms, how would we do that? Manfred Broy, writing in the context of software engineering, has said that one of the frustrations of working with software is that it is “almost intangible.” Even if we suddenly obtained the source code for anything we wanted (which is unlikely), it is usually not clear what that code is doing. How can we begin to have a meaningful conversation about the consequences of “an algorithm” by achieving some broad, shared understanding of what it is and what it is doing?


 

The answer, even among experts, is that we use metaphor, cartoons, diagrams, and abstraction. As a small beginning to tackling this problem of representing the algorithm, this week I have a new journal article out in the open access journal Media-N, titled “Seeing the Sort.” In it, I try for a critical consideration of how we represent algorithms visually. From flowcharts to cartoons, I go through examples of “algorithm public relations,” meaning both how algorithms are revealed to the public and also what spin the visualizers are trying for.

The most fun of writing the piece was choosing the examples, which include The Algo-Rythmics (an effort to represent algorithms in dance), an algorithm represented as a 19th century grist mill, and this Google cartoon that represents its algorithm as a robotic nose that smells Web pages:

The Google algorithm as a robotic nose that smells Web pages.

Read the article:

Sandvig, Christian. (2015). Seeing the Sort: The Aesthetic and Industrial Defense of “The Algorithm.” Media-N. vol. 10, no. 1. http://median.newmediacaucus.org/art-infrastructures-information/seeing-the-sort-the-aesthetic-and-industrial-defense-of-the-algorithm/

(This was also cross-posted to multicast.)

 

Corrupt Personalization

(“And also Bud Light.”)

In my last two posts I’ve been writing about my attempt to convince a group of sophomores with no background in my field that there has been a shift to the algorithmic allocation of attention — and that this is important. In this post I’ll respond to a student question. My favorite: “Sandvig says that algorithms are dangerous, but what are the most serious repercussions that he envisions?” What is the coming social media apocalypse we should be worried about?

google flames

This is an important question because people who study this stuff are NOT as interested in this student question as they should be. Frankly, we are specialists who study media and computers and things — therefore we care about how algorithms allocate attention among cultural products almost for its own sake. Because this is the central thing that we study, we don’t spend a lot of time justifying it.

And our field’s most common response to the query “what are the dangers?” often lacks the required sense of danger. The most frequent response is: “extensive personalization is bad for democracy.” (a.k.a. Pariser’s “filter bubble,” Sunstein’s “egocentric” Internet, and so on). This framing lacks a certain house-on-fire urgency, doesn’t it?

(sarcastic tone:) “Oh, no! I’m getting to watch, hear, and read exactly what I want. Help me! Somebody do something!”

Sometimes (as Hindman points out) the contention is the opposite, that Internet-based concentration is bad for democracy.  But remember that I’m not speaking to political science majors here. The average person may not be as moved by an abstract, long-term peril to democracy as the average political science professor. As David Weinberger once said after I warned about the increasing reliance on recommendation algorithms, “So what?” Personalization sounds like a good thing.

As a side note, the second most frequent response I see is that algorithms are now everywhere, and that they work differently than what came before. This also lacks the required sense of danger! Yes, they’re everywhere — but if they are a good thing, why should their ubiquity worry anyone?

So I really like this question — “what are the most serious repercussions?” — because I think there are some elements of the shift to attention-sorting algorithms that are genuinely “dangerous.” I can think of at least two, probably more, and they don’t get enough attention. In the rest of this post I’ll spell out the first one, which I’ll call “corrupt personalization.”

Here we go.

Common-sense reasoning about algorithms and culture tells us that the purveyors of personalized content have the same interests we do. That is, if Netflix started recommending only movies we hate or Google started returning only useless search results we would stop using them. However: Common sense is wrong in this case. Our interests are often not the same as the providers of these selection algorithms.  As in my last post, let’s work through a few concrete examples to make the case.

In this post I’ll use Facebook examples, but the general problem of corrupt personalization is present on all of our media platforms in wide use that employ the algorithmic selection of content.

(1) Facebook “Like” Recycling


(Image from ReadWriteWeb.)

On Facebook, in addition to advertisements along the side of the interface, perhaps you’ve noticed “featured,” “sponsored,” or “suggested” stories that appear inside your news feed, intermingled with status updates from your friends. It could be argued that this is not in your interest as a user (did you ever say, “gee, I’d like ads to look just like messages from my friends”?), but I have bigger fish to fry.

Many ads on Facebook resemble status updates in that there can be messages endorsing the ads with “likes.” For instance, here is an older screenshot from ReadWriteWeb:

pages you may like on facebook

Another example: a “suggested” post was mixed into my news feed just this morning recommending World Cup coverage on Facebook itself. It’s a Facebook ad for Facebook, in other words.  It had this intriguing addendum:

CENSORED likes facebook

So, wait… I have hundreds of friends and eleven of them “like” Facebook?  Did they go to http://www.facebook.com and click on a button like this:

Facebook like button magnified

But facebook.com doesn’t even have a “Like” button!  Did they go to Facebook’s own Facebook page (yes, there is one) and click “Like”? I know these people and that seems unlikely. And does Nicolala really like Walmart? Hmmm…

What does this “like” statement mean? Welcome to the strange world of “like” recycling. Facebook has defined “like” in ways that depart from English usage.  For instance, in the past Facebook has determined that:

  1. Anyone who clicks on a “like” button is considered to have “liked” all future content from that source. So if you clicked a “like” button because someone shared a “Fashion Don’t” from Vice magazine, you may be surprised when your dad logs into Facebook three years later and is shown a current sponsored story from Vice.com like “Happy Masturbation Month!” or “How to Make it in Porn” with the endorsement that you like it. (Vice.com example is from Craig Condon [NSFW].)
  2. Anyone who “likes” a comment on a shared link is considered to “like” wherever that link points to, a.k.a. “liking a share.” So if you see a (real) FB status update from a (real) friend and it says: “Yuck! The McLobster is a disgusting product idea!” and your (real) friend includes a (real) link like this one — that means if you clicked “like,” your friends may see McDonald’s ads in the future that include the phrase “(Your Name) likes McDonald’s.” (This example is from ReadWriteWeb.) A sketch of how this logic works in code follows the list.
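
To make these two rules concrete, here is a minimal sketch in Python. The data structures, names, and sample data are all invented for illustration — Facebook has never published the code that does this — but the key move is real: the endorsement is keyed to the source, not to the story the user actually clicked.

```python
from dataclasses import dataclass

@dataclass
class LikeEvent:
    user: str
    source: str   # the publisher behind the liked story, e.g. "vice.com"
    story: str    # the specific story the user actually clicked "like" on

@dataclass
class SponsoredStory:
    source: str
    headline: str

def recycled_endorsements(like_history, ad):
    """Return users whose old "like" of ANY story from ad.source can be
    attached to this new sponsored story -- that keying is the recycling."""
    return sorted({event.user for event in like_history if event.source == ad.source})

likes = [
    LikeEvent("you", "vice.com", "a Fashion Don't"),            # clicked years ago
    LikeEvent("nicolala", "walmart.com", "a shared store link"),
]

ad = SponsoredStory("vice.com", "How to Make It in Porn")
print(recycled_endorsements(likes, ad))   # -> ['you'], shown to your friends (or your dad)
```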


This has led to some interesting results, like dead people “liking” current news stories on Facebook.

There is already controversy about advertiser “like” inflation, “like” spam, and fake “likes” — and these things may be a problem too, but that’s not what we are talking about here. In the examples above the system is working as Facebook designed it to. A further caveat: note that the definition of “like” in Facebook’s software changes periodically, and when they are sued. Facebook now has an opt-out setting for the above two “features.”

But these incendiary examples are exceptional fiascoes — on the whole the system probably works well. You likely didn’t know that your “like” clicks are merrily producing ads on your friends’ pages and in your name, because you cannot see them. These “stories” do not appear in your own news feed and cannot be individually deleted.

Unlike the examples from my last post you can’t quickly reproduce these results with certainty on your own account. Still, if you want to try, make a new Facebook account under a fake name (warning! dangerous!) and friend your real account. Then use the new account to watch your status updates.

Why would Facebook do this? Obviously it is a controversial practice that is not going to be popular with users. Yet Facebook’s business model is to produce attention for advertisers, not to help you — silly rabbit. So they must have felt that using your reputation to produce more ad traffic from your friends was worth the risk of irritating you. Or perhaps they thought that the practice could be successfully hidden from users — that strategy has mostly worked!

In sum this is a personalization scheme that does not serve your goals, it serves Facebook’s goals at your expense.

(2) “Organic” Content

This second group of examples concerns content that we consider to be “not advertising,” a.k.a. “organic” content. Funnily enough, algorithmic culture has produced this new use of the word “organic” — but has also made the boundary between “advertising” and “not advertising” very blurry.


 

The general problem is that there are many ways in which algorithms act as mixing valves between things that can be easily valued with money (like ads) and things that can’t. And this kind of mixing is a normative problem (what should we do) and not a technical problem (how do we do it).
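
Here is a toy illustration of the “mixing valve” idea — not anyone’s real code, and the mixing ratio is an invented parameter — just to show how easily a single selection algorithm can blend a stream that is priced in money with a stream that is not.

```python
# A toy "mixing valve": every few organic items, the feed emits one paid item.
# The ratio is an invented parameter; the point is only that one stream is
# priced in money, the other is not, and a single algorithm blends them.

def mixed_feed(organic_items, paid_items, paid_every=3):
    paid = iter(paid_items)
    for position, item in enumerate(organic_items, start=1):
        yield ("organic", item)
        if position % paid_every == 0:
            try:
                yield ("sponsored", next(paid))
            except StopIteration:
                pass  # no more paid items to mix in

friends = ["cousin's wedding photos", "kitten picture", "tuition rant",
           "birth announcement", "book recommendation", "school club meeting"]
ads = ["World Cup coverage on Facebook", "Bud Light"]

for kind, item in mixed_feed(friends, ads):
    print(kind, "-", item)
```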

For instance, for years Facebook has encouraged nonprofits, community-based organizations, student clubs, other groups, and really anyone to host content on facebook.com.  If an organization creates a Facebook page for itself, the managers can update the page as though it were a profile.

Most page managers expect that people who “like” that page get to see the updates… which was true until January of this year. At that time Facebook modified its algorithm so that text updates from organizations were not widely shared. This is interesting for our purposes because Facebook clearly states that it wants page operators to run Facebook ad campaigns, and not to count on getting traffic from “organic” status updates, as it will no longer distribute as many of them.
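
One way to picture the change (the real distribution model is proprietary, and every number below is invented) is as a reach function that shows an unpaid page post to only a small fraction of the page’s followers unless the post is promoted.

```python
# Invented illustration of throttled "organic" reach: an unpaid page post is
# shown to only a small fraction of the people who liked the page, unless the
# page pays to promote it. The rates below are made up, not Facebook's.

def expected_reach(page_likes: int, promoted: bool,
                   organic_rate: float = 0.05, promoted_rate: float = 0.60) -> int:
    rate = promoted_rate if promoted else organic_rate
    return int(page_likes * rate)

print(expected_reach(200, promoted=False))          # small church page: ~10 people
print(expected_reach(200, promoted=True))           # same page after buying an ad
print(expected_reach(30_000_000, promoted=False))   # a global brand still reaches a crowd
```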

This change likely affects Nike’s Facebook page, a small local business’s Facebook page, Greenpeace International’s Facebook page, and a small local church congregation’s Facebook page very differently. If you start a Facebook page for a school club, you might be surprised to find that you are spending your labor writing status updates that are never shown to anyone. Maybe you should buy an ad. Here’s an analytic for a page I manage:

this week page likes facebook

 

The impact isn’t just about size — at some level businesses might expect to have to insert themselves into conversations via persuasive advertising that they pay for, but it is not as clear that people expect Facebook to work this way for their local church or other domains of their lives. It’s as if on Facebook, people were using the yellow pages but they thought they were using the white pages.  And also there are no white pages.

(Oh, wait. No one knows what yellow pages and white pages are anymore. Scratch that reference, then.)

No need to stop here: in the future perhaps Facebook can monetize my family relationships. It could suggest that if I really want anyone to know about the birth of my child, or if I really want my “insightful” status updates to reach anyone, I should turn to Facebook advertising.

Let me also emphasize that this mixing problem extends to the content of our personal social media conversations as well. A few months back, I posted a Facebook status update that I thought was humorous. I shared a link highlighting the hilarious product reviews for the Bic “Cristal For Her” ballpoint pen on Amazon. It’s a pen designed just for women.

bic crystal for her

The funny thing is that I happened to look over a friend’s shoulder at their Facebook feed, and my status update didn’t go away. It remained, pegged at the top of my friend’s news feed, for as long as 14 days in one instance. What great exposure for my humor, right? But it did seem a little odd… I queried my other friends on Facebook and some confirmed that the post was also pegged at the top of their news feeds.

I was unknowingly participating in another Facebook program that converts organic status updates into ads. It does this by changing their order in the news feed and adding the word “Sponsored” in light gray, which is very hard to see. Otherwise the update itself is not changed. I suspect Facebook’s algorithm thought I was advertising Amazon (since that’s where the link pointed), but I am not sure.

This is similar to Twitter’s “Promoted Tweets” but there is one big difference.  In the Facebook case the advertiser promotes content — my content — that they did not write. In effect Facebook is re-ordering your conversations with your friends and family on the basis of whether or not someone mentioned Coke, Levi’s, and Anheuser Busch (confirmed advertisers in the program).
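
As a sketch of what such a conversion might look like — this is hypothetical, not Facebook’s code, and the advertiser list below simply reuses the brands named in this post plus Amazon as my guess — the ranking layer only needs to scan an organic post for an advertiser mention, boost it, and attach the faint “Sponsored” label.

```python
# Hypothetical sketch -- not Facebook's actual code. An organic post that
# mentions or links to a paying advertiser gets boosted in friends' feeds and
# relabeled "Sponsored." The advertiser list and boost value are invented.

PAYING_ADVERTISERS = {"amazon", "coke", "levi's", "anheuser-busch"}

def sponsor_if_mentioned(post: dict) -> dict:
    text = (post["text"] + " " + post.get("link", "")).lower()
    if any(brand in text for brand in PAYING_ADVERTISERS):
        # pin it near the top of friends' feeds and add the faint gray label
        return dict(post, label="Sponsored", rank_boost=1000)
    return post

post = {"author": "me",
        "text": "Hilarious product reviews for the Bic 'Cristal For Her' pen",
        "link": "http://www.amazon.com/"}
print(sponsor_if_mentioned(post))
```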

Sounds like a great personal social media strategy there — if you really want people to know about your forthcoming wedding, maybe just drop a few brand names? Luckily the algorithms aren’t too clever about this yet, so you can mix up the word order for humorous effect.

(Facebook status update:) “I am so delighted to be engaged to this wonderful woman that I am sitting here in my Michelob drinking a Docker’s Khaki Collection. And also Coke.”

Be sure to use links. I find the interesting thing about this mixing of the commercial and non-commercial to be that it sounds to my ears like some sort of corny, unrealistic science fiction scenario and yet with the current Facebook platform I believe the above example would work. We are living in the future.

So to recap, if Nike makes a Facebook page and posts status updates to it, that’s “organic” content because they did not pay Facebook to distribute it. Although any rational human being would see it as an ad. If my school group does the same thing, that’s also organic content, but they are encouraged to buy distribution — which would make it inorganic. If I post a status update or click “like” in reaction to something that happens in my life and that happens to involve a commercial product, my action starts out as organic, but then it becomes inorganic (paid for) because a company can buy my words and likes and show them to other people without telling me. Got it? This paragraph feels like we are rethinking CHEM 402.

The upshot is that control of the content selection algorithm is used by Facebook to get people to pay for things they wouldn’t expect to pay for, and to show people personalized things that they don’t think are paid for. But these things were in fact paid for.  In sum this is again a scheme that does not serve your goals, it serves Facebook’s goals at your expense.

The Danger: Corrupt Personalization

With these concrete examples behind us, I can now more clearly answer this student question. What are the most serious repercussions of the algorithmic allocation of attention?

I’ll call this first repercussion “corrupt personalization” after C. Edwin Baker. (Baker, a distinguished legal philosopher, coined the phrase “corrupt segmentation” in 1998 as an extension of the theories of philosopher Jürgen Habermas.)

Here’s how it works: You have legitimate interests that we’ll call “authentic.” These interests arise from your values, your community, your work, your family, how you spend your time, and so on. A good example might be that as a person who is enrolled in college you might identify with the category “student,” among your many other affiliations. As a student, you might be authentically interested in an upcoming tuition increase or, more broadly, about the contention that “there are powerful forces at work in our society that are actively hostile to the college ideal.”

However, you might also be authentically interested in the fact that your cousin is getting married. Or in pictures of kittens.

(Image: the Grumpy Cat meme.)

Corrupt personalization is the process by which your attention is drawn to interests that are not your own. This is a little tricky because it is impossible to clearly define an “authentic” interest. However, let’s put that off for the moment.

In the prior examples we saw some (I hope) obvious places where my interests diverged from those of the algorithmic social media systems. Highlights for me were:

  • When I express my opinion about something to my friends and family, I do not want that opinion re-sold without my knowledge or consent.
  • When I explicitly endorse something, I don’t want that endorsement applied to other things that I did not endorse.
  • If I want to read a list of personalized status updates about my friends and family, I do not want my friends and family sorted by how often they mention advertisers.
  • If a list of things is chosen for me, I want the results organized by some measure of goodness for me, not by how much money someone has paid.
  • I want paid content to be clearly identified.
  • I do not want my information technology to sort my life into commercial and non-commercial content and systematically de-emphasize the noncommercial things that I do, or turn these things toward commercial purposes.

More generally, I think the danger of corrupt personalization is manifest in three ways.

  1. Things that are not necessarily commercial become commercial because of the organization of the system. (Merton called this “pseudo-gemeinschaft,” Habermas called it “colonization of the lifeworld.”)
  2. Money is used as a proxy for “best” and it does not work. That is, those with the most money to spend can prevail over those with the most useful information. The creation of a salable audience takes priority over your authentic interests. (Smythe called this the “audience commodity,” it is Baker’s “market filter.”)
  3. Over time, if people are offered things that are not aligned with their interests often enough, they can be taught what to want. That is, they may come to wrongly believe that these are their authentic interests, and it may be difficult to see the world any other way. (Similar to Chomsky and Herman’s [not Lippmann’s] arguments about “manufacturing consent.”)

There is nothing inherent in the technologies of algorithmic allocation that is doing this to us; instead, the economic organization of the system is producing these pressures. In fact, we could design a system to support our authentic interests, but we would then need to fund it. (Thanks, late capitalism!)

To conclude, let’s get some historical perspective. What are the other options, anyway? If cultural selection is governed by computer algorithms now, you might answer, “who cares?” It’s always going to be governed somehow. If I said in a talk about “algorithmic culture” that I don’t like the Netflix recommender algorithm, what is supposed to replace it?

This all sounds pretty bad, so you might think I am asking for a return to “pre-algorithmic” culture: Let’s reanimate the corpse of Louis B. Mayer and he can decide what I watch. That doesn’t seem good either and I’m not recommending it. We’ve always had selection systems and we could even call some of the earlier ones “algorithms” if we want to.  However, we are constructing something new and largely unprecedented here and it isn’t ideal. It isn’t that I think algorithms are inherently dangerous, or bad — quite the contrary. To me this seems like a case of squandered potential.

With algorithmic culture, computers and algorithms are allowing a new level of real-time personalization and content selection on an individual basis that just wasn’t possible before. But rather than use these tools to serve our authentic interests, we have built a system that often serves a commercial interest that is often at odds with our interests — that’s corrupt personalization.

If I use the dominant forms of communication online today (Facebook, Google, Twitter, YouTube, etc.) I can expect content customized for others to use my name and my words without my consent, in ways I wouldn’t approve of. Content “personalized” for me includes material I don’t want, and obscures material that I do want. And it does so in a way that I may not be aware of.

This isn’t an abstract problem like a long-term threat to democracy, it’s more like a mugging — or at least a confidence game or a fraud. It’s violence being done to you right now, under your nose. Just click “like.”

In answer to your question, dear student, that’s my first danger.

* * *

ADDENDUM:

This blog post is already too long, but here is a TL;DR addendum for people who already know about all this stuff.

I’m calling this corrupt personalization because I can’t just apply Baker’s excellent ideas about corrupt segments — the world has changed since he wrote them. Although this post’s reasoning is an extension of Baker, it is not a straightforward extension.

Algorithmic attention is a big deal because we used to think about media and identity using categories, but the algorithms in wide use are not natively organized that way. Baker’s ideas were premised on the difference between authentic and inauthentic categories (“segments”), yet segments are just not that important anymore. Bermejo calls this the era of post-demographics.

Advertisers used to group demographics together to make audiences comprehensible, but it may no longer be necessary to buy and sell demographics or categories as they are a crude proxy for purchasing behavior. If I want to sell a Subaru, why buy access to “Brite Lights, Li’l City” (My PRIZM marketing demographic from the 1990s) when I can directly detect “intent to purchase a station wagon” or “shopping for a Subaru right now”? This complicates Baker’s idea of authentic segments quite a bit. See also Gillespie’s concept of calculated publics.

Also Baker was writing in an era where content was inextricably linked to advertising because it was not feasible to decouple them. But today algorithmic attention sorting has often completely decoupled advertising from content. Online we see ads from networks that are based on user behavior over time, rather than what content the user is looking at right now. The relationship between advertising support and content is therefore more subtle than in the previous era, and this bears more investigation.

Okay, okay I’ll stop now.

* * *

(This is a cross-post from Multicast.)

Show-and-Tell: Algorithmic Culture

Last week I tried to get a group of random sophomores to care about algorithmic culture. I argued that software algorithms are transforming communication and knowledge. The jury is still out on my success at that, but in this post I’ll continue the theme by reviewing the interactive examples I used to make my point. I’m sharing them because they are fun to try. I’m also hoping the excellent readers of this blog can think of a few more.

I’ll call my three examples “puppy dog hate,” “top stories fail,” and “your DoubleClick cookie filling.”  They should highlight the ways in which algorithms online are selecting content for your attention. And ideally they will be good fodder for discussion. Let’s begin:

Three Ways to Demonstrate Algorithmic Culture

(1.) puppy dog hate (Google Instant)

You’ll want to read the instructions fully before trying this. Go to http://www.google.com/ and type “puppy”, then [space], then “dog”, then [space], but don’t hit [Enter].  That means you should have typed “puppy dog ” (with a trailing space). Results should appear without the need to press [Enter]. I got this:

Now repeat the above instructions but instead of “puppy” use the word “bitch” (so: “bitch dog “).  Right now you’ll get nothing. I got nothing. (The blank area below is intentionally blank.) No matter how many words you type, if one of the words is “bitch” you’ll get no instant results.

What’s happening? Google Instant is the Google service that displays results while you are still typing your query. In the algorithm for Google Instant, it appears that your query is checked against a list of forbidden words. If the query contains one of the forbidden words (like “bitch”) no “instant” results will be shown, but you can still search Google the old-fashioned way by pressing [Enter].
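
A minimal sketch of this kind of suppression (the real blacklist is Google’s and not public; the word list below contains only the two words discussed in this post) looks something like this:

```python
# Minimal sketch of blacklist-style suppression of instant results.
# The real Google Instant blacklist is not public; this list holds only the
# two words discussed in this post, for illustration.

BLOCKED_WORDS = {"bitch", "hate"}

def search(query: str):
    return [f"result for {query!r}"]     # stand-in for the real search backend

def instant_results(query: str):
    """Return suggested results while typing, or None to show nothing."""
    if any(word in BLOCKED_WORDS for word in query.lower().split()):
        return None                      # user must press [Enter] instead
    return search(query)                 # ordinary ranked results

print(instant_results("puppy dog"))       # shows results while typing
print(instant_results("puppy dog hate"))  # None: the instant results disappear
```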

This is an interesting example because it is incredibly mild censorship, and that is typical of algorithmic sorting on the Internet. Things aren’t made to be impossible, some things are just a little harder than others. We can discuss whether or not this actually matters to anyone. After all, you could still search for anything you wanted to, but some searches are made slightly more time-consuming because you will have to press [Enter] and you do not receive real-time feedback as you construct your search query.

It’s also a good example that makes clear how problematic algorithmic censorship can be. The hackers over at 2600 reverse engineered Google Instant’s blacklist (NSFW) and it makes absolutely no sense. The blocked words I tried (like “bitch”) produce perfectly inoffensive search results (sometimes because of other censorship algorithms, like Google SafeSearch). It is not clear to me why they should be blocked. For instance, anatomical terms for some parts of the female anatomy are blocked while other parts of the female anatomy are not blocked.

Some of the blocking is just silly. For instance, “hate” is blocked. This means you can make the Google Instant results disappear by adding “hate” to the end of an otherwise acceptable query. e.g., “puppy dog hate ” will make the search results I got earlier disappear as soon as I type the trailing space. (Remember not to press [Enter].)

This is such a simple implementation that it barely qualifies as an algorithm. It also differs from my other examples because it appears that an actual human compiled this list of blocked words. That might be useful to highlight because we typically think that companies like Google do everything with complicated math rather than with site-by-site or word-by-word rules — they have claimed as much — but this example shows that this crude sort of blacklist censorship still goes on.

Google does censor actual search results (what you get after pressing [Enter]) in a variety of ways but that is a topic for another time. This exercise with Google Instant at least gets us started thinking about algorithms, whose interests they are serving, and whether or not they are doing their job well.

(2.) Top Stories Fail (Facebook)

In this example, you’ll need a Facebook account.  Go to http://www.facebook.com/ and look for the tiny little toggle that appears under the text “News Feed.” This allows you to switch between two different sorting algorithms: the Facebook proprietary EdgeRank algorithm (this is the default), and “most recent.” (On my interface this toggle is in the upper left, but Facebook has multiple user interfaces at any given time and for some people it appears in the center of the page at the top.)

Switch this toggle back and forth and look at how your feed changes.

What’s happening? Okay, we know that among 18-29 year-old Facebook users the median number of friends is now 300. Even given that most people are not over-sharers, some simple arithmetic makes it clear that some of the things posted to Facebook may never be seen by anyone. A status update is certainly unlikely to be seen by anywhere near your entire friend network. Facebook’s “Top Stories” (EdgeRank) algorithm is the solution to the oversupply of status updates and the undersupply of attention to them: it determines what appears in your news feed and how it is sorted.

We know that Facebook’s “Top Stories” sorting algorithm uses a heavy hand. It is quite likely that you have people in your friend network that post to Facebook A LOT but that Facebook has decided to filter out ALL of their posts. These might be called your “silenced Facebook friends.” Sometimes when people do this toggling-the-algorithm exercise they exclaim: “Oh, I forgot that so-and-so was even on Facebook.”
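
For what it’s worth, here is a sketch of the commonly cited public description of EdgeRank — affinity × edge weight × time decay — with invented weights and an invented decay constant. The real algorithm is proprietary and has long since grown more complicated, but even this toy version shows how a low-affinity friend’s status updates can be scored into oblivion.

```python
import math

# Sketch of an EdgeRank-style score following the commonly cited description:
# affinity x edge weight x time decay. Weights and decay constant are invented.

EDGE_WEIGHTS = {"photo": 3.0, "link": 2.0, "status": 1.0, "like": 0.5}

def edge_rank(affinity: float, edge_type: str, age_hours: float) -> float:
    time_decay = math.exp(-age_hours / 24.0)   # older stories fade away
    return affinity * EDGE_WEIGHTS[edge_type] * time_decay

posts = [
    {"friend": "close friend",    "affinity": 0.9,  "edge_type": "status", "age_hours": 20},
    {"friend": "silenced friend", "affinity": 0.02, "edge_type": "status", "age_hours": 1},
    {"friend": "acquaintance",    "affinity": 0.3,  "edge_type": "photo",  "age_hours": 5},
]

for post in sorted(posts, reverse=True,
                   key=lambda p: edge_rank(p["affinity"], p["edge_type"], p["age_hours"])):
    score = edge_rank(post["affinity"], post["edge_type"], post["age_hours"])
    print(post["friend"], round(score, 3))   # the "silenced friend" scores near zero
```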

Since we don’t know the exact details of EdgeRank, it isn’t clear exactly how Facebook is deciding which of your friends you should hear from and which should be ignored. Even though the algorithm might be well-constructed, it’s interesting that when I’ve done this toggling exercise with a large group a significant number of people say that Facebook’s algorithm produces a much more interesting list of posts than “Most Recent,” while a significant number of people say the opposite — that Facebook’s algorithm makes their news feed worse. (Personally, I find “Most Recent” produces a far more interesting news feed than “Top Stories.”)

It is an interesting intellectual exercise to try and reverse-engineer Facebook’s EdgeRank on your own by doing this toggling. Why is so-and-so hidden from you? What is it they are doing that Facebook thinks you wouldn’t like? For example, I think that EdgeRank doesn’t work well for me because I select my friends carefully, then I don’t provide much feedback that counts toward EdgeRank after that. So my initial decision about who to friend works better as a sort without further filtering (“most recent”) than Facebook’s decision about what to hide. (In contrast, some people I spoke with will friend anyone, and they do a lot more “liking” than I do.)

What does it mean that your relationship to your friends is mediated by this secret algorithm? A minor note: If you switch to “most recent” some people have reported that after a while Facebook will switch you back to Facebook’s “Top Stories” algorithm without asking.

There are deeper things to say about Facebook, but this is enough to start with. Onward.

(3.) Your DoubleClick Cookie Filling (DoubleClick)

This example will only work if you browse the Web regularly from the same Web browser on the same computer and you have cookies turned on. (That describes most people.) Go to the Google Ads settings page — the URL is a mess so here’s a shortcut: http://bit.ly/uc256google

Look at the right column, headed “Google Ads Across The Web,” then scroll down and look for the section marked “Interests.” The other parts may be interesting too, such as Google’s estimate of your Gender, Age, and the language you speak — all of which may or may not be correct.  Here’s a screen shot:

If you have “interests” listed, click on “Edit” to see a list of topics.

What’s Happening? Google is the largest advertising clearinghouse on the Web. (It bought DoubleClick in 2007 for over $3 billion.) When you visit a Web site that runs Google Ads — this is likely quite common — your visit is noted and a pattern of all of your Web site visits is then compiled and aggregated with other personal information that Google may know about you.

What a big departure from some old media! In comparison, in most states it is illegal to gather a list of books you’ve read at the library because this would reveal too much information about you. Yet for Web sites this data collection is the norm.

This settings page won’t reveal Google’s ad placement algorithm, but it shows you part of the result: a list of the categories that the algorithm is currently using to choose advertising content to display to you. Your attention will be sold to advertisers in these categories and you will see ads that match these categories.

This list is quite volatile and this is linked to the way Google hopes to connect advertisers with people who are interested in a particular topic RIGHT NOW. Unlike demographics that are presumed to change slowly (age) or not to change at all (gender), Google appears to base a lot of its algorithm on your recent browsing history. That means if you browse the Web differently you can change this list fairly quickly (in a matter of days, at least).
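
Here is a toy sketch of that general idea — recency-weighted interest inference from browsing history. The site-to-category mapping, the half-life, and everything else below are invented, not Google’s.

```python
from collections import Counter
from datetime import date, timedelta

# Toy sketch of recency-weighted interest inference from browsing history.
# The site-to-category mapping and the half-life are invented for illustration;
# Google's actual profiling pipeline is not public.

SITE_CATEGORIES = {
    "boardgamegeek.com": "Roleplaying Games",
    "ets.org": "Standardized & Admissions Tests",
    "epicurious.com": "Cooking & Recipes",
}

def interest_profile(visits, today, half_life_days=7, top_n=3):
    """visits: list of (site, date) pairs. Recent visits count for more."""
    scores = Counter()
    for site, when in visits:
        category = SITE_CATEGORIES.get(site)
        if category:
            age_days = (today - when).days
            scores[category] += 0.5 ** (age_days / half_life_days)
    return [category for category, _ in scores.most_common(top_n)]

today = date(2014, 7, 1)
visits = [("boardgamegeek.com", today - timedelta(days=1)),
          ("ets.org", today - timedelta(days=30)),
          ("epicurious.com", today - timedelta(days=2))]
print(interest_profile(visits, today))   # recent visits dominate the category list
```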

Many people find the list uncannily accurate, while some are surprised at how inaccurate it is. Usually it is a mixture. Note that some categories are very specific (“Currency Exchange”), while others are very broad (“Humor”).  Right now it thinks I am interested in 27 things, some of them are:

  • Standardized & Admissions Tests (Yes.)
  • Roleplaying Games (Yes.)
  • Dishwashers (No.)
  • Dresses (No.)

You can also type in your own interests to save Google the trouble of profiling you.

Again this is an interesting algorithm to speculate about. I’ve been checking this for a few years and I persistently get “Hygiene & Toiletries.” I am insulted by this. It’s not that I’m uninterested in hygiene but I think I am no more interested in hygiene than the average person. I don’t visit any Web sites about hygiene or toiletries. So I’d guess this means… what exactly? I must visit Web sites that are visited by other people who visit sites about hygiene and toiletries. Not a group I really want to be a part of, to be honest.

These were three examples of algorithm-ish activities that I’ve used. Any other ideas? I was thinking of trying something with an item-to-item recommender system but I could not come up with a great example. I tried anonymized vs. normal Web searching to highlight location-specific results but I could not think of a search term that did a great job showing a contrast.  I also tried personalized twitter trends vs. location-based twitter trends but the differences were quite subtle. Maybe you can do better.

In my next post I’ll write about how the students reacted to all this.

(This was also cross-posted to multicast.)