Facebook Trending: It’s made of people!! (but we should have already known that)

Gizmodo has released two important articles (1, 2) about the people who were hired to manage Facebook’s “Trending” list. The first reveals not only how Trending topics are selected and packaged on Facebook, but also the peculiar working conditions this team experienced, the lack of guidance or oversight they were provided, and the directives they received to avoid news that addressed Facebook itself. The second makes a more pointed allegation: that along the way, conservative topics were routinely ignored, meaning the trending algorithm had identified user activity around a particular topic, but the team of curators chose not to publish it as a trend.

This is either a boffo revelation, or an unsurprising look at how the sausage always gets made, depending on your perspective. The promise of “trends” is a powerful one. Even as the public gets more and more familiar with the way social media platforms work with data, and even with more pointed scrutiny of trends in particular, it is still easy to think that “trends” means an algorithm is systematically and impartially uncovering genuine patterns of user activity. So, to discover that a handful of j-school graduates were tasked with surveying all the topics the algorithm identified, choosing just a handful of them, and dressing them up with names and summaries, feels like an unwelcome intrusion of human judgment into what we wish were analytic certainty. Who are these people? What incredible power they have to dictate what is and is not displayed, what is and is not presented as important! Wasn’t this supposed to be just a measure of what users were doing, of what the people found important? Downplaying conservative news is the most damning charge possible, since it has long been a commonplace accusation leveled at journalists. But the deeper revelation is that there are people in the algorithm at all.

But the plain fact of information algorithms like the ones used to identify “trends” is that they do not work alone, they cannot work alone — in so many ways that we must simply discard the fantasy that they do, or ever will. In fact, algorithms do surprisingly little, they just do it really quickly and with a whole lot of data. Here’s some of what they can’t do:

Trending algorithms identify patterns in data, but they can’t make sense of them. The raw data is Facebook posts, likes, and hashtags. Looking at this data, there will certainly be surges of activity that can be identified and quantified: words that show up more than other words, posts that get more likes than other posts. But there is so much more to figure out:
(1) What is a topic? To decide how popular a topic is, Facebook must decide which posts are about that topic. When do two posts or two hashtags represent the same story, such that they should be counted together? An algorithm can only do so much to say whether a post about Beyonce and a post about Bey and a post about Lemonade and a post about QueenB and the hashtag BeyHive are all the same topic. And that’s an easy one, a superstar with a distinctive name, days after a major public event. Imagine trying to determine algorithmically if people are talking about European tax reform, enough to warrant calling it a trend.
(2) Topics are also composed of smaller topics, endlessly down to infinity. Is the Republican nomination process a trending topic, or the Indiana primary, or Trump’s win in Indiana, or Paul Ryan’s response to Trump’s win in Indiana? According to one algorithmic threshold these would be grouped together; by another they would be separate. The problem is not that an algorithm can’t tell. It’s that it can support both interpretations, all interpretations, equally well. So an algorithm could be programmed to decide, to impose a particular threshold for the granularity of topics. But would that choice make sense to readers, would it map onto their own sense of what’s important, and would it work for the next topic, and the next?
(3) How should a topic be named and described, in a way that Facebook users would appreciate or even understand? Computational attempts to summarize are notoriously clunky, and often produce the kind of phrasing and grammar that scream “a computer wrote this.”
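The granularity problem above can be illustrated with a toy sketch. Everything below is a hypothetical assumption for illustration, not Facebook’s actual method: the posts, their token sets, the Jaccard similarity measure, and both thresholds are invented. The point is only that different, equally defensible thresholds yield different topics.

```python
from itertools import combinations

# Hypothetical mini-corpus: each "post" reduced to a set of tokens/hashtags.
posts = {
    "p1": {"republican", "nomination"},
    "p2": {"indiana", "primary", "republican"},
    "p3": {"trump", "indiana", "primary"},
    "p4": {"paul", "ryan", "trump", "indiana"},
}

def jaccard(a, b):
    """Overlap between two token sets, as a fraction of their union."""
    return len(a & b) / len(a | b)

def cluster(posts, threshold):
    """Single-link grouping: posts whose token overlap meets the
    threshold fall into the same 'topic'."""
    parent = {p: p for p in posts}
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for a, b in combinations(posts, 2):
        if jaccard(posts[a], posts[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for p in posts:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

# A loose threshold lumps everything into one "Republican race" topic;
# a strict one splits it into four separate stories. Both are defensible.
print(len(cluster(posts, 0.2)))  # → 1
print(len(cluster(posts, 0.6)))  # → 4
```

Neither output is wrong; the algorithm simply cannot say which granularity readers will recognize as “the” topic.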
What trending algorithms can identify isn’t always what a platform wants to identify. Facebook, unlike Twitter, chose to display trends that identify topics, rather than single hashtags. This was already a move weighted towards identifying “news” rather than raw chatter. It already strikes an uneasy balance between the kind of information they have — billions of posts and likes surging through their system — and the kind they’d like to display — a list of the most relevant topics. And it already sets up an irreconcilable tension: what should they do when user activity is not a good measure of public importance? It is not surprising, then, that they’d try to focus on articles being circulated and commented on, and from the most reputable sources, as a way to lean on their curation and authority to pre-identify topics. Which opens up, as Gizmodo identifies, the tendency to discount some sources as non-reputable, which can have unintentionally partisan implications.
“Trending” is also being asked to do a lot of things for Facebook: capture the most relevant issues being discussed on Facebook, and conveniently map onto the most relevant topics in the worlds of news and entertainment, and keep users on the site longer, and keep up with Twitter, and keep advertisers happy. In many ways, a trending algorithm can be an enormous liability, if allowed to be: it could generate a list of dreadful or depressing topics; it could become a playground for trolls who want to fill it with nonsense and profanity; it could reveal how little people use Facebook to talk about matters of public importance; it could reveal how depressingly little people care about matters of public importance; and it could help amplify a story critical of Facebook itself. It would take a whole lot of bravado to set that loose on a system like Facebook and let it show what it shows, unmanaged. Clearly, Facebook has a lot more at stake in producing a trending list that, while it should look like an unvarnished report of what users are discussing, must also be massaged into something that represents Facebook well at the same time.

So: people are in the algorithm because how could they not be? People produce the Facebook activity being measured, people design the algorithms and set their evaluative criteria, people decide what counts as a trend, people name and summarize them, and people look to game the algorithm with their next posts.

The thing is, these human judgments are all part of traditional news gathering as well. Choosing what to report in the news, how to describe it and feature it, and how to honor both the interests of the audience and the sense of importance, has always been a messy, subjective process, full of gaps in which error, bias, self-interest, and myopia can enter. The real concern here is not that there are similar gaps in Facebook’s process as well, or that Facebook hasn’t yet invented an algorithm that can close those gaps. The real worry is that Facebook is being so unbelievably cavalier about it.

Traditional news organizations face analogous problems, must make analogous choices, and can make analogous missteps. And they do. But two countervailing forces work against this and keep them more honest than not, more on target than not: a palpable and institutionalized commitment to news itself, and competition. I have no desire to glorify the current news landscape, which in many ways produces news that is dishearteningly less than what journalism should be. But there is at least a public, shared, institutionally rehearsed, and historical sense of purpose and mission, or at least there’s one available. Journalism schools teach their students not just how to determine and deliver the news, but why. They offer up professional guidelines and heroic narratives that position the journalist as a provider of political truths and public insight. They provide journalists with frames that help them identify the way news can suffer when it overlaps with public relations, spin, infotainment, and advertising. There are buffers in place to protect journalists from the pressures that can come from upper management, advertisers, or newsmakers themselves, because of a belief that independence is an important foundation for newsgathering. Journalists recognize that their choices have consequences, and they discuss those choices. And there are stakeholders charged with regularly checking these efforts for possible bias and self-interest: public editors and ombudspeople, newswatch organizations and public critics, all trying to keep the process honest. Most of all, there are competitors who would gleefully point out a news organization’s mistakes and failures, which gives editors and managers real incentive to work against the temptations to produce news that is self-serving, politically slanted, or commercially craven.

Facebook seems to have thought of absolutely none of this. Based on the revelations in the two Gizmodo articles, it’s clear that they hired a shoestring team, lashed them to the algorithm, offered little guidance for what it meant to make curatorial choices, provided no ongoing oversight as the project progressed, imposed self-interested guidelines to protect the company, and kept the entire process inscrutable to the public, cloaked in the promise of an algorithm doing its algorithm thing.

The other worry here is that Facebook is engaged in a labor practice increasingly common in Silicon Valley: hiring information workers through third parties, under precarious conditions and without access to the institutional support or culture their full-time employees enjoy, and imposing time and output demands on them that can only fail a task that warrants more time, care, expertise, and support. This is the troubling truth about information workers in Silicon Valley and around the world, who find themselves “automated” by the gig economy — not just clickworkers on Mechanical Turk and drivers on Uber, but even “inside” the biggest and most established companies on the planet. It is also a dangerous tendency for the kind and scale of information projects that tech companies are willing to take on without having the infrastructure and personnel to adequately support them. It is not uncommon now for a company to debut a new feature or service, only weeks in development and supported only by its design team, with the assumption that it can quickly hire and train a team of independent, hourly workers. Not only does this put a huge onus on those workers, but it means that, if the service finds users and begins to scale up quickly, little preparation is in place, and the overworked team must quickly make ad hoc decisions about what are often tricky cases with real, public ramifications.

Trending algorithms are undeniably becoming part of the cultural landscape, and revelations like Gizmodo’s are helpful steps toward shedding the easy notions of what they are and how they work, notions the platforms themselves have fostered. Social media platforms must come to fully realize that they are newsmakers and gatekeepers, whether they intend to be or not, whether they want to be or not. And while algorithms can chew on a lot of data, it is still a substantial, significant, and human process to turn that data into claims about importance that get fed back to millions of users. This is not a realization that they will ever reach on their own — which suggests to me that they need the two countervailing forces that journalism has: a structural commitment to the public, imposed if not inherent, and competition to force them to take such obligations seriously.

Addendum: Techcrunch is reporting that Facebook has responded to Gizmodo’s allegations, suggesting that it has “rigorous guidelines in place for the review team to ensure consistency and neutrality.” This makes sense. But consistency and neutrality, while fine as concepts, are vague and insufficient in practice. There could have been Trending curators at Facebook who deliberately tanked conservative topics and knew that doing so violated policy. But (and this has long been known in the sociology of news) the greater challenge in producing the news, whether generating it or just curating it, is how to deal with the judgments that must be made while being consistent and neutral. Making the news always requires judgments, and judgments always incorporate premises for assessing the relevance, legitimacy, and coherence of a topic. Recognizing bias in our own choices or across an institution is extremely difficult, and knowing whether you have produced a biased representation of reality is nearly impossible, as there’s nothing to compare it to — even setting aside that Facebook is actually trying to do something even harder: produce a representation of the collective representations of reality of their users, and ensure that somehow it also represents reality, as other reality-representers (be they CNN or Twitter users) have represented it. Were social media platforms willing to acknowledge that they constitute public life rather than hosting or reflecting it, they might look to those who produce news, educate journalists, and study news as a sociological phenomenon for help thinking through these challenges.

Addendum 2 (May 9): The Senate Committee on Commerce, Science, and Transportation has just filed an inquiry with Facebook, raising concerns about their Trending Topics based on the allegations in the Gizmodo report. The letter of inquiry is available here, and has been reported by Gizmodo and elsewhere. In the letter they ask Mark Zuckerberg and Facebook to respond to a series of questions about how Trending Topics works, what kind of guidelines and oversight they provided, and whether specific topics were sidelined or injected. Gizmodo and other sites are highlighting the fact that this Committee is run by a conservative and has a majority of members who are conservative. But the questions posed are thoughtful ones. What they make so clear is that we simply do not have a vocabulary with which to hold these services accountable. For instance, they ask “Have Facebook news curators in fact manipulated the content of the Trending Topics section, either by targeting news stories related to conservative views for exclusion or by injecting non-trending content?” Look at the verbs. “Manipulated” is tricky, as it’s not exactly clear what the unmanipulated Trending Topics even are. “Targeting” sounds like they excluded stories, when what Gizmodo reports is that some stories were not selected as trending, or not recognized as stories. If trending algorithms can only highlight possible topics surging in popularity, but Facebook and its news curators constitute that data into a list of topics, then language that takes trending to be a natural phenomenon, which Facebook either accurately reveals or manipulates, can’t quite grasp how this works and why it is so important. It is worth noting, though, that the inquiry pushes on how (whether) Facebook is keeping records of what is selected: “Does Facebook maintain a record of curators’ decisions to inject a story into the Trending Topics section or target a story for removal? If such a record is not maintained, can such decisions be reconstructed or determined based on an analysis of the Trending Topics product? a. If so, how many stories have curators excluded that represented conservative viewpoints or topics of interest to conservatives? How many stories did curators inject that were not, in fact, trending? b. Please provide a list of all news stories removed from or injected into the Trending Topics section since January 2014.” This approach, I think, does emphasize to Facebook that these choices are significant, enough so that they should be treated as part of the public record and open to scrutiny by policymakers or the courts. This is a way of demanding that Facebook take its role in this regard more seriously.

Shouting Fire in a Crowded Hashtag

Narco Censorship

The press is one of the many casualties of Mexico’s ongoing violence, in particular the local media. Newspapers and TV stations are caught between the censorship, control, and threats of the drug cartels on one side and the local governments on the other. In some cities, people often witness shootings, grenade attacks, and other violent events, but when they try to find out what happened, their local news has nothing to offer. Some newspapers have officially announced a policy of self-censorship when it comes to reporting drug war-related news.

The result for a lot of Mexicans is that local media is no longer a source of news. Some citizens claim that their local news sources are paid off by the local government in an effort to minimize the violence; others argue that it is the cartels who have bribed them; while others, especially the journalists, say they are being threatened to stay quiet. What is certain is that journalists are being murdered and their murders often go unpunished.

Hashtags Save Lives

Knowing if there is a shooting going on in a certain part of a city is not just about satisfying one’s own curiosity, but about one’s safety. Many of the recent violent episodes in Mexico last long enough that knowing about them can be a life-saving piece of information. Since mainstream media no longer fulfills its role of informing citizens about these events, people have turned to social media.

Twitter in particular, with its unidirectional follower model and its hashtags, has become one of the main sources of citizen-driven news in Mexico. People often “report”, “confirm” and re-tweet information about violent events using hashtags. In several cities, hashtags have emerged as shared news resources. One of the first cities where these hashtags were used was Reynosa, with #reynosafollow, followed by Monterrey with #mtyfollow and, more recently, #verfollow for the coastal city of Veracruz.

A word count analysis of more than a quarter of a million tweets using the hashtag #mtyfollow over the course of nine months (11/2010 to 8/2011) shows how hashtags are used as a common resource. People hook into the hashtag to “report” (“REPORTAN”, in Spanish), issue warnings (“precaución”, “cuidado”), and request confirmation (“confirmar”) about shootings (“balacera”, “detonaciones”, “balazos”) in certain areas of the city (“zona”, “Cumbres”, “Av”, “Sada”). You can also see the popularity of some user handles in the messages. Together, people such as @trackmty, @AnaRent, and @cicmty have more than 85,000 followers and 65,000 tweets. These people have become reliable news sources.

Most common words in 252,431 tweets using the hashtag #mtyfollow
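The kind of word count analysis described above can be sketched in a few lines. The sample tweets here are invented stand-ins for the real #mtyfollow corpus, and the tokenizer is a deliberately simple assumption:

```python
import re
from collections import Counter

# Hypothetical #mtyfollow-style tweets (not the real quarter-million corpus).
tweets = [
    "#mtyfollow REPORTAN balacera en zona Cumbres, precaución",
    "#mtyfollow confirmar balacera Av. Sada? cuidado",
    "#mtyfollow balacera confirmada, cuidado zona centro",
]

def tokenize(text):
    # Lowercase and keep word characters; \w is Unicode-aware in Python 3,
    # so accented letters like á and ñ are matched. '#' is added for hashtags.
    return re.findall(r"[\w#]+", text.lower())

counts = Counter(tok for t in tweets for tok in tokenize(t))
counts.pop("#mtyfollow", None)  # drop the hashtag itself before ranking

print(counts.most_common(3))  # "balacera" tops this toy list
```

Scaled up to 252,431 real tweets, this is essentially the ranking the chart above summarizes.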

Twitter Terrorists

Last Thursday at 11:56 AM, @gilius_22 tweeted a message using the #verfollow hashtag. He claimed that five kids were kidnapped at a school:

#verfollow I confirm that in the school ‘Jorge Arroyo’ in the Carranza neighborhood 5 kids were kidnapped, armed group, panic in the zone

The message was re-tweeted by twelve people, one of them @VerFollow, a popular account with more than 5,000 followers that was created to report on the violence in the city. Immediately after these tweets, the rumor started spreading like wildfire. There were reports saying that one of the drug cartels was threatening to kill a child for each cartel member killed. People spread the news via Facebook, emails, and text messages. @gilius_22 reported that the cellphone network went down. Additionally, several other Twitter users reported other incidents related to schools and to helicopters supposedly flying at low altitude.

By 12:00 pm (only four minutes later) the governor tweeted a message dismissing the rumor. However, by then it was either too late or the governor was not considered a reliable news source (probably a bit of both). Many parents rushed to pick up their children from school, causing massive traffic, chaos, and panic across the city. Many parents did not take their kids to school the next day, and businesses reported a 70% productivity loss due to the incident.

Mentions of the hashtag #verfollow in the month of August. Note the spike on Aug 25, the day of the rumors. Source: Topsy.

By 12:05 pm the governor tweeted his support for freedom of expression but urged people to make sure information is reputable before acting on it. Three hours later he posted that the government would go after those who spread the rumor on the basis of “terrorism”:

We have identified today’s misinformation sources, I want to inform that this will have legal consequences according to Article 311 (terrorism)

Wikileaks-inspired logo of the anti-censorship movement in Veracruz.

The same day, the government website issued a statement listing sixteen Twitter accounts involved in the rumor and threatening to take legal action against them. The statement also mentioned the name of the person associated with the account @gilius_22. By Saturday, @gilius_22 and @maruchibravo were arrested on charges of terrorism. Today, the total number of arrests has increased to three. Some of them have claimed to have been tortured by the police and forced to sign confessions. At the same time, many Twitter users across the country have rallied in opposition to the arrests. Many have mocked the government by calling themselves “twitteroristas”. There is even an Anonymous video denouncing the government’s reaction against social media and the “lack of courage” of the local media to report what is happening in the city.

Social Media Fail?

It is unclear what the motives and roles were of those sixteen people charged with spreading the rumor. Did they shout fire because they thought they saw flames or did they completely invent it? What led to the fast viral spread of this rumor?

The rumor would not have spread as easily if there were not already a widespread sentiment of vulnerability. It is unclear what did happen that day. There are several reports of military mobilizations around the same time as the tweets. If that was true, it probably added legitimacy to the rumors. Shouting fire in a theater carries a lot more weight than shouting fire in a pool.

The rumors spread faster because of a weak information “immune system.” Mainstream media and the government are no longer considered reliable information sources in some of these cities. Social media has taken over the role of the mainstream media, and that comes with its own challenges. Social media (i.e. Twitter) has fluid reputation mechanisms, which is positive because it helps protect people’s pseudonymity in light of the real danger faced by journalists. On the other hand, these fluid reputation mechanisms make it harder to assess the reliability of information.

Many citizens do not trust the government. For example, the official Twitter account created by the local government to report violent events had six times fewer followers than some of the citizen journalists on Twitter. Many people claim that the government often downplays or completely denies the existence of any kind of violence under the motto “no pasa nada” (“nothing happens”). The governor himself has explicitly denied saying such a thing:

I have never said that in Veracruz ‘nothing happens,’ we are fighting crime with all of our power so we can live in freedom, that is what is happening.

The circumstances were fertile ground for spreading misinformation. However, prosecuting Twitter users raises some questions. Yes, their actions caused panic, but does it actually amount to terrorism? Also, it is likely that these arrests will have a chilling effect on social media in Veracruz and maybe other cities, destroying citizen’s last resort for news. Another possible outcome is that social media might be pushed underground, making it even harder to develop reputation-building mechanisms.

If you found this interesting, you can follow me on Twitter or identi.ca.
Thanks to Nick Diakopolous for his feedback on this post.

UPDATE: Related interview on CNN and RWW article.

How much is a life worth in pixels?

Analysis of yesterday’s news coverage of the Mexican massacre

Mexican Tweets

More than fifty people were murdered yesterday in what is now the most violent episode in the ongoing Mexican Drug War. Most of the victims were women, some were pregnant. After learning about the horrific massacre in Monterrey, I spent several hours reading the reports coming from México via social and mainstream media. I exchanged messages with friends and family who live there (I went to college in Monterrey and my parents live not too far from there). The Twitter trending topics in México showed anger, desperation and hopelessness. One of the hashtags people often use to report violence in the city, #mtyfollow, was full of messages of repudiation and of people trying to help others find their loved ones. Some of the most retweeted messages were those with the names of the possible victims, as you can see in this chart.

Twitter activity on a popular keyword right after the massacre
Mexican Twitter users helping find missing people after the massacre

American Silence

The massacre happened only 140 miles south of Texas in one of the largest metropolitan areas in North America. Yet, as Nancy Baym put it, the American twittersphere was mum. Why? In part, I think, because most of the news websites in the US were ignoring the event.

One could understand the lack of coverage in the first few hours. The news coming out of México spoke of “only” four deaths, so it is possible the events might not have caught the attention of the American news websites at first. However, ten hours after the attack the official number was already above fifty victims, with some reports as high as 61, yet sites like CNN.com gave little attention to the story. The link to the article on the massacre was buried among articles such as one about actress Rose McGowan’s childhood.

I know CNN is not known for its high-quality news coverage, so I decided to check out one of America’s most trusted news outlets: the New York Times. I was disappointed, again. I had to scroll all the way down to the “More News” section to find a 10-pixel-font link to the article titled “Arson Kills 40 in a Casino in Mexico.”

Pixels per Victim

Frustrated by this, I decided to get a more objective assessment of the coverage by counting the number of pixels different news websites were assigning to the story of the massacre. I know web designers put a lot of work into every single pixel on the screen, especially on high-traffic websites. Visitors’ attention is scarce and every pixel counts. So I took screenshots of the front pages of some of the major news websites and calculated the amount of screen real estate assigned to the story of the massacre. For example, the New York Times gave the story 291×11 pixels, a mere 0.27% of the screen real estate (in a window size of 1439×812 pixels). CNN gave it even less at 191×10 pixels, representing 0.16% of the screen. But what about other websites? Did any other website in the English-speaking world give it more space? Yes. Read on.
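For what it’s worth, those percentages follow directly from the pixel dimensions quoted above; this is just the arithmetic over the numbers in the post:

```python
# Back-of-the-envelope check of the screen real estate figures,
# using the window size and story-link dimensions quoted in the post.
window_w, window_h = 1439, 812  # window size from the screenshots

def share_of_screen(w, h):
    """Fraction of the window area a w-by-h story link occupies."""
    return (w * h) / (window_w * window_h)

print(f"NYT: {share_of_screen(291, 11):.2%}")  # → NYT: 0.27%
print(f"CNN: {share_of_screen(191, 10):.2%}")  # → CNN: 0.16%
```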

I decided to look into non-American websites. If my calculations are correct, it turns out that Al-Jazeera and The Guardian alone gave more pixels to the story than CNN, the Washington Post, FOX News, the Wall Street Journal, the New York Times, MSNBC and the Houston Chronicle combined. Americans might be better off getting news about  their southern neighbor from a British or a Qatari website than from many of the US ones. The two exceptions were the LA Times and the Huffington Post. They both gave more pixels to the story than any other news source I analyzed. CNN was at the bottom of the list though. Click here for a slideshow of the websites I analyzed.

To summarize my results, I generated a ranking of the number of pixels per victim each news website devoted to the massacre. Yes, this issue is much more nuanced than pixels per victim, and I am not a journalism expert, but I hope it can help start a discussion (or continue an existing one). If my calculations are correct, CNN devoted 38 pixels per victim, 76 times fewer than the LA Times, which gave 2,920 pixels per victim.
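A quick sanity check of that ratio, assuming roughly 50 victims (the post cites “more than fifty”; the exact count is an assumption here):

```python
# Reproduce the pixels-per-victim figures from the numbers in the post.
victims = 50                 # assumed victim count ("more than fifty")

cnn_area = 191 * 10          # CNN story link area, in pixels
cnn_per_victim = cnn_area / victims
la_times_per_victim = 2920   # LA Times figure quoted in the post

print(round(cnn_per_victim))                        # → 38
print(round(la_times_per_victim / cnn_per_victim))  # → 76
```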

Closing Thoughts

The Mexican Drug War is a complex geopolitical conflict closely linked to the United States’ financial stability and national security. If American news websites do not give enough attention to the massacre of 50 people, what can we expect of less dramatic stories with perhaps more structural and long-term implications? I list here some of the recent related stories that I wish had gotten much more attention and that I hope you get to read to understand the complexity of the problem:

  1. The Guardian’s article on “How a big US bank laundered billions from Mexico’s murderous drug gangs.”
  2. The LA Times’ article on a senate report on how the “U.S. can’t justify its drug war spending” (there are many more articles about this).
  3. The NY Times story on how US-officials “allowed nearly 1,000 guns to flow illegally into Mexico” (also check this campaign to stop gun smuggling).
  4. Chomsky’s excellent synthesis of the whole Drug War problem  with a historical perspective that only Chomsky can give.

If you liked this, follow me on Twitter or identi.ca.