#trendingistrending: when algorithms become culture

I wanted to share a new essay, “#Trendingistrending: When Algorithms Become Culture” that I’ve just completed for a forthcoming Routledge anthology called Algorithmic Cultures: Essays on Meaning, Performance and New Technologies, edited by Robert Seyfert and Jonathan Roberge. My aim is to focus on the various “trending algorithms” that populate social media platforms, consider what they do as a set, and then connect them to a broader history of metrics used in popular media, to both assess audience tastes and portray them back to that audience, as a cultural claim in its own right and as a form of advertising.

The essay is meant to extend the idea of “calculated publics” I first discussed here and the concerns that animated this paper. But more broadly I hope it pushes us to think about algorithms not as external forces on the flow of popular culture, but increasingly as elements of popular culture themselves, something we discuss as culturally relevant, something we turn to face so as to participate in culture in particular ways. It also has a bit more to say about how we tend to think about and talk about “algorithms” in this scholarly discussion, something I have more to say about here.

I hope it’s interesting, and I really welcome your feedback. I already see places where I’ve not done the issue justice: I should connect the argument more to discussions of financial metrics, like credit ratings, as another moment when institutions have reason to turn such measures back as meaningful claims. I found the excellent essay (journal; academia.edu), where Jeremy Morris writes about what he calls “infomediaries,” late in my process, so while I do gesture to it, it could have informed my thinking even more. There are a dozen other things I wanted to say, and the essay is already a little overstuffed.

I do have some opportunity to make specific changes before it goes to press, so I’d love to hear any suggestions, if you’re inclined to read it.

The 3 things you can learn about your neighborhood using Whooly

Along with my colleagues Shelly Farnham and Michal Lahav (and our interns Yuheng Hu, Emma Spiro, and Nate Matias), we have been exploring ways of discovering and fostering latent neighborhood information to help people understand what’s happening in their local communities.

As part of this research, we have created Whooly, an experimental mobile website that discovers and highlights neighborhood-specific information on Twitter in real time. The system is focused, for now, on various neighborhoods of the Seattle metro area (King County, to be specific). Whooly automatically discovers, extracts, and summarizes hyperlocal Twitter content from these communities based on mentions of local neighborhoods and relevant keywords from tweets and profiles. One can think of Whooly as a neighborhood Twitter client.
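To make the idea of matching on neighborhood mentions concrete, here is a minimal sketch in Python. It is not Whooly's actual implementation: the `NEIGHBORHOODS` table, its aliases, and the `match_neighborhoods` function are all hypothetical, and a real system would need a much larger gazetteer plus some way to handle ambiguous names.

```python
import re

# Hypothetical neighborhood names and aliases; not Whooly's real list.
NEIGHBORHOODS = {
    "Capitol Hill": ["capitol hill", "caphill"],
    "Ballard": ["ballard"],
    "Fremont": ["fremont"],
}


def match_neighborhoods(tweet_text, profile_text=""):
    """Return the neighborhoods mentioned in a tweet or its author's profile."""
    haystack = f"{tweet_text} {profile_text}".lower()
    found = []
    for name, aliases in NEIGHBORHOODS.items():
        # Word boundaries avoid matching an alias inside a longer word.
        if any(re.search(rf"\b{re.escape(alias)}\b", haystack) for alias in aliases):
            found.append(name)
    return found


print(match_neighborhoods("Sirens near the #caphill library again", "Living in Seattle"))
```

Matching on both the tweet and the profile text mirrors the description above; everything else about how matches are ranked or summarized is left out of this sketch.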




Whoo.ly: Facilitating Information Seeking For Hyperlocal Communities Using Social Media

You hear sirens blaring in your neighborhood and, naturally, you are curious about the cause of the commotion. Your first reaction might be to turn on the local TV news or go online and check the local newspaper. Unfortunately, unless the issue is of significant importance, your initial search of these media will probably be fruitless. But, if you turn to social media, you are likely to find other neighbors reporting relevant information, giving firsthand accounts, or, at the very least, wondering what is going on as well.

Social media allows people to quickly spread information and, in urban environments, its presence is ubiquitous. However, social media is also noisy, chaotic, and hard to understand for those unfamiliar with, for example, the intricacies of hashtags and social media lingo. It should be no surprise that, regardless of the popularity of social media, people are still using TV and newspapers as their main sources for local information, while social media is just beginning to emerge as a useful information source. We created Whoo.ly to address this issue.


Is Twitter us or them? #twitterfail and living somewhere between public commitment and private investment

This is about the fourth Olympics that’s been trumpeted as the first one to embrace social media and the Internet — just as, depending on how you figure it, it’s about the fourth U.S. election in a row that’s the first to go digital. It may be in the nature of new technologies that we appear perpetually, or at least for a very long time, to be just on the cusp of something. NBC has proudly trumpeted its online video streaming, its smartphone and tablet apps, and most importantly its partnership with microblogging platform Twitter. NBC regularly displays the #Olympics hashtag on the broadcasts, their coverage includes tweets and twit pics from athletes, and their website has made room for sport-specific Twitter streams.

It feels like an odd corporate pairing, at least from one angle. Twitter users have tweeted about past Olympics, for sure. But from a user’s perspective, it’s not clear what we need or get from a partnership with the broadcast network that’s providing exclusive coverage of the event. Isn’t Twitter supposed to be the place we talk about the things out there, the things we experience or watch or care about? But from another angle, it makes perfect sense. Twitter needs to reinforce the perception that it is the platform where chatter and commentary about what’s important to us should occur, and convince a broader audience to try it; it gets to do so here as “official narrator” of the Games. NBC needs ways to connect its coverage to the realm of social media, but without allowing anything digital to pre-empt its broadcasts. From a corporate perspective, interdependence is a successful economic strategy; from the users’ perspective, we want more independence between the two.

This makes the recent dustup about Twitter’s suspension of the account of Guy Adams, correspondent for The Independent (so perfect!), so troubling to so many. Adams had spent the first days of the Olympics criticizing NBC’s coverage of the games, particularly for time-delaying events to suit the U.S. prime time schedule, trimming the opening ceremony, and for some of the more inane commentary from NBC’s hosts. When Adams suggested that people should complain to Gary Zenkel, executive VP at NBC Sports and director of their Olympics coverage, and included Zenkel’s NBC email address, Twitter suspended his account.

Just to play out the details of the case, from the coverage that has developed thus far, we can say a couple of things. Twitter told Adams that his account had been suspended for “posting an individual’s private information such as private email address, physical address, telephone number, or financial documents.” Twitter asserts that it only considers rule violations if there is a complaint filed about them, suggesting that NBC had complained; in response, NBC says that Twitter brought the tweet (or tweets?) to NBC’s attention, who then submitted a complaint. Twitter has since reinstated Adams’ account, and reaffirmed the care and impartiality it takes in enforcing its rules.

Much of the conversation online, including on Twitter, has focused on two things: expressions of disappointment in Twitter for the perceived crime of shutting down a journalist’s account for criticizing a corporate partner, and a debate about whether Zenkel’s email should be considered public or private, and as such, making Twitter’s decision (despite its motivation) a legitimate or illegitimate interpretation of their own rules. This second question is an interesting one: Twitter’s rules do not clarify the difference between the “private email addresses” they prohibit, and whatever the opposite is. Is Zenkel’s email address public because he’s a professional acting in a professional capacity? Because it has appeared before on the web? Because it can be easily figured out (by the common firstname.lastname structure of NBC’s email addresses)? (Alexis Madrigal at The Atlantic has a typically well-informed take on the issue.)

But I think this question of whether Twitter was appropriately acting on its own rules, and even the broader charge of whether its actions were motivated by their economic partnership with NBC, are both founded on a deeper question: what do we expect Twitter to be? This can be posed in naïve terms, as it often is in the heat of debate: are they an honorable supporter of free speech, or are they craven corporate shills? We may know these are exaggerated or untenable positions, both of them, but they’re still so appealing they continue to frame our debates. For example, in a widely circulated critique of Twitter’s decision, Jeff Jarvis proclaims that

For this incident itself is trivial, the fight frivolous. What difference does it make to the world if we complain about NBC’s tape delays and commentators’ ignorance? But Twitter is more than that. It is a platform. It is a platform that has been used by revolutionaries to communicate and coordinate and conspire and change the world. It is a platform that is used by journalists to learn and spread the news. If it is a platform it should be used by anyone for any purpose, none prescribed or prohibited by Twitter. That is the definition of a platform.

Adams himself titled his column for The Independent about the incident, “I thought the internet age had ended this kind of censorship.”

I want Jarvis and Adams to be right, here. But the reality is not so inspiring. We know that Twitter is neither a militant guardian of free speech nor a glorified corporate billboard, that Twitter’s relationship to NBC and other commercial partners matters but does not determine, that Twitter is attempting to be a space for contentious speech and have rules of conduct that balance many communities, values, and legal obligations. But exactly what we expect of Twitter in real contexts is imprecise, yet it matters for how we use it and how we grapple with a decision like the suspension of Adams’ account for the comments he made. And what these expectations are helps to reveal, and may even constitute, our experience of digital culture as a space for public, critical, political speech.

What if we put these possible expectations on a spectrum, if only so we can step away from the extremes on either end:

  • Social media are private services; we sign up for them. Their rules can be arbitrary, capricious, and self-serving if they choose. They can partner with content providers, including privileging that content and protecting them from criticism. Users can take a walk if they don’t like it.
  • Social media are private services; we sign up for them. Their rules can be arbitrary and self-serving, but they should be fairly enforced. They can partner with content providers, including privileging that content and protecting them from criticism, but they should be transparent about that promotion.
  • Social media are private services used by the public; their rules are up to them, but should be justifiable and necessary; they should be fairly enforced, though taking into account the logistical challenges. They can partner with content providers, including privileging that content, but they should demarcate that content from what users produce.
  • Social media are private services used by the public; because of that public trust, those rules should balance honoring the public’s fair use of the network and protecting the service’s ability to function and profit; they should be fairly enforced, despite the logistical challenges. They can partner with content providers, including privileging that content; they should demarcate that content from what users produce.
  • Social media are private services and public platforms; because of that public trust, those rules should impartially honor the public’s fair use of the network; they should be fairly enforced, despite the logistical challenges. They can partner with sponsors that support this public forum through advertising, but they have a journalistic commitment to allow speech, even if it’s critical of their partners or of themselves.
  • Social media are private but have become public platforms; the only rules they can set should be in the service of adhering to the law, and protecting the public forum itself from the harm users can do to it (such as hate speech). They can partner with sponsors that support this public forum through advertising, but they have a journalistic commitment to allow speech, even if it’s critical of their partners or of themselves.
  • Social media are public platforms; and as such must have a deep commitment to free speech. While they can curtail the most egregious content under legal obligations, they should otherwise err on the side of allowing and protecting all speech, even when it is unruly, disrespectful, politically contentious, or critical of the platform itself. Sponsors and other corporate partnerships are nearly anathema to this mission, and should be constrained to only the most cordoned off forms of advertising.
  • Social media should facilitate all speech and block none, no matter how reprehensible, offensive, dangerous, or illegal. Any commercial partnership is a suspicious distortion of this commitment. Users can take a walk if they don’t like it.

While the possibilities on the extreme ends of this spectrum may sound theoretically defensible to some, they are easily cast aside by test cases. Even the most ardent defender of free speech would pause if a platform allowed or defended the circulation of child pornography. And even the most ardent free market capitalist would recognize that a platform solely and capriciously in the service of its advertisers would undoubtedly fail as a public medium. What we’re left with, then, is the messier negotiations and compromises in the middle. Publicly, Twitter has leaned towards the public half of this spectrum: many celebrated when the company appealed court orders requiring them to reveal the identity of users involved in the Occupy protests, and Twitter has regularly celebrated itself for its role in protests and revolutions around the world. At the same time, they do have an array of rules that govern the use of their platform, rules that range from forbidding inappropriate content, limiting harassing or abusive behavior, prohibiting technical tricks that can garner more followers, establishing best practices for automated responders, and spelling out privacy violations. Despite their nominal (and in practice substantive) commitment to protecting speech, they are a private provider, that retains the rights and responsibilities to curate their user content according to rules they choose. This is the reality of platforms that we are reluctant to, but in the end must, accept.

What may be most uncharacteristic in the Adams case, and most troubling to Twitter’s critics, is not that Twitter enforced a vague rule, or did so when Adams was criticizing their corporate partner, in a way that, while scurrilous, was not illegal. It was that Twitter proactively identified Adams as a trouble spot for NBC (whether for his specific posting of Zenkel’s email or for the whole stream of criticism) and brought it to NBC’s attention. What Twitter did was to think like a corporate partner, not like a public platform. Of course it was within Twitter’s right to do so, and to suspend Adams’ account in response. And yes, there is some risk of lost good will and public trust. But the suspension is an indication that, while Twitter’s rhetoric leans towards the claim of a public forum, their mindset about who they are and what purpose they serve remains more enmeshed with their private status and their private investments than users might hope.

This is the tension lurking in Twitter’s apology about the incident, where they acknowledge that they had in fact alerted NBC about Adams’ post and encouraged them to complain, then acted on that complaint. “This behavior is not acceptable and undermines the trust our users have in us. We should not and cannot be in the business of proactively monitoring and flagging content, no matter who the user is – whether a business partner, celebrity or friend.” Twitter can do its best to reinstate that sense of quasi-journalistic commitment to the public. But the fact that the alert even happened suggests that this promise of public commitment, and the expectations we have of Twitter to hold to it, may not be a particularly accurate grasp of the way their public commitment is entangled with their private investment.

Cross posted at Culture Digitally.

Can an algorithm be wrong? Twitter Trends, the specter of censorship, and our faith in the algorithms around us

The interesting question is not whether Twitter is censoring its Trends list. The interesting question is, what do we think the Trends list is, what it represents and how it works, that we can presume to hold it accountable when we think it is “wrong?” What are these algorithms, and what do we want them to be?

(Cross posted from Culture Digitally.)

It’s not the first time it has been asked. Gilad Lotan at SocialFlow (and erstwhile Microsoft UX designer), spurred by questions raised by participants and supporters of the Occupy Wall Street protests, asks the question: is Twitter censoring its Trends list to exclude #occupywallstreet and #occupyboston? While the protest movement gains traction and media coverage, and participants, observers and critics turn to Twitter to discuss it, why are these widely-known hashtags not Trending? Why are they not Trending in the very cities where protests have occurred, including New York?

The presumption, though Gilad carefully debunks it, is that Twitter is, for some reason, either removing #occupywallstreet from Trends, or has designed an algorithm to prefer banal topics like Kim Kardashian’s wedding over important, contentious political debates. Similar charges emerged around the absence of #wikileaks from Twitter’s Trends when the trove of diplomatic cables was released in December of last year, as well as around the #demo2010 student protests in the UK, the controversial execution of #TroyDavis in the state of Georgia, the Gaza #flotilla, even the death of #SteveJobs. Why, when these important points of discussion seem to spike, do they not Trend?

Despite an unshakeable undercurrent of paranoid skepticism, in the analyses and especially in the comment threads that trail off from them, most of those who have looked at the issue are reassured that Twitter is not in fact censoring these topics. Their absence on the Trends listings is a product of the particular dynamics of the algorithm that determines Trends, and the misunderstanding most users have about what exactly the Trends algorithm is designed to identify. I do not disagree with this assessment, and have no particular interest in reopening these questions. Along with Gilad’s thorough analysis, Angus Johnston has a series of posts (1, 2, 3, and 4) debunking the charge of censorship around #wikileaks. Trends has been designed (and re-designed) by Twitter not to simply measure popularity, i.e. the sheer quantity of posts using a certain word or hashtag. Instead, Twitter designed the Trends algorithm to capture topics that are enjoying a surge in popularity, rising distinctly above the normal level of chatter. To do this, their algorithm is designed to take into account not just the number of tweets, but factors such as: is the term accelerating in its use? Has it trended before? Is it being used across several networks of people, as opposed to a single, densely-interconnected cluster of users? Are the tweets different, or are they largely re-tweets of the same post? As Twitter representatives have said, they don’t want simply the most tweeted word (in which case the Trend list might read like a grammar assignment about pronouns and indefinite articles) or the topics that are always popular and seem destined to remain so (apparently this means Justin Bieber).
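To illustrate how those factors could combine, here is a rough sketch in Python. It is emphatically not Twitter's algorithm, which is not public: the `TermStats` fields, the `trend_score` formula, and every weight in it are assumptions invented for illustration. The only point is that a score built on surge, originality, breadth, and novelty can rank a small but fast-rising term above a huge but steady one.

```python
from dataclasses import dataclass


@dataclass
class TermStats:
    """Hypothetical per-term counts for one time window."""
    current_count: int      # tweets using the term in the latest window
    baseline_count: float   # average tweets per window over the recent past
    unique_texts: int       # distinct tweet texts (retweets collapse to one)
    user_clusters: int      # loosely connected groups of users using the term
    trended_before: bool    # whether the term has already appeared in Trends


def trend_score(stats: TermStats) -> float:
    """Illustrative score: reward surges, novelty, and breadth, not raw volume."""
    if stats.current_count == 0:
        return 0.0
    # Surge: how far the current window rises above the term's normal chatter.
    surge = stats.current_count / (stats.baseline_count + 1.0)
    # Originality: share of tweets that are not copies of the same post.
    originality = stats.unique_texts / stats.current_count
    # Breadth: more independent clusters count for more, with diminishing returns.
    breadth = 1.0 - 1.0 / (1.0 + stats.user_clusters)
    # Novelty: an arbitrary penalty for terms that have trended before.
    novelty = 0.5 if stats.trended_before else 1.0
    return surge * originality * breadth * novelty


# A perennial favorite: enormous volume, but flat, retweet-heavy, and tightly clustered.
steady = TermStats(current_count=50_000, baseline_count=48_000,
                   unique_texts=10_000, user_clusters=2, trended_before=True)
# A new topic: far fewer tweets, but surging, mostly original, and widely spread.
spiking = TermStats(current_count=5_000, baseline_count=200,
                    unique_texts=4_000, user_clusters=12, trended_before=False)

print(trend_score(spiking) > trend_score(steady))
```

Under these made-up weights, the term with a tenth of the volume wins, which is exactly the kind of outcome that can look like suppression to someone watching raw counts.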

The charge of censorship is, on the face of it, counterintuitive. Twitter has, over the last few years, enjoyed and agreed with claims that it has played a catalytic role in recent political and civil unrest, particularly in the Arab world, wearing its political importance as a red badge of courage (see Shepherd and Busch). To censor these hot button political topics from Trends would work against its current self-proclaimed purposes and, more importantly, its marketing tactics. And, as Johnston noted, the tweets themselves are available, many highly charged – so why, and for what ends, remove #wikileaks or #occupywallstreet from the Trends list, yet let the actual discussion of these topics run free?

On the other hand, the vigor and persistence of the charge of censorship is not surprising at all. Advocates of these political efforts want desperately for their topic to gain visibility. Those involved in the discussion likely have an exaggerated sense of how important and widely-discussed it is. And, especially with #wikileaks and #occupywallstreet, the possibility that Twitter may be censoring their efforts would fit their supporters’ ideological worldview: Twitter might be working against Wikileaks just as Amazon, Paypal, and Mastercard were; or in the case of #occupywallstreet, while the Twitter network supports the voice of the people, Twitter the corporation of course must have allegiances firmly intertwined with the fatcats of Wall Street.

But the debate about tools like Twitter Trends is, I believe, a debate we will be having more and more often. As more and more of our online public discourse takes place on a select set of private content platforms and communication networks, and these providers turn to complex algorithms to manage, curate, and organize these massive collections, there is an important tension emerging between what we expect these algorithms to be, and what they in fact are. Not only must we recognize that these algorithms are not neutral, and that they encode political choices, and that they frame information in a particular way. We must also understand what it means that we are coming to rely on these algorithms, that we want them to be neutral, we want them to be reliable, we want them to be the effective ways in which we come to know what is most important.

Twitter Trends is only the most visible of these tools. The search engine itself, whether Google or the search bar on your favorite content site (often the same engine, under the hood), is an algorithm that promises to provide a logical set of results in response to a query, but is in fact the result of an algorithm designed to take a range of criteria into account so as to serve up results that satisfy, not just the user, but the aims of the provider, their vision of relevance or newsworthiness or public import, and the particular demands of their business model. As James Grimmelmann observed, “Search engines pride themselves on being automated, except when they aren’t.” When Amazon, or YouTube, or Facebook, offer to algorithmically and in real time report on what is “most popular” or “liked” or “most viewed” or “best selling” or “most commented” or “highest rated,” it is curating a list whose legitimacy is based on the presumption that it has not been curated. And we want them to feel that way, even to the point that we are unwilling to ask about the choices and implications of the algorithms we use every day.

Peel back the algorithms, and this becomes quite apparent. Yes, a casual visit to Twitter’s home page may present Trends as an unproblematic list of terms, that might appear a simple calculation. But a cursory look at Twitter’s explanation of how Trends works (in its policies and help pages, in its company blog, in tweets, in response to press queries, even in the comment threads of the censorship discussions) shows that Twitter lays bare the variety of weighted factors Trends takes into account, and cops to the occasional and unfortunate consequences of these algorithms. Wikileaks may not have trended when people expected it to because it had before; because the discussion of #wikileaks grew too slowly and consistently over time to have spiked enough to draw the algorithm’s attention; because the bulk of messages were retweets; or because the users tweeting about Wikileaks were already densely interconnected. When Twitter changed their algorithm significantly in May 2010 (though, undoubtedly, it has been tweaked in less noticeable ways before and after), they announced the change in their blog, explained why it was made – and even apologized directly to Justin Bieber, whose position in the Trends list would be diminished by the change. In response to charges of censorship, they have explained why they believe Trends should privilege terms that spike, terms that exceed single clusters of interconnected users, new content over retweets, new terms over already trending ones. Critics gather anecdotal evidence and conduct thorough statistical analysis, using available online tools that track the raw popularity of words in a vastly more exhaustive and catholic way than Twitter does, or at least is willing to make available to its users.

The algorithms that define what is “trending” or what is “hot” or what is “most popular” are not simple measures; they are carefully designed to capture something the site providers want to capture, and to weed out the inevitable “mistakes” a simple calculation would make.

At the same time, Twitter most certainly does curate its Trends lists. It engages in traditional censorship: for example, a Twitter engineer acknowledges here that Trends excludes profanity, something that’s obvious from the relatively circuitous path that prurient attempts to push dirty words onto the Trends list must take. Twitter will remove tweets that constitute specific threats of violence, copyright or trademark violations, impersonation of others, revelations of others’ private information, or spam. (Twitter has even been criticized (1, 2) for not removing some terms from Trends, as in this user’s complaint that #reasonstobeatyourgirlfriend was permitted to appear.) Twitter also engages in softer forms of governance, by designing the algorithm so as to privilege some kinds of content and exclude others, and some users and not others. Twitter offers rules, guidelines, and suggestions for proper tweeting, in the hopes of gently moving users towards the kinds of topics that suit their site and away from the kinds of content that, were it to trend, might reflect badly on the site. For some of their rules for proper profile content, tweet content, and hashtag use, the punishment imposed on violators is that their tweets will not factor into search or Trends – thereby culling the Trends lists by culling what content is even in consideration for it. Twitter includes terms in its Trends from promotional partners, terms that were not spiking in popularity otherwise. This list, automatically calculated on the fly, is yet also the result of careful curation to decide what it should represent, what counts as “trend-ness.”

Ironically, terms like #wikileaks and #occupywallstreet are exactly the kinds of terms that, from a reasonable perspective, Twitter should want to show up as Trends. If we take the reasonable position that Twitter is benefiting from its role in the democratic uprisings of recent years, and that it is pitching itself as a vital tool for important political discussion, and that it wants to highlight terms that will support that vision and draw users to topics that strike them as relevant, #occupywallstreet seems to fit the bill. So despite carefully designing their algorithm away from the perennials of Bieber and the weeds of common language, it still cannot always successfully pluck out the vital public discussion it might want. In this, Twitter is in agreement with its critics; perhaps #wikileaks should have trended after the diplomatic cables were released. These algorithms are not perfect; they are still cudgels, where one might want scalpels. The Trends list can often look, in fact, like a study in insignificance. Not only are the interests of a few often precisely irrelevant to the rest of us, but much of what we talk about on Twitter every day is in fact quite everyday, despite their most heroic claims of political import. But, many Twitter users take it to be not just a measure of visibility but a means of visibility – whether or not the appearance of a term or #hashtag increases audience, which is not in fact clear. Trends offers to propel a topic towards greater attention, and offers proof of the attention already being paid. Or seems to.

Of course, Twitter has in its hands the biggest resource by which to improve their tool, a massive and interested user base. One could imagine “crowdsourcing” this problem, asking users to rate the quality of the Trends lists, and assessing these responses over time and a huge number of data points. But they face a dilemma: revealing the workings of their algorithm, even enough to respond to charges of censorship and manipulation, much less to share the task of improving it, risks helping those who would game the system. Everyone from spammers to political activists to 4chan tricksters to narcissists might want to “optimize” their tweets and hashtags so as to show up in the Trends. So the mechanism underneath this tool, that is meant to present a (quasi) democratic assessment of what the public finds important right now, cannot reveal its own “secret sauce.”

Which in some ways leaves us, and Twitter, in an unresolvable quandary. The algorithmic gloss of our aggregate social data practices can always be read/misread as censorship, if the results do not match what someone expects. If #occupywallstreet is not trending, does that mean (a) it is being purposefully censored? (b) it is very popular but consistently so, not a spike? (c) it is actually less popular than one might think? Broad scrapes of huge data, like Twitter Trends, are in some ways meant to show us what we know to be true, and to show us what we are unable to perceive as true because of our limited scope. And we can never really tell which it is showing us, or failing to show us. We remain trapped in an algorithmic regress, and not even Twitter can help, as it can’t risk revealing the criteria it used.

But what is most important here is not the consequences of algorithms, it is our emerging and powerful faith in them. Trends measures “trends,” a phenomenon Twitter gets to define and build into its algorithm. But we are invited to treat Trends as a reasonable measure of popularity and importance, a “trend” in our understanding of the term. And we want it to be so. We want Trends to be an impartial arbiter of what’s relevant… and we want our pet topic, the one it seems certain that “everyone” is (or should be) talking about, to be duly noted by this objective measure specifically designed to do so. We want Twitter to be “right” about what is important… and sometimes we kinda want them to be wrong, deliberately wrong – because that will also fit our worldview: that when the facts are misrepresented, it’s because someone did so deliberately, not because facts are in many ways the product of how they’re manufactured.

We don’t have a sufficient vocabulary for assessing the algorithmic intervention of a tool like Trends. We’re not good at comprehending the complexity required to make a tool like Trends – that seems to effortlessly identify what’s going on, that isn’t swamped by the mundane or the irrelevant. We don’t have a language for the unexpected associations algorithms make, beyond the intention (or even comprehension) of their designers. We don’t have a clear sense of how to talk about the politics of this algorithm. If Trends, as designed, does leave #occupywallstreet off the list, even when its use is surging and even when some people think it should be there: is that the algorithm correctly assessing what is happening? Is it looking for the wrong things? Has it been turned from its proper ends by interested parties? Too often, maybe in nearly every instance in which we use these platforms, we fail to ask these questions. We equate the “hot” list with our understanding of what is popular, the “trends” list with what matters. Most importantly, we may be unwilling or unable to recognize our growing dependence on these algorithmic tools, as our means of navigating the huge corpuses of data that we must, because we want so badly for these tools to perform a simple, neutral calculus, without blurry edges, without human intervention, without having to be tweaked to get it “right,” without being shaped by the interests of their providers.

Using Off-the-shelf Software for basic Twitter Analysis

Mary Gray, Mike Ananny and I are writing a paper on queer youth and “Glee” for the American Anthropological Association’s annual meeting (yes, I have the greatest job in the world). This is a multi-methodological study by design, because traditional television viewing practices have become so complex. Besides traditional audience ethnography like interviews and participant observation, we are using textual analysis to analyze episode themes, and we have collected a large corpus of tweets with Glee-related hashtags. This summer, I worked with my high school intern, Jazmin Gonzales-Rivero, to go through this corpus of tweets and pull out useful information for the paper.

We’ve written and published a basic report on using off-the-shelf tools to see patterns and themes in a large Twitter data set quickly and easily.

Abstract:

With the increasing popularity of large social software applications like Facebook and Twitter, social scientists and computer scientists have begun developing innovative approaches to dealing with the vast amounts of data produced and collected in such environments. For qualitative researchers, the methods involved can be daunting and unfamiliar. In this report, we outline some basic procedures for working with a large-scale Twitter data set to answer qualitative inquiries. We use Python, MySQL, and the word-cloud generator Wordle to identify patterns in re-tweets, tweet authors, dates and times of tweets, frequency of hashtags, and frequency of word use. Such data can provide valuable augmentation to qualitative inquiry. This paper is aimed at social scientists and humanities scholars with limited experience with big data and a lack of computing resources to do extensive quantitative research.

Citation:
Marwick, A. and Gonzales-Rivero, J. (2011). Learning to Work with Large-Scale Twitter Data Sets: Using Off-The-Shelf Tools to Quickly and Easily See Tweet Patterns. Microsoft Research Social Media Collective Report, MSR-SMC-11-01, Cambridge, MA. [Download as PDF]

If you’re a seasoned computer scientist or a Big Data aficionado, the information in this paper will seem quite simplistic. But for those of us without programming backgrounds who study Twitter or other forms of social media, the idea of tackling a set of 450,000 tweets can seem quite daunting. In this paper, Jazmin and I walk step-by-step through the methods she used to parse a set of tweets, using free and easily accessible tools like MySQL, Python, and Wordle. We hope this will be helpful for other legal, humanities, and social science scholars who might want to dip a toe into Big Data to augment more qualitative research findings.
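
To give a flavor of the kind of counting involved, here is a minimal Python sketch that tallies hashtags, tweet authors, and retweeted accounts. It is not the code from the report: the sample tweets, the `summarize` function, and the "RT @user" retweet convention are all assumptions made for illustration, and a real corpus would be read from a file or a MySQL table instead of a hard-coded list.

```python
import re
from collections import Counter

# Hypothetical sample data: (author, text) pairs standing in for a real corpus.
tweets = [
    ("gleek01", "RT @gleefan: Can't wait for tonight! #glee #klaine"),
    ("gleefan", "Can't wait for tonight! #glee #klaine"),
    ("tvwatcher", "That finale... #Glee"),
    ("gleek01", "RT @tvwatcher: That finale... #Glee"),
]

HASHTAG = re.compile(r"#(\w+)")
# Assumes old-style retweets that begin with "RT @username".
RETWEET = re.compile(r"^RT @(\w+)")


def summarize(tweets):
    """Count hashtags, authors, and retweeted accounts in (author, text) pairs."""
    hashtags = Counter()
    authors = Counter()
    retweeted = Counter()
    for author, text in tweets:
        authors[author] += 1
        # Lowercase tags so that #Glee and #glee are counted together.
        hashtags.update(tag.lower() for tag in HASHTAG.findall(text))
        match = RETWEET.match(text)
        if match:
            retweeted[match.group(1).lower()] += 1
    return hashtags, authors, retweeted


hashtags, authors, retweeted = summarize(tweets)
print(hashtags.most_common(3))
print(authors.most_common(3))
print(retweeted.most_common(3))
```

The word frequencies that feed a word cloud could be counted the same way, by splitting each tweet into words and updating another `Counter`.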

How much is a life worth in pixels?

Analysis of yesterday’s news coverage of the Mexican massacre

Mexican Tweets

More than fifty people were murdered yesterday in what is now the most violent episode in the ongoing Mexican Drug War. Most of the victims were women, some were pregnant. After learning about the horrific massacre in Monterrey, I spent several hours reading the reports coming from México via social and mainstream media. I exchanged messages with friends and family who live there (I went to college in Monterrey and my parents live not too far from there). The Twitter trending topics in México showed anger, desperation and hopelessness. One of the hashtags people often use to report violence in the city, #mtyfollow, was full of messages of repudiation and of people trying to help others find their loved ones. Some of the most retweeted messages were those with the names of the possible victims, as you can see in this chart.

 Twitter activity on a popular keyword right after the massacre
Mexican Twitter users helping find missing people after the massacre

American Silence

The massacre happened only 140 miles south of Texas in one of the largest metropolitan areas in North America. Yet, as Nancy Baym put it, the American twittersphere was mum. Why? In part, I think, because most of the news websites in the US were ignoring the event.

One could understand the lack of coverage in the first few hours. The news coming out of México was talking about “only” four deaths, so it is possible the events did not catch the attention of the American news websites at first. However, ten hours after the attack the official number was already above fifty victims, with some reports as high as 61, yet sites like CNN.com gave little attention to the story. The link to the article about the massacre was buried among articles such as one about actress Rose McGowan’s childhood.

I know CNN is not known for its high-quality news coverage, so I decided to check out one of America’s most trusted news outlets: the New York Times. I was disappointed, again. I had to scroll all the way down to the “More News” section to find a link, in a 10-pixel font, to the article titled “Arson Kills 40 in a Casino in Mexico.”

Pixels per Victim

Frustrated by this, I decided to get a more objective assessment of the coverage by counting the number of pixels different news websites were assigning to the story of the massacre. I know web designers put a lot of work into every single pixel on the screen, especially on high-traffic websites. Visitors’ attention is scarce and every pixel counts. So I took screenshots of the front pages of some of the major news websites and calculated the amount of screen real estate assigned to the story of the massacre. For example, the New York Times gave the story 291×11 pixels, a mere 0.27% of the screen real estate (in a window size of 1439×812 pixels). CNN gave it even less at 191×10 pixels, representing 0.16% of the screen. But what about other websites? Did any other websites in the English-speaking world give it more space? Yes. Read on.
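
For anyone who wants to check or repeat the arithmetic, the screen-share figures can be reproduced with a few lines of Python. This is only a sketch: the `screen_share` function name is made up for illustration, and the region and window sizes are the ones quoted above.

```python
def screen_share(width, height, window_width=1439, window_height=812):
    """Return the fraction of the browser window covered by a width x height region."""
    return (width * height) / (window_width * window_height)


# Link sizes quoted in the post, measured in a 1439x812 window.
nyt = screen_share(291, 11)
cnn = screen_share(191, 10)

print(f"New York Times: {nyt:.2%}")  # 0.27%
print(f"CNN: {cnn:.2%}")             # 0.16%
```

The percentages depend on the window size used for the screenshots, so a different window would shift the numbers.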

I decided to look into non-American websites. If my calculations are correct, it turns out that Al-Jazeera and The Guardian alone gave more pixels to the story than CNN, the Washington Post, FOX News, the Wall Street Journal, the New York Times, MSNBC and the Houston Chronicle combined. Americans might be better off getting news about their southern neighbor from a British or a Qatari website than from many of the US ones. The two exceptions were the LA Times and the Huffington Post. They both gave more pixels to the story than any other news source I analyzed. CNN was at the bottom of the list though. Click here for a slideshow of the websites I analyzed.

To summarize my results, I generated a ranking of the number of pixels per victim each news website devoted to the massacre. Yes, this issue is much more nuanced than pixels per victim, and I am not a journalism expert, but I hope it can help start a discussion (or continue an existing one). If my calculations are correct, CNN devoted 38 pixels per victim, 76 times fewer than the LA Times, which gave 2,920 pixels per victim.
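
The pixels-per-victim figure is simply the pixel area of the story divided by the number of victims. Here is a minimal Python sketch of that calculation. The count of 50 victims is an assumption (the text says "more than fifty"), chosen because it reproduces CNN's figure from the 191×10 region reported earlier; the function and constant names are made up for illustration.

```python
VICTIMS = 50  # Assumption: the post says "more than fifty" victims.


def pixels_per_victim(width, height, victims=VICTIMS):
    """Pixel area given to the story, divided by the number of victims."""
    return (width * height) / victims


cnn = pixels_per_victim(191, 10)  # CNN's link size as reported in the post
la_times = 2920                   # LA Times figure quoted in the post

print(f"CNN: {cnn:.0f} pixels per victim")                 # 38
print(f"LA Times / CNN ratio: {la_times / cnn:.1f}")       # 76.4
```

With these inputs the ratio comes out a little above 76, in line with the figure quoted above.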

Closing Thoughts

The Mexican Drug War is a complex geopolitical conflict closely linked to the United States’ financial stability and national security. If American news websites do not give enough attention to the massacre of 50 people, what can we expect of less dramatic stories with perhaps more structural and long-term implications? I list here some of the recent related stories that I wish had gotten much more attention and that I hope you get to read to understand the complexity of the problem:

  1. The Guardian’s article on “How a big US bank laundered billions from Mexico’s murderous drug gangs.”
  2. The LA Times’ article on a Senate report on how the “U.S. can’t justify its drug war spending” (there are many more articles about this).
  3. The NY Times story on how US officials “allowed nearly 1,000 guns to flow illegally into Mexico” (also check this campaign to stop gun smuggling).
  4. Chomsky’s excellent synthesis of the whole Drug War problem with a historical perspective that only Chomsky can give.


If you liked this, follow me on Twitter or identi.ca.