The accountability of social media platforms, in the age of Trump

Pundits and commentators are just starting to pick through the rubble of this election and piece together what happened and what it means. In such cases, it is often easier to grab hold of one explanation — Twitter! racism! Brexit! James Comey! — and use it as a clothesline to hang the election on and shake it into some semblance of sense. But as scholars, we do a disservice if we allow for simple or single explanations. “Perfect storm” has become a cliché, but I can see a set of elements that all had to be true, that came together, to produce the election we just witnessed: globalization, economic precarity, and fundamentalist reactionary responses; the rise of the conservative right and its targeted tactics, especially against the Clintons; backlashes to multiculturalism, diversity, and the election of President Obama; the undoing of the workings and cultural authority of journalism; the alt-right and the undercurrents of social media; the residual fear and anxiety in America after 9/11. It is all of these things, and they were all already connected, before candidate Trump emerged.

Yet at the same time, my expertise does not stretch across all of these areas. I have to admit that I have trained myself right down to a fine point: social media, public discourse, technology, control, law. I have that hammer, and can only hit those nails. If I find myself particularly concerned about social media and harassment, or want to draw links between Trump’s dog whistle politics, Steve Bannon and Breitbart, the tactics of the alt-right, and the failings of Twitter to consider the space of discourse it has made possible, I risk making it seem like I think there’s one explanation, that technology produces social problems. I do not mean this. In the end, I have to have faith that, as I try to step up and say something useful about this one aspect, some other scholar is similarly stepping up and saying something about fundamentalist reactions to globalization, and someone else is stepping up to speak about the divisiveness of the conservative movement.

The book I’m working on now, nearing completion, is about social media platforms and the way they have (and have not) stepped into the role of arbiters of public discourse. The focus is on the platforms, their ambivalent combination of neutrality and intervention, the actual ways in which they go about policing offensive content and behavior, and the implications those tactics and arrangements have for how we think about the private curation of public discourse. But the book is framed in terms of the rise and now, for lack of a better word, adolescence of social media platforms: the initial optimism and enthusiasm that fueled the rise of the web, overshadowing the darker aspects already emergent there and spurring the rise of the first social media platforms, seems to have given way to a set of concerns about how social media platforms work and how they are used — sometimes against people, and toward very different ends than were originally imagined. Those platforms did not at first imagine, and have not thoroughly thought through, how they now support (among many other things) a targeted project of racial animosity and a cold gamesmanship about public engagement. In the context of the election, my new goal is to boost that part of the argument: to highlight the opportunities that social media platforms offer to forms of public discourse that are not only harassing, racist, or criminal, but that can also take advantage of the dynamics of social media to create affirming circles of misinformation, to sip the poison of partisanship, to spur leaderless movements ripe for demagoguery — and to show how the social media platforms that now host this discourse have embraced a woefully insufficient sense of accountability, and must rethink how they have become mechanisms of social and political discourse, good and ill.

This specific project is too late in the game for a radical shift. But as I think beyond it, I feel an imperative to be sure that my choices of research topics are driven more by cultural and political imperative than merely by my own curiosity. Or, ideally, by the perfect meeting point of the two. The logical outcome of my interest in platforms and content moderation is to shift how we think of platforms: not as mere intermediaries between speakers (if they ever were, they are no longer), but as constitutive of public discourse. If we understand them as constituting discourse — in the choreography they install in their design, the moderation they conduct as a form of policy, and the algorithmic selection of which raw material becomes “my feed” — then we expand their sense of responsibility. Moreover, we might ask what it would mean to hold them accountable for making the political arena we want, we need. These questions will only grow in importance and complexity as these information systems depend more and more on algorithmic, machine learning, and other automated techniques; more regularly include bots that are difficult to discern from human participants; and continue to extend their global reach for new consumers, extending into and entangling with the very shifts of globalization and tribalization we will continue to grapple with.

These comments were part of a longer post at Culture Digitally that I helped organize, in which a dozen scholars of media and information reflected on the election and the future directions of their own work, and our field, in light of the political realities we woke up to Wednesday morning. My specific scholarly community cannot address every issue that’s likely on the horizon, but our work does touch a surprising number of them. The kinds of questions that motivate our scholarship — from fairness and equity, to labor and precarity, to harassment and misogyny, to globalism and fear, to systems and control, to journalism and ignorance — all of these seem so much more pressing today than they did even yesterday.

Architecture or Minecraft?

(or, “Hopefully 4chan Won’t Hear About This Contest”)

The social-media-ification of everything continues. If you’ve got time for some late-summer procrastination, then thanks to the Internet you can help choose the design of my house.

As you may have read here two weeks ago, I’m crowdsourcing it. The first competition is over and I received 16 entries — above average for arcbazar.com. That means anyone on the Internet can now help pick a winner. I’d say there are some great designs and many awful ones.

My needs attracted designers from Nigeria, Bulgaria, Ukraine, Romania, Vietnam, Mexico, and Indonesia. But also London, Texas, and my very own town of Ann Arbor, Michigan. Submissions are anonymous, but Arcbazar maps their self-reported locations:
[Image: Arcbazar’s map of the designers’ self-reported locations]

Anyone can submit; no credentials required. So far I don’t think it’s “the worst thing to happen to architecture since the Internet started” but there’s still plenty of time for this to go sideways on me. The next step is voting.

In Ayn Rand’s The Fountainhead, the young architect Howard Roark says, “I don’t intend to build in order to have clients. I intend to have clients in order to build.” Like Rand’s protagonist, I think some of my designers refused to compromise their unique vision. To give you the flavor, here are some comments my friends made about the low points:

“This house looks like the head of a Minecraft pig”:

[Image: competition entry “BARNSK”]

For reference:

[Image: a Minecraft pig, for reference]

We asked for a barn-like building with a gambrel roof. That was a requirement, and “gambrel” is a word I had to look up in order to write it. Google says:

[Image: Google’s definition of “gambrel roof”]

I think some of the designers really struggled with it! A friend said: “It looks like this building fell down and broke its spine.”

[Image: the entry with the broken-looking roofline]

“This appears to be a car dealership.”:

[Image: the entry that resembles a car dealership]

You can help choose the winner here. (You need to sign up for a free login.)

There are two separate things to do at this link — voting and commenting. Anyone with an arcbazar login can vote: it’s a numerical rating in five categories.

To vote, click “Vote now!” when you are looking at a particular entry. This affects the rankings.

[Image: the “Vote now!” link on an entry page]

To comment and to read other people’s comments, click the word “Evaluations” when you are looking at a particular entry. You need a Facebook login to add a comment.

[Image: the “Evaluations” link on an entry page]

Stay tuned. More updates here as the process unfolds.


I crowdsourced the design of my house

(or, “The Social-Media-ification of Everything”)

The architecture crowdsourcing Web site Arcbazar has been called “The Worst Thing to Happen To Architecture Since the Internet Started.” The site also got some press recently by running a parallel, unauthorized architecture competition for the “people’s choice” for the design of the Obama Presidential Library.

[Image: the arcbazar welcome page]

I’ve decided to use arcbazar.com to run two architectural competitions for my house. My competitions started yesterday (links below), in case you want to see this play out in real time.

Most of the attention given to arcbazar has been about labor, safety, and value. Discussion has centered on possible changes to the profession of architecture. Does it lower standards? Will it put architecture jobs and credentials in jeopardy?

Yet as a social media researcher, the part of arcbazar that has my attention is what I would call the “social-media-ification of everything.”

Anyone with a free arcbazar account can submit a design or act as a juror for submitted designs, and as the Web site has evolved it has added features that evoke popular social media platforms. Non-architects are asked to vote on designs, and the competitions use familiar social media features and metaphors like a competition “wall.”

Here are my competitions. You need a free account to look at them.

This means YOU could design my house, so please choose wisely. (One friend said: “You realize your house is going to be renamed Housey McHouseFace.”) Keep your fingers crossed for me that this works out well. Some of the submitted designs for past competitions are a little… odd…

[Image: a submitted design for the Obama Presidential Library, shaped like Obama’s name]
Who wouldn’t want a house in the shape of their own name?

CFP: Studying Social Media and Digital Infrastructures: a workshop-within-a-conference


part of the 50th Hawaii International Conference on System Sciences (HICSS-50)

paper submission deadline: June 15, 2016, 11:59pm HST.


For fifty years, the Hawaii International Conference on System Sciences (HICSS) has been a home for researchers in the information, computer, and system sciences (http://www.hicss.org/). The 50th anniversary event will be held January 4-7, 2017, at the Hilton Waikoloa Village. With an eye to the exponential growth of digitalization and information networks in all aspects of human activity, HICSS has continued to expand its track on Digital and Social Media (http://www.hicss.org/#!track3/c1xcj).

This year, among the Digital and Social Media track’s numerous offerings, we offer two minitracks meant to work in concert. Designed to sequence together into a single day-long workshop-within-a-conference, they will host the best emerging scholarship from sociology, anthropology, communication, information studies, and science & technology studies that addresses the most pressing concerns around digital and social media. In addition, we have developed a pre-conference workshop on digital research methods that will inform and complement the work presented in these minitracks.


Minitrack 1: Critical and Ethical Studies of Digital and Social Media

http://www.hicss.org/#!critical-ethical-studies-of-dsm/c24u6

Organizers: Tarleton Gillespie, Mary Gray, and Robert Mason

The minitrack will critically interrogate the role of digital and social media (DSM) in supporting existing power structures or realigning power for underrepresented or socially marginalized groups, and will raise awareness of, or illustrate, the ethical issues associated with doing research on DSM. Conceptual papers would address foundational theories of critical studies of media or ethical conduct in periods of rapid sociotechnical change—e.g., new ways of thinking about information exchange in communities and societies. Empirical papers would draw on studies of social media data that illustrate the critical or ethical dimensions of the use of such data. We welcome papers considering topics such as (but not limited to):

*   the power and responsibility of digital platforms

*   bias and discrimination in the collection and use of social data

*   political economies and labor conditions of paid and unpaid information work

*   values embedded in search engines and social media algorithms

*   changes in societal institutions driven by social media and data-intensive techniques

*   alternative forms of digital and social media

*   the ethical dynamics of studying human subjects through their online data

*   challenges in studying the flow of information and misinformation

*   barriers to and professional obligations around accessing and studying proprietary DSM data


Minitrack 2: Values, Power, and Politics in Digital Infrastructures

http://www.hicss.org/#!values-power-and-politics-in-digital-i/c19uj

Organizers: Katie Shilton, Jaime Snyder, and Matthew Bietz

This minitrack will explore the themes of values, power, and politics in relation to the infrastructures that support digital data, documents, and interactions. By considering how infrastructures – the underlying material properties, policy decisions, and mechanisms of interoperability that support digital platforms – are designed, maintained, and dismantled, the work presented in this minitrack will contribute to debates about sociotechnical aspects of digital and social media, with a focus on data, knowledge production, and information access. This session will focus on research that employs techniques such as infrastructural inversion, trace ethnography, or design research (among other methods) to explore factors that influence the development of infrastructures and their use in practice. We welcome papers considering topics such as (but not limited to):

*  politics and ethics in digital platforms and infrastructures

*  values of stakeholders in digital infrastructures

*  materiality of values, power, or politics in digital infrastructures

*  tensions between commercial infrastructures and the needs of communities of practice

*  maintenance, repair, deletion, decay of digital and social media infrastructures

*  resistance, adoption and adaptation of digital infrastructures

*  alternative perspectives on what comprises infrastructures


Pre-conference workshop: Digital Methods “Best Practices”

http://shawnw.io/workshops/HICSS-digitalmethods

Organizers: Shawn Walker, Mary Gray, and Robert Mason

While the study of digital and social media and their impact on society has exploded, discussion of the best methods for doing that work remains thin. Academic researchers and practitioners have deployed traditional techniques, from ethnography to social network analysis; but digital and social media challenge and even defy these techniques in a number of ways that must be examined. At the same time, the study of digital and social media may benefit from more organic and unorthodox methods that get at aspects that cannot be examined otherwise. This intensive half-day workshop will focus on approaches and best practices for studying digital and social media. We aim to go beyond the application of existing methods to online environments and collect innovative methods that break new ground while producing rigorous insights. This workshop will draw on invited and other participants’ research, teaching, classroom, and business experiences to think through “mixed methods” for qualitative and quantitative studies of digital and social media systems.

Through a series of roundtables and guided discussions, the workshop will focus on best practices for studying digital and social media. As part of these discussions, we will also highlight technical and ethical challenges that arise from studying cross-platform digital and social media phenomena. The output of this workshop will be an open, “co-authored” syllabus for a seminar offering what we might call a mixed-method, “from causal to complicated” approach to digital and social media research, applicable to researchers and practitioners alike.


How to apply

April 1, 2016: Paper submission opens.

June 15, 2016: Paper submission ends, 11:59pm HST.

Submission to one of the minitracks requires a complete paper. Instructions for submission requirements are available here: http://www.hicss.org/#!author-instructions/c1dsb

Though the two minitracks are designed to work together, you must choose one to submit your paper to. Feel free to contact the minitrack organizers if you have questions about which is a better fit for your work. For the pre-conference workshop, application instructions, updates, materials, and a group syllabus will be posted on the workshop website.

Reflections on technology and the 2016 elections

Way back in 2008, Off the Bus reporter Mayhill Fowler filed a report on an appearance by Hillary Clinton during that spring’s Democratic primary. The piece opens with a quote: “‘Being here this morning is a gift,’ Hillary Clinton says to the small band of supporters, several hundred strong, gathered under the Saturday morning sun at Good Will Fire Company No. 2, Station 52 in West Chester, Pennsylvania.” Fowler continues,

The Senator is late for her first event of the day; her voice is hoarse. But like the day she is bright and calm. Gone are the faux smiles and waves, the slight brittleness, that have been part of her stage entrance so many times on the campaign trail. Being here this morning is a gift are the first words out of her mouth. It’s clear she means it. This is the perception of an older woman, one who has watched friends and family pass on, who has wondered why they and not she, who has had to settle for answers not on the great philosophies but on the simple things. A new morning as gift–there isn’t a wise woman in the world who doesn’t share Hillary Clinton’s feeling. But that a presidential candidate would choose such an opening remark four days out from a primary that looks to be ‘the one’ is extraordinary. For the remark and its tenor show that Hillary Clinton has been digging deep within herself, asking herself some hard questions.  But it’s too late.

What I love about Fowler’s essay, why I remember it all these years later, is the beautiful rendering of the lived details of the moment, in a way that captures something essential about politics. Fowler leaned towards Obama, but in noting the depth of “Being here this morning is a gift,” she writes as someone who identifies with Hillary – “there isn’t a wise woman in the world who doesn’t share Hillary Clinton’s feeling” –  and uses her identification, her shared experience, to express something key about the moment, about Hillary, about politics. The essay goes on to discuss Hillary’s unique ability to communicate policy issues in plain language to connect with the voters of Pennsylvania, as well as the awkward attempts at mudslinging against her opponent. Read in its entirety, the piece helps me understand why Hillary Clinton won Nevada three weeks ago, and why she lost Michigan last week.

What Fowler accomplished in her essay was not simple or easy. But I think without careful explorations of feelings like hers, we will never understand the role of technology in politics.

I was reminded of Fowler’s piece after reading Clay Shirky’s bravura commentary on the 2016 primaries, in the form of fifty tweets about how “social media has broken the ability of elites to determine the boundaries of acceptable conversation.” As smart as it was, there seemed to me something lacking in Shirky’s treatise. At first I thought of some of the details missed: in the long view, Trump’s rise owes less to social media than to the old tech of talk radio; and the aftereffects of the 2008 economic collapse, the bank bailouts and so on, are at least as important as anything technological in this election cycle. But it’s not just about piling more factors into the analysis. There’s a problem with how Shirky imagines technology.

Shirky tells a story in which the steady march of new technologies (cable, the web, social media) exploited by renegade candidates (Ross Perot, Howard Dean, Barack Obama) gradually undermined the American political parties’ capacity to set the boundaries of debate as they had in the days of network television, to the point that now, with the rise of Trump and Sanders, the parties have lost their coherence. It’s wonderfully terse: “Perot adopted non-centrist media, Dean distributed fundraising, Obama non-party voter mobilization.” That’s twenty years of political drama, a compelling, gripping narrative arc, in eleven words.

And of course technology does matter. It has been obvious ever since the Dean campaign that the simple math of internet-powered small donations offered a significant alternative to traditional high-dollar bundling, and to the political commitments that such bundling entailed. Bernie’s small-donor-powered fundraising would have been inconceivable on a national scale before the spread of the internet. The mainstream media’s historical power to exclude non-centrist candidates with self-fulfilling prophecies about electability had repeatedly presented an insurmountable barrier to the likes of Jesse Jackson and Ron Paul, and to Howard Dean in the last month of his campaign. The fact that, in this cycle, both Trump and Sanders have been able to push through that barrier cannot be explained without reference to voters’ capacities to find alternate narratives on social media platforms.

But that’s not the whole story. People like to say that nobody predicted this election, but back in 2013, Nicco Mele, who had cut his political teeth as webmaster for Howard Dean’s campaign, said, “The primary lesson of the last four cycles, maybe five cycles, is that the advantages of the establishment are greatly diminished, perhaps completely obliterated . . . The notion that [Hillary] can coast to front runner status on her history and contributions to the Democratic Party certainly didn’t hold true in 2008, so I don’t see any reason it’ll hold true in 2016. . . . In 2008, 6 million people gave $100 to make Barack Obama president. And those 6 million people made him a household name and no one’s going to do that to Hillary Clinton. And I think that kind of power, people feel it online and respond to it. … Online, smaller donors like to think they created you, that they made you, and no one will be able to feel that way about Hillary.”

If you squint, you might think Mele’s prescient observations closely tracked Shirky’s: they both say digital technologies have eroded centralized power bases. But there’s a key difference: Mele knows that giving online is also about a feeling, about the sense that you are actively making your candidate a household name, that “online smaller donors like to think they created you.” Knowledge of a widely shared, specific feeling can only be found through experience. If you only look at bullet-pointed timelines of gadgets and campaigns, you can’t see it. In the 2014 mid-term elections, my email inbox exploded with entreaties from Democratic candidates across the country, begging for money in order to keep the Republicans, with their poisonous policies, from controlling the Senate. Liberals like myself all knew what the Democrats were against, we all agreed with them, but there was no sense of actively being part of the creation of something new. And on balance, the Democrats lost that round. They had not learned the lessons of Dean in ’04 and Obama in ’08, not because they didn’t understand the technology, but because they didn’t understand the feeling.

I don’t want to just sing the praises of intellectual caution, of a rich sense of history, or of narrative journalism. (Jill Lepore recently penned an essay that offers all three, yet her analysis is closer to Shirky’s than to Fowler’s or Mele’s.) And I don’t want to get mystical about complexity or the irreducibility of experience. C. Wright Mills was correct that “a mere enumeration of a plurality of causes is . . . a paste-pot eclecticism which avoids the real task of social analysis” (Power Elite, 243).

The point is that some grasp of shared, lived experience is necessary (if not sufficient) to any useful judgment about the effects of technology. Too often we imagine technology as a kind of shortcut, an easily identifiable “thing” that solves our dilemmas, political and intellectual. Faced with uncertainty and conflict, we point to some tech, in the hopes that it will dispel the fog of our confusion and assuage our anxieties about our future. It’s not that technology doesn’t matter, but that “technology” or “the internet” or “social media” are not things; they are tangles of accumulated practices and experiences that cannot be understood outside of social context, and speaking about them as if they are explanatory, as if they might be the solution, just causes more confusion. (There’s a well-developed literature that insists on decentering technology, on breaking it down into specific sets of habits, embodied practices, variegated social relations.[i]) As Secretary of State, Hillary Clinton for a time bought into the notion that Twitter could provide a fix to the problems of the Middle East.[ii] The tendency to point to technology as the explanation for our confusions, as the solution to our political dilemmas, is a widespread habit, and for the moment, it’s a habit with problems. It’s too clever by half (like so much of Clintonian thinking: don’t ask don’t tell, financial deregulation). It didn’t work for the Democrats in 2014, and it won’t work for us now.

One of the “aha” moments for me about Bernie’s campaign, a moment when I first thought it might really go somewhere, was last July when I heard that Zack Exley signed on as an adviser to Sanders. I knew of Zack because he’d learned some tricks about online fundraising and organizing while working for moveon.org, and brought them to Dean’s campaign. (We can probably blame Zack for the ubiquitous “ask” that structures fundraising emails.) After the Dean campaign ended in the spring of 2004, many young staffers took their experiences with internet-powered grassroots campaigning into various professional lives, working for other campaigns, creating consulting firms, and the like. Zack did some of that, but what impressed me most is he also spent some time touring the U.S. in a pickup truck, visiting evangelical communities around the country. His blog of the trip, full of stories of face-to-face encounters that turned into astute observations, narrated his growing belief that the American left needed to engage in dialog with the evangelical community, that there were communities out there who might have common cause with the left on economic and environmental issues, commonalities that the left was ignoring. Zack gets his ideas about what’s politically possible, not just from polls or the received wisdoms of the punditocracy or various academic salons, but from sympathetic engagements with ordinary folks from all walks of life. Zack can lay claim to being as much of a technology expert as anyone, but that would mean nothing if he were not also someone who loves small “d” democracy, who is open to the unexpected, someone who thinks hard about others’ feelings for change, others’ passions to have a say in making the future. That’s why, as I write this, he’s out there making history.


[i] Ever since Raymond Williams (The Long Revolution (Chatto & Windus, 1961)) introduced the concept of “structures of feeling,” diverse scholarly literatures have probed the relations of subjectivity, feelings, and affect to social structure. Approaches range from Frankfurt school critical theory (Illouz, 2007) to affect theory (e.g., Papacharissi, 2014; Ticineto Clough & Halley, 2007) to questions of authenticity and its ironies (Banet-Weiser, 2012). Eva Illouz, Cold Intimacies: The Making of Emotional Capitalism (Polity, 2007). Zizi Papacharissi, Affective Publics: Sentiment, Technology, and Politics (Oxford University Press, 2014). Patricia Ticineto Clough and Jean Halley, eds., The Affective Turn: Theorizing the Social (Duke University Press, 2007). Sarah Banet-Weiser, Authentic™: The Politics of Ambivalence in a Brand Culture (New York University Press, 2012).

[ii] Kentaro Toyama, Geek Heresy: Rescuing Social Change from the Cult of Technology (PublicAffairs, 2015), 35.



This was crossposted on Culture Digitally.

#trendingistrending: when algorithms become culture

I wanted to share a new essay, “#Trendingistrending: When Algorithms Become Culture,” that I’ve just completed for a forthcoming Routledge anthology called Algorithmic Cultures: Essays on Meaning, Performance and New Technologies, edited by Robert Seyfert and Jonathan Roberge. My aim is to focus on the various “trending algorithms” that populate social media platforms, consider what they do as a set, and then connect them to a broader history of metrics used in popular media both to assess audience tastes and to portray those tastes back to that audience, as a cultural claim in its own right and as a form of advertising.

The essay is meant to extend the idea of “calculated publics” I first discussed here and the concerns that animated  this paper. But more broadly I hope it pushes us to think about algorithms not as external forces on the flow of popular culture, but increasingly as elements of popular culture themselves, something we discuss as culturally relevant, something we turn to face so as to participate in culture in particular ways. It also has a bit more to say about how we tend to think about and talk about “algorithms” in this scholarly discussion, something I have more to say about here.

I hope it’s interesting, and I really welcome your feedback. I already see places where I’ve not done the issue justice: I should connect the argument more to discussions of financial metrics, like credit ratings, as another moment when institutions have reason to turn such measures back as meaningful claims. And I found Jeremy Morris’s excellent essay on what he calls “infomediaries” (journal; academia.edu) late in my process, so while I do gesture to it, it could have informed my thinking even more. There are a dozen other things I wanted to say, and the essay is already a little overstuffed.

I do have some opportunity to make specific changes before it goes to press, so I’d love to hear any suggestions, if you’re inclined to read it.

The Facebook “It’s Not Our Fault” Study

Today in Science, members of the Facebook data science team released a provocative study about adult Facebook users in the US “who volunteer their ideological affiliation in their profile.” The study “quantified the extent to which individuals encounter comparatively more or less diverse” hard news “while interacting via Facebook’s algorithmically ranked News Feed.”*

  • The research found that the user’s click rate on hard news is affected by the positioning of the content on the page by the filtering algorithm. The same link placed at the top of the feed is about 10-15% more likely to get a click than a link at position #40 (figure S5).
  • The Facebook news feed curation algorithm, “based on many factors,” removes hard news from diverse sources that you are less likely to agree with but it does not remove the hard news that you are likely to agree with (S7). They call news from a source you are less likely to agree with “cross-cutting.”*
  • The study then found that the algorithm filters out 1 in 20 cross-cutting hard news stories that a self-identified conservative sees (or 5%) and 1 in 13 cross-cutting hard news stories that a self-identified liberal sees (8%).
  • Finally, the research then showed that “individuals’ choices about what to consume” further limit their “exposure to cross-cutting content.” Conservatives will click on a little less than 30% of cross-cutting hard news, while liberals will click on a little more than 20% (figure 3).

My interpretation in three sentences:

  1. We would expect that people who are given the choice of what news they want to read will select sources they tend to agree with; more choice leads to more selectivity and polarization in news sources.
  2. Increasing political polarization is normatively a bad thing.
  3. Selectivity and polarization are happening on Facebook, and the news feed curation algorithm acts to modestly accelerate selectivity and polarization.

I think this should not be hugely surprising. For example, what else would a good filter algorithm be doing other than filtering for what it thinks you will like?
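To make that concrete, here is a toy sketch in Python (my own illustration, not Facebook’s actual system, whose workings are undisclosed; the user and story “ideology” scores are invented) of how any feed ranker that optimizes for predicted agreement will, by construction, push cross-cutting stories down to the positions where clicks are rarest:

    # A toy feed ranker -- my own sketch, NOT Facebook's actual algorithm,
    # which is undisclosed. It shows that a filter optimizing for predicted
    # agreement demotes cross-cutting stories to low feed positions, where
    # (per the study's figure S5) click-through is 10-15% lower.

    def predicted_agreement(user, story):
        # Hypothetical scorer: higher when the story's ideology matches the user's.
        return 1.0 - abs(user["ideology"] - story["ideology"])  # both in [0, 1]

    def rank_feed(user, stories):
        # Order stories by how much we predict the user will like them.
        return sorted(stories, key=lambda s: predicted_agreement(user, s), reverse=True)

    liberal_user = {"ideology": 0.1}
    stories = [{"id": i, "ideology": i / 9} for i in range(10)]

    for position, story in enumerate(rank_feed(liberal_user, stories), start=1):
        # Disagreeable (cross-cutting) stories land at the bottom of the feed.
        print(position, story["id"], round(story["ideology"], 2))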

But what’s really provocative about this research is the unusual framing. This may go down in history as the “it’s not our fault” study.

Facebook: It’s not our fault.

I carefully wrote the above based on my interpretation of the results. Now that I’ve got that off my chest, let me tell you about how the Facebook data science team interprets these results. To start, my assumption was that news polarization is bad.  But the end of the Facebook study says:

“we do not pass judgment on the normative value of cross-cutting exposure”

This is strange, because there is a wide consensus that exposure to diverse news sources is foundational to democracy. Scholarly research about social media has, almost universally, expressed concern about the dangers of increasing selectivity and polarization. But it may be that you do not want to say that polarization is bad when you have just found that your own product increases it. (Modestly.)

And the sources cited just after this quote sure do say that exposure to diverse news sources is important. But the Facebook authors write:

“though normative scholars often argue that exposure to a diverse ‘marketplace of ideas’ is key to a healthy democracy (25), a number of studies find that exposure to cross-cutting viewpoints is associated with lower levels of political participation (22, 26, 27).”

So the authors present reduced exposure to diverse news as a “could be good, could be bad” but that’s just not fair. It’s just “bad.” There is no gang of political scientists arguing against exposure to diverse news sources.**

The Facebook study says it is important because:

“our work suggests that individuals are exposed to more cross-cutting discourse in social media than they would be under the digital reality envisioned by some”

Why so defensive? If you look at what is cited here, this quote is saying that this study showed that Facebook is better than a speculative dystopian future.*** Yet the people referred to by this word “some” didn’t provide any sort of point estimates that were meant to allow specific comparisons. On the subject of comparisons, the study goes on to say that:

“we conclusively establish that…individual choices more than algorithms limit exposure to attitude-challenging content.”

“compared to algorithmic ranking, individuals’ choices about what to consume had a stronger effect”

Alarm bells are ringing for me. The tobacco industry might once have funded a study that says that smoking is less dangerous than coal mining, but here we have a study about coal miners smoking. Probably while they are in the coal mine. What I mean to say is that there is no scenario in which “user choices” vs. “the algorithm” can be traded off, because they happen together (Fig. 3 [top]). Users select from what the algorithm already filtered for them. It is a sequence.**** I think the proper statement about these two things is that they’re both bad — they both increase polarization and selectivity. As I said above, the algorithm appears to modestly increase the selectivity of users.
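To put the sequence in code: a rough sketch (using the study’s own approximate point estimates for self-identified liberals) shows that the “user choice” stage only ever operates on what the algorithm’s stage has already passed along:

    # Why "the algorithm" vs. "user choice" is a false comparison: the two
    # stages compose in sequence. Rates below are rough point estimates from
    # the study, for self-identified liberals.

    ALGO_PASS_RATE = 0.92    # the feed algorithm removes ~8% of cross-cutting stories
    USER_CLICK_RATE = 0.20   # users click ~20% of the cross-cutting stories they see

    def cross_cutting_reads(n_stories):
        shown = n_stories * ALGO_PASS_RATE    # stage 1: the algorithm filters
        read = shown * USER_CLICK_RATE        # stage 2: the user chooses from stage 1's output
        return read

    print(cross_cutting_reads(100))  # ~18.4 of every 100 cross-cutting stories get read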

The only reason I can think of that the study is framed this way is as a kind of alibi. Facebook is saying: It’s not our fault! You do it too!

Are we the 4%?

In my summary at the top of this post, I wrote that the study was about people “who volunteer their ideological affiliation in their profile.” But the study also describes itself by saying:

“we utilize a large, comprehensive dataset from Facebook.”

“we examined how 10.1 million U.S. Facebook users interact”

These statements may be factually correct but I found them to be misleading. At first, I read this quickly and I took this to mean that out of the at least 200 million Americans who have used Facebook, the researchers selected a “large” sample that was representative of Facebook users, although this would not be representative of the US population. The “limitations” section discusses the demographics of “Facebook’s users,” as would be the normal thing to do if they were sampled. There is no information about the selection procedure in the article itself.

Instead, after reading down in the appendices, I realized that “comprehensive” refers to the survey research concept: “complete,” meaning that this was a non-probability, non-representative sample that included everyone on the Facebook platform. But out of hundreds of millions, we ended up with a study of 10.1m because users were excluded unless they met these four criteria:

  1. “18 or older”
  2. “log in at least 4/7 days per week”
  3. “have interacted with at least one link shared on Facebook that we classified as hard news”
  4. “self-report their ideological affiliation” in a way that was “interpretable”

That #4 is very significant. Who reports their ideological affiliation on their profile?

[Image: Facebook’s “add your political views” profile field]

It turns out that only 9% of Facebook users do that. Of those that report an affiliation, only 46% reported an affiliation in a way that was “interpretable.” That means this is a study about the 4% of Facebook users unusual enough to want to tell people their political affiliation on the profile page. That is a rare behavior.

More important than the frequency, though, is the fact that this selection procedure confounds the findings. We would expect a small minority who publicly identify an interpretable political orientation to behave quite differently from the average person with respect to consuming ideological political news. The research claims just don’t stand up against the selection procedure.
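A toy simulation (entirely my own, with invented numbers) shows how selecting on a trait that is correlated with the outcome can distort exactly this kind of estimate:

    # Toy simulation of the selection problem -- invented numbers, my own
    # illustration. Assume the ~4% who display an interpretable political
    # affiliation skew more partisan, and that partisanship lowers the rate
    # of clicking on cross-cutting news.

    import random
    random.seed(0)

    population = []
    for _ in range(100_000):
        displays = random.random() < 0.04
        partisanship = random.uniform(0.5, 1.0) if displays else random.uniform(0.0, 1.0)
        click_rate = 0.4 * (1 - partisanship)   # more partisan -> fewer cross-cutting clicks
        population.append((displays, click_rate))

    sampled = [rate for d, rate in population if d]    # who the study can see
    everyone = [rate for _, rate in population]        # who the claims are about

    print(round(sum(sampled) / len(sampled), 3))    # ~0.10
    print(round(sum(everyone) / len(everyone), 3))  # ~0.20, double the sampled estimate

The particular numbers are made up; the point is that an estimate computed on a self-selected 4% need not resemble the population value at all.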

But the study is at pains to argue that (italics mine):

“we conclusively establish that on average in the context of Facebook, individual choices more than algorithms limit exposure to attitude-challenging content.”

The italicized portion is incorrect because the appendices explain that this is actually a study of a specific, unusual group of Facebook users. The study is designed in such a way that the selection for inclusion in the study is related to the results. (“Conclusively” therefore also feels out of place.)

Algorithmium: A Natural Element?

Last year there was a tremendous controversy about Facebook’s manipulation of the news feed for research. In the fracas, one of the controversial study’s co-authors revealed that, based on the feedback received after the event, many people didn’t realize that the Facebook news feed was filtered at all. We also recently presented research with similar findings.

I mention this because when the study states it is about selection of content, who does the selection is important. There is no sense in this study that a user who chooses something is fundamentally different from the algorithm hiding something from them. While in fact the filtering algorithm is driven by user choices (among other things), users don’t understand the relationship that their choices have to the outcome.

[Image: meme: “not sure if i hate facebook or everyone i know”]
In other words, the article’s strange comparison between “individuals’ choices” and “the algorithm” should be read as “things I choose to do” vs. the effect of “a process Facebook has designed without my knowledge or understanding.” Again, they can’t be compared in the way the article proposes because they aren’t equivalent.

I struggled with the framing of the article because the research talks about “the algorithm” as though it were an element of nature, or a naturally occurring process like convection or mitosis. There is also no sense that it changes over time or that it could be changed intentionally to support a different scenario.*****

Facebook is a private corporation with a terrible public relations problem. It is periodically rated one of the least popular companies in existence. It is currently facing serious government investigations into illegal practices in many countries, some of which stem from the manipulation of its news feed algorithm. In this context, I have to say that it doesn’t seem wise for these Facebook researchers to have spun these data so hard in this direction, which I would summarize as: the algorithm is less selective and less polarizing. Particularly when the research finding in their own study is actually that the Facebook algorithm is modestly more selective and more polarizing than living your life without it.

Update: (6pm Eastern)

Wow, if you think I was critical, have a look at these. It turns out I am the moderate one.

Eszter Hargittai from Northwestern posted on Crooked Timber that we should “stop being mesmerized by large numbers and go back to taking the fundamentals of social science seriously.” And (my favorite): “I thought Science was a serious peer-reviewed publication.”

Nathan Jurgenson from Maryland and Snapchat wrote on Cyborgology (“in a fury“) that Facebook is intentionally “evading” its own role in the production of the news feed. “Facebook cannot take its own role in news seriously.” He accuses the authors of using the “Big-N trick” to intentionally distract from methodological shortcomings. He tweeted that “we need to discuss how very poor corporate big data research gets fast tracked into being published.”

Zeynep Tufekci from UNC wrote on Medium that “I cannot remember a worse apples to oranges comparison” and that the key take-away from the study is actually the ordering effects of the algorithm (which I did not address in this post). “Newsfeed placement is a profoundly powerful gatekeeper for click-through rates.”

Update: (5/10)

A comment helpfully pointed out that I used the wrong percentages in my fourth point when summarizing the piece. Fixed it, with changes marked.

Update: (5/15)

It’s now one week since the Science study. This post has now been cited/linked in The New York Times, Fortune, Time, Wired, Ars Technica, Fast Company, Engadget, and maybe even a few more. I am still getting emails. The conversation has fixated on the <4% sample, often saying something like: “So, Facebook said this was a study about cars, but it was actually only about blue cars.” That’s fine, but the other point in my post is about what is being claimed at all, no matter the sample.

I thought my “coal mine” metaphor about the algorithm would work but it has not always worked. So I’ve clamped my Webcam to my desk lamp and recorded a four-minute video to explain it again, this time with a drawing.******

If the coal mine metaphor failed me, what would be a better metaphor? I’m not sure. Suggestions?


Notes:

* Diversity in hard news, in their study, would be a self-identified liberal who receives a story from FoxNews.com, or a self-identified conservative who receives one from the HuffingtonPost.com, where the stories are about “national news, politics, [or] world affairs.” In more precise terms, “cross-cutting content” for a given user was defined as stories that are more likely to be shared by partisans who do not share that user’s self-identified ideological affiliation.

** I don’t want to make this even more nitpicky, so I’ll put this in a footnote. The paper’s use of citations to Mutz and Huckfeldt et al. to mean that “exposure to cross-cutting viewpoints is associated with lower levels of political participation” is just bizarre. I hope it is a typo. These authors don’t advocate against exposure to cross-cutting viewpoints.

*** Perhaps this could be a new Facebook motto used in advertising: “Facebook: Better than one speculative dystopian future!”

**** In fact, algorithm and user form a coupled system of at least two feedback loops. But that’s not helpful to measure “amount” in the way the study wants to, so I’ll just tuck it away down here.

***** Facebook is behind the algorithm but they are trying to peer-review research about it without disclosing how it works — which is a key part of the study. There is also no way to reproduce the research (or do a second study on a primary phenomenon under study, the algorithm) without access to the Facebook platform.

****** In this video, I intentionally conflate (1) the number of posts filtered and (2) the magnitude of the bias of the filtering. I did so because the difficulty with the comparison works the same way for both, and I was trying to make the example simpler. Thanks to Cedric Langbort for pointing out that “baseline error” is the clearest way of explaining this.

(This was cross-posted to multicast and Wired.)

The Google Algorithm as a Robotic Nose

Algorithms, in the view of author Christopher Steiner, are poised to take over everything. Algorithms embedded in software are now everywhere: Netflix recommendations, credit scores, driving directions, stock trading, Google search, Facebook’s news feed, the TSA’s process to decide who gets searched, the Home Depot prices you are quoted online, and so on. Just a few weeks ago, Ashkan Soltani, the new Chief Technologist of the FTC, said that “algorithmic transparency” is his central priority for the US government agency that is tasked with administration of fairness and justice in trade. Commentators are worried that the rise of hidden algorithmic automation is leading to a problematic new “black box society.”

But given that we want to achieve these “transparent” algorithms, how would we do that? Manfred Broy, writing in the context of software engineering, has said that one of the frustrations of working with software is that it is “almost intangible.” Even if we suddenly obtained the source code for anything we wanted (which is unlikely), it is usually not clear what the code is doing. How can we begin to have a meaningful conversation about the consequences of “an algorithm” by achieving some broad, shared understanding of what it is and what it is doing?


The answer, even among experts, is that we use metaphor, cartoons, diagrams, and abstraction. As a small beginning to tackling this problem of representing the algorithm, this week I have a new journal article out in the open access journal Media-N, titled “Seeing the Sort.” In it, I try for a critical consideration of how we represent algorithms visually. From flowcharts to cartoons, I go through examples of “algorithm public relations,” meaning both how algorithms are revealed to the public and also what spin the visualizers are trying for.

The most fun of writing the piece was choosing the examples, which include The Algo-Rythmics (an effort to represent algorithms in dance), an algorithm represented as a 19th century grist mill, and this Google cartoon that represents its algorithm as a robotic nose that smells Web pages:

The Google algorithm as a robotic nose that smells Web pages.

Read the article:

Sandvig, Christian. (2015). Seeing the Sort: The Aesthetic and Industrial Defense of “The Algorithm.” Media-N. vol. 10, no. 1. http://median.newmediacaucus.org/art-infrastructures-information/seeing-the-sort-the-aesthetic-and-industrial-defense-of-the-algorithm/

(this was also cross-posted to multicast.)


Adding the bling: The role of social media data intermediaries

Last month, Twitter announced the acquisition of Gnip, one of the main sources for social media data—including Twitter data. In my research I am interested in the politics of platforms and data flows in the social web, and in this blog post I would like to explore the role of data intermediaries—Gnip in particular—in regulating access to social media data. I will focus on how Gnip regulates the data flows for social media APIs and how it capitalizes on these data flows. By turning the licensing of API access into a profitable business model, these data intermediaries have specific implications for social media research.

The history of Gnip

Gnip launched on July 1st, 2008 as a platform offering access to data from various social media sources. It was founded by Jud Valeski and MyBlogLog founder Eric Marcoullier as “a free centralized callback server that notifies data consumers (such as Plaxo) in real-time when there is new data about their users on various data producing sites (such as Flickr and Digg)” (Feld 2008). Eric Marcoullier’s background in blog service MyBlogLog is of particular interest as Gnip has taken core ideas behind the technical infrastructure of the blogosphere and has repurposed them for the social web.

MyBlogLog

MyBlogLog was a distributed social network for bloggers which allowed them to connect to their blog readers. From 2006-2008 I actively used MyBlogLog. I had a MyBlogLog widget in the sidebar of my blog displaying the names and faces of my blog’s latest visitors. As part of my daily blogging routine I checked out my MyBlogLog readers in the sidebar, visited unknown readers’ profile pages and looked at which other blogs they were reading. It was not only a way to establish a community around your blog, but you could also find out more about your readers and use it as a discovery tool to find new and interesting blogs. In 2007, MyBlogLog was acquired by Yahoo! and six months later founder Eric Marcoullier left Yahoo! while his technical co-founder Todd Sampson stayed on (Feld 2008). In February 2008, MyBlogLog added a new feature to their service which displayed “an activity stream of recent activities by all users on various social networks – blog posts, new photos, bookmarks on Delicious, Facebook updates, Twitter updates, etc.” (Arrington 2008). In doing so, they were no longer only focusing on the activities of other bloggers in the blogosphere but also including their activities on social media platforms and moving into the ‘lifestreaming’ space by aggregating social updates in a central space (Gray 2008). As a service originally focused on bloggers, they were expanding their scope to take the increasing symbiotic relationship between the blogosphere and social media platforms into account (Weltevrede & Helmond, 2012). But in 2010 MyBlogLog came to an end when Yahoo! shut down a number of services including del.icio.us and MyBlogLog (Gannes 2010).

Ping – Gnip

After leaving Yahoo! in 2007, MyBlogLog-founder Eric Marcoullier started working on a new idea which would eventually become Gnip. In two blog posts by Brad Feld from Foundry Group–an early Gnip investor–Feld provides insights into the ideas behind Gnip and its name. Gnip is ‘ping’ spelled backwards and Feld recounts how Marcoullier was “originally calling the idea Pingery but somewhere along the way Gnip popped out and it stuck (“meta-ping server” was a little awkward)” (Feld 2008). Ping is a central technique in the blogosphere that allows (blog) search engines and other aggregators to know when a blog has been updated. This notification system is built into blog software so that when you publish a new blog post, it automatically sends out a ping (an XML-RPC signal) that notifies a number of ping services that your blog has been updated. Search engines then poll these services to detect blog updates so that they can index these new blog posts. This means that search engines don’t have to poll the millions or billions of blogs out there for updates but that they only have to poll these central ping services. Ping solved a scalability issue of update notifications in the blogosphere because polling a very large number of blogs on a very frequent basis is impossible. Ping servers established themselves as “the backbone of the blogosphere infrastructure and are a crucially important piece of the real-time web” (Arrington 2005). In my MA thesis on the symbiotic relationship between blog software and search engines I describe how ping servers form an essential part of the blogosphere’s infrastructure because they act as centralizing forces in the distributed network of blogs that notify subscribers, aggregators, and search engines of new content (Helmond 2008, 70). Blog aggregators and blog search engines could get fresh content from updated blogs by polling central ping servers instead of individual blogs.
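For readers who have never seen one, the ping itself is a tiny XML-RPC call. Here is a minimal sketch of what blog software sends when you hit publish (the ping-server URL below is a placeholder, though weblogUpdates.ping is the conventional method name):

    # A minimal blogosphere ping -- the XML-RPC notification blog software
    # fires when a new post is published. The ping-server URL is a
    # placeholder, not a real endpoint.

    import xmlrpc.client

    def ping(server_url, blog_name, blog_url):
        # "weblogUpdates.ping" is the conventional method ping servers expose.
        server = xmlrpc.client.ServerProxy(server_url)
        return server.weblogUpdates.ping(blog_name, blog_url)

    # ping("http://ping.example.com/RPC2", "My Blog", "http://myblog.example.com/")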

APIs as the glue of the social web

Gnip sought to solve a scalability issue of the social web—third parties constantly polling social media platform APIs for new data— in a similar manner by becoming a central point for new content from social media platforms offering access to their data. Traditionally, social media platforms have offered (partial) access to their data to outsiders by using APIs, application programming interfaces. APIs can be seen as the industry-preferred method to gain access to platform data—in contrast to screen scraping as an early method to repurpose social media data (Helmond & Sandvig, 2010). Social media platforms can regulate data access through their APIs, for example by limiting which data is available and how much of it can be requested and by whom. APIs allow external developers to build new applications on top of social media platforms and they have enabled the development of an ecosystem of services and apps that make use of social media platform data and functionality (see also Bucher 2013). Think for example of Tinder, the dating app, which is built on top of the Facebook platform. When you install Tinder you have to log in with your Facebook account, after which the dating app finds matches based on proximity but also on shared Facebook friends and shared Facebook likes. Another example of how APIs are used is the practice of sharing content across various social media platforms using social buttons (Helmond 2013). APIs can be seen as the glue of the social web, connecting social media platforms and creating a social media ecosystem.

APIs overload

But the birth of this new “ecosystem of connective media” (van Dijck 2013) and its reliance on APIs (Langlois et al. 2009) came with technical growing pains:

Web services that became popular overnight had performance issues, especially when their APIs were getting hammered. The solution for some was to simply turn off specific services when the load got high, or throttle (limit) the number of API calls in a certain time period from each individual IP address (Feld 2008).

With the increasing number of third-party applications constantly requesting data, some platforms started to limit access or completely shut down API access. This did not only have implications for developers building apps on top of platforms but also for the users of these platforms. Twitter implemented a limit of 70 requests per hour which also affected users. If you exceeded the 70 requests per hour—which also included tweeting, replying or retweeting—you were simply cut off. Actively live tweeting an event could easily exceed the imposed limit. In the words of Nate Tkacz, commenting on another user being barred from posting during a conference: “in this world, to be prolific, is to be a spammer.”

[Image: Twitter users commenting on Twitter’s rate limits. Slide from my 2012 API critiques lecture.]
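Mechanically, a cap like this is easy to build and blunt in its effects. A minimal sketch of a 70-calls-per-hour limiter (my own illustration, not Twitter’s implementation) makes clear why it cannot tell a spammer from a conference live-tweeter:

    # A minimal sliding-window rate limiter, sketched to show how blunt a
    # 70-requests-per-hour cap is -- my illustration, not Twitter's code.

    import time
    from collections import deque

    class HourlyLimiter:
        def __init__(self, max_calls=70, window_seconds=3600):
            self.max_calls = max_calls
            self.window = window_seconds
            self.calls = deque()   # timestamps of recent requests

        def allow(self):
            now = time.time()
            while self.calls and now - self.calls[0] > self.window:
                self.calls.popleft()           # forget requests older than the window
            if len(self.calls) >= self.max_calls:
                return False                   # request #71 in an hour: cut off
            self.calls.append(now)
            return True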

However, limiting the number of API calls, or shutting down API access, did not fix the actual problem and affected users too. Gnip was created to address the issue of third parties constantly polling social media platform APIs for new data by bringing these different APIs together into one system (Feld 2008). Similar to central ping services in the blogosphere, Gnip would become the central service to call social media APIs and to poll for new data: “Gnip plans to sit in the middle of this and transform all of these interactions back to many-to-one where there are many web services talking to one centralized service – Gnip” (Feld 2008). Instead of thousands of applications frequently calling individual social media platform APIs, they could now call a single API, the Gnip API, thereby reducing the API load on these platforms. Since its inception Gnip has acted as an intermediary of social data and it was specifically designed “to sit in between social networks and other web services that produce a lot of user content and data (like Digg, Delicious, Flickr, etc.) and data consumers (like Plaxo, SocialThing, MyBlogLog, etc.) with the express goal of reducing API load and making the services more efficient” (Arrington 2008). In a blogpost on Techcrunch, covering the launch of Gnip, author Nik Cubrilovic explains in detail how Gnip functions as “a web services proxy to enable consuming services to easily access user data from a variety of sources:”

A publisher can either push data to Gnip using their API’s, or Gnip can poll the latest user data. For consumers, Gnip offers a standards-based API to access all the data across the different publishers. A key advantage of Gnip is that new events are pushed to the consumer, rather than relying on the consuming application to poll the publishers multiple times as a way of finding new events. For example, instead of polling Digg every few seconds for a new event for a particular user, Gnip can ping the consuming service – saving multiple round-trip API requests and resolving a large-scale problem that exists with current web services infrastructure. With a ping-based notification mechanism for new events via Gnip the publisher can be spared the load of multiple polling requests from multiple consuming applications (Cubrilovic 2008).
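
A rough sketch may help make the many-to-one logic tangible; the class and names below are invented for illustration and do not reflect Gnip's actual interfaces.

```python
# Without a proxy: every consumer polls every publisher, over and over,
# producing N consumers x M publishers requests even when nothing is new.
# With a many-to-one proxy: a publisher delivers a new event once, and the
# proxy pushes ("pings") it out to whichever consumers subscribed.
from collections import defaultdict

class DataProxy:
    def __init__(self):
        self.subscribers = defaultdict(list)  # publisher name -> consumer callbacks

    def subscribe(self, publisher: str, callback):
        """A consumer registers interest in one publisher's events."""
        self.subscribers[publisher].append(callback)

    def publish(self, publisher: str, event: dict):
        """A publisher pushes a new event once; the proxy fans it out."""
        for callback in self.subscribers[publisher]:
            callback(event)

proxy = DataProxy()
proxy.subscribe("digg", lambda e: print("new Digg event:", e))
proxy.publish("digg", {"user": "some_user", "action": "dugg a story"})
```

The saving is structural: each new event leaves a publisher once, instead of being asked for repeatedly by every consuming application.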

Gnip launched as a central service offering access to the APIs of a great number of popular platforms, including Digg, Flickr, del.icio.us, MyBlogLog, Six Apart, and more. At launch, technology blog ReadWrite described the new service as “the grand central station and universal translation service for the new social web” (Kirkpatrick 2008).

Gnip’s business model as data proxy

Gnip regulates the data flows between various social media platforms and social media data consumers by licensing access to these data flows. In September 2008, a few months after the initial launch, Gnip released its “2.0” version, which no longer required data consumers to poll Gnip for new data; instead, new data would be pushed to them in real time (Arrington 2008). While Gnip initially launched as a free service, the new version also came with a freemium business model:

Gnip’s business model is freemium – lots of data for free and commercial data consumers pay when they go over certain thresholds (non commercial use is free). The model is based on the number of users and the number of filters tracked. Basically, any time a service is tracking more than 10,000 people and/or rules for a certain data provider, they’ll start paying at a rate of $0.01 per user or rule per month, with a maximum payment of $1,000 per month for each data provider tracked (Arrington 2008).
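
Working through the quoted thresholds with hypothetical numbers: the quote leaves it ambiguous whether the rate applies to all tracked users or only those above the threshold, so the sketch below assumes all of them.

```python
# A minimal sketch of the freemium thresholds as reported by Arrington (2008).
FREE_THRESHOLD = 10_000  # users/rules tracked per provider before billing starts
RATE = 0.01              # dollars per user or rule per month
MONTHLY_CAP = 1_000      # maximum charge per data provider per month

def monthly_cost(tracked: int, commercial: bool = True) -> float:
    """Cost for one data provider under the reported 2008 pricing."""
    if not commercial or tracked <= FREE_THRESHOLD:
        return 0.0
    return min(tracked * RATE, MONTHLY_CAP)

# A commercial service tracking 25,000 users of one provider would pay
# 25,000 * $0.01 = $250/month; at 150,000 users it hits the $1,000 cap.
print(monthly_cost(25_000))   # 250.0
print(monthly_cost(150_000))  # 1000.0
```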

Gnip connects to various social media platform APIs and then licenses access to this data through the single Gnip API. In doing so, Gnip has turned data reselling, alongside advertising, into a profitable business model for the social web, not only for Gnip itself but also for social media platforms that make use of Gnip. I will briefly discuss Gnip and Twitter’s relationship before turning to the implications of this emerging business model for social media researchers.

Gnip and Twitter

Gnip and Twitter’s relationship goes back to 2008, when Twitter decided to open up its data stream by giving Gnip access to the Twitter XMPP “firehose,” which sent out all of Twitter’s data in a real-time stream (Arrington 2008). At Gnip’s launch, Twitter was not among the platforms offering access to their data. A week after the launch, Gnip’s Eric Marcoullier addressed “That Twitter Thing” for users who were asking for Twitter data, explaining that Gnip was still waiting for access to Twitter’s data and outlining how Twitter could benefit from providing it. Only a week later, Twitter gave Gnip access to its resource-intensive XMPP “firehose,” thereby shifting the infrastructural load it had been suffering from onto Gnip. With this data access deal, Gnip and Twitter became unofficial partners. In October 2008, Twitter outlined the different ways for developers to get data into and out of Twitter and hinted at giving Gnip access to its full data, including metadata, which until then had been available on an experimental basis. It wasn’t until 2010 that this partnership, with its experimental perks, became official.

In 2010 Gnip became Twitter’s first authorized data reseller, offering access to “the Halfhose (50 percent of Tweets at a cost of $30,000 per month), the Decahose (10 percent of Tweets for $5,000 per month) and the Mentionhose (all mentions of a user including @replies and re-Tweets for $20,000 per month)” (Gannes 2010). Notably absent is the so-called “firehose,” the real-time stream of all tweets, access to which Twitter had previously sold to Google ($15 million) and Microsoft ($10 million) in 2009. Before the official partnership announcement with Gnip, Twitter’s pricing model for granting access to data had been rather arbitrary: “Twitter is focused on creating consumer products and we’re not built to license data,” Williams said, adding, “Twitter has always invested in the ecosystem and startups and we believe that a lot of innovation can happen on top of the data. Pricing and terms definitely vary by where you are from a corporate perspective” (Gannes 2010). In this interview Evan Williams states that Twitter was never built for licensing data, which may be why it entered into a relationship with Gnip in the first place. In contrast to Twitter, Gnip’s infrastructure was built to regulate API traffic, which at the same time enables it to monetize licensed access to the data available through APIs. This became even clearer in August 2012, when Twitter announced a new version of its API that came with new and stricter rate limiting (Sippey 2012). The restrictions imposed through Twitter API version 1.1 meant that developers could request less data, which affected third-party Twitter clients (Warren 2012).

Two weeks later, Twitter launched its “Certified Products Program,” which focused on three product categories: engagement, analytics, and data resellers, including Gnip (Lardinois 2012). With the introduction of Certified Products shortly after the new API restrictions, Twitter made clear that large-scale access to Twitter data had to be bought. In a blog post addressing the changes in the new Twitter API v1.1, Gnip product manager Adam Tornes calculates that the new restrictions come down to 80% less data (Tornes 2013). In the same post he also promotes Gnip as the paid-for solution:

Combined with the existing limits to the number of results returned per request, it will be much more difficult to consume the volume or levels of data coverage you could previously through the Twitter API. If the new rate limit is an issue, you can get full coverage commercial grade Twitter access through Gnip which isn’t subject to rate limits (Tornes 2013).

In February 2012 Gnip announced that it would become the first authorized reseller of “historical” Twitter data, initially covering the past 30 days. This marked another important moment in Gnip and Twitter’s business relationship and was followed in October by the announcement that Gnip would offer access to the full archive of historical tweets.

Twitter’s business model: Advertising & data licensing

The new API and the Certified Products Program point towards a shift in Twitter’s business model: access to large-scale Twitter data now runs through intermediaries such as analytics companies and data resellers.

Despite Williams’ statement that Twitter wasn’t built for licensing data, it had already been making money by selling access to its firehose, as described above. The main source of income for Twitter, however, has always been selling advertisements: “Twitter is an advertising business, and ads make up nearly 90% of the company’s revenue” (Edwards 2014). While Twitter’s current business model relies on advertising, data licensing as a source of income is growing steadily: “In 2013, Twitter got $70 million in data licensing payments, up 48% from the year before” (Edwards 2014).

Using social media data for research

If we are moving towards the licensing of API access as a business model, then what does this mean for researchers working with social media data? Gnip is one of only four data intermediaries offering access to Twitter’s firehose, together with DataSift, Dataminr, and Topsy. Additionally, Gnip (now owned by Twitter) and Topsy (now owned by Apple, an indicator of big players buying up the middleman market of data) offer access to the historical archive of all tweets. What are the consequences of intermediaries for researchers working with Twitter data? boyd & Crawford (2011) and Bruns & Stieglitz (2013) have previously addressed the issues researchers face when working with APIs. With the introduction of data intermediaries, data access has become increasingly hard to come by: “full” access is often no longer available from the original source (the social media platform) but only through intermediaries, at a hefty price.

Two months before Twitter acquired Gnip, the two companies announced a partnership in a new Data Grants program that would give a small selection of academic researchers access to all Twitter data. However, to apply for the grants program you had to accept Twitter’s “Data Grant Submission Agreement v1.0.” Researcher Eszter Hargittai critically examined the conditions of getting access to data for research and raised some important questions about the relationship between Twitter and researchers in her blog post “Wait, so what do you still own?”

Even if we gain access to an expensive resource such as Gnip, the intermediaries also point to a further obfuscation of the data we are working with. The application programming interface (API), as the name already indicates, provides an interface to the data, which makes explicit that we are always “interfacing” with the data and never have access to the “raw” data. In “Raw Data is an Oxymoron,” edited by Lisa Gitelman, Bowker reminds us that data is never “raw” but always “cooked” (2013, p. 2). Social media intermediaries play an important role in “cooking” data. Gnip “cooks” its data by “Adding the Bling,” its term for the addition of extra metadata to Twitter data. These so-called “Enrichments” include geo-data enrichments, which “adds a new kind of Twitter geodata from what may be natively available from social sources.” In other words, Twitter data is enriched with data from other sources, such as Foursquare check-ins.
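
As a purely illustrative example, an “enriched” tweet might look like the original record with an extra block of inferred geodata attached; the field names below are invented and do not reflect Gnip’s actual schema.

```python
# A purely illustrative sketch of metadata enrichment; the field names
# are invented and do not reflect Gnip's actual data format.
raw_tweet = {
    "id": "123456789",
    "text": "Great coffee at my favorite spot",
    "user": "some_user",
    # no native geodata: the user did not geotag the tweet
}

enriched_tweet = dict(raw_tweet)
enriched_tweet["enrichments"] = {
    "geo": {
        "source": "third-party check-in data",  # e.g. a linked check-in service
        "latitude": 52.3676,
        "longitude": 4.9041,
        "confidence": 0.85,
    }
}
# What the researcher receives downstream is the cooked record, with no
# marker of where the "raw" tweet ends and the added "bling" begins.
```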

For researchers, working with social media data intermediaries also requires new skills and new ways of thinking through data: seeing social media data as relational. Social media data are not only aggregated and combined but also instantly “cooked” through the addition of bling.

Acknowledgements

I would like to thank the Social Media Collective and visiting researchers for providing feedback on my initial thoughts behind this blogpost during my visit from April 14-18 at Microsoft Research New England. Thank you Kate Crawford, Nancy Baym, Mary Gray, Kate Miltner, Tarleton Gillespie, Megan Finn, Jonathan Sterne, Li Cornfeld as well as my colleague Thomas Poell from the University of Amsterdam.

Cross-posted from my own blog