
Adding the bling: The role of social media data intermediaries

May 7, 2014

Last month, Twitter announced the acquisition of Gnip, one of the main sources for social media data—including Twitter data. In my research I am interested in the politics of platforms and data flows in the social web, and in this blog post I would like to explore the role of data intermediaries—Gnip in particular—in regulating access to social media data. I will focus on how Gnip regulates the data flows of social media APIs and how it capitalizes on these flows. By turning the licensing of API access into a profitable business model, these data intermediaries have specific implications for social media research.

The history of Gnip

Gnip launched on July 1st, 2008 as a platform offering access to data from various social media sources. It was founded by Jud Valeski and MyBlogLog founder Eric Marcoullier as “a free centralized callback server that notifies data consumers (such as Plaxo) in real-time when there is new data about their users on various data producing sites (such as Flickr and Digg)” (Feld 2008). Eric Marcoullier’s background with the blog service MyBlogLog is of particular interest because Gnip took core ideas behind the technical infrastructure of the blogosphere and repurposed them for the social web.

MyBlogLog

MyBlogLog was a distributed social network for bloggers which allowed them to connect with their blog readers. From 2006 to 2008 I actively used MyBlogLog. I had a MyBlogLog widget in the sidebar of my blog displaying the names and faces of my blog’s latest visitors. As part of my daily blogging routine I checked out my MyBlogLog readers in the sidebar, visited unknown readers’ profile pages and looked at which other blogs they were reading. It was not only a way to establish a community around my blog, but also a way to find out more about my readers and a discovery tool for finding new and interesting blogs. In 2007, MyBlogLog was acquired by Yahoo! and six months later founder Eric Marcoullier left Yahoo! while his technical co-founder Todd Sampson stayed on (Feld 2008). In February 2008, MyBlogLog added a new feature to the service which displayed “an activity stream of recent activities by all users on various social networks – blog posts, new photos, bookmarks on Delicious, Facebook updates, Twitter updates, etc.” (Arrington 2008). In doing so, MyBlogLog was no longer focusing only on the activities of other bloggers in the blogosphere but also including their activities on social media platforms, moving into the ‘lifestreaming’ space by aggregating social updates in a central place (Gray 2008). As a service originally focused on bloggers, it was expanding its scope to take the increasingly symbiotic relationship between the blogosphere and social media platforms into account (Weltevrede & Helmond, 2012). But in 2010 MyBlogLog came to an end when Yahoo! shut down a number of services, including del.icio.us and MyBlogLog (Gannes 2010).

Ping – Gnip

After leaving Yahoo! in 2007, MyBlogLog founder Eric Marcoullier started working on a new idea which would eventually become Gnip. In two blog posts, Brad Feld from Foundry Group–an early Gnip investor–provides insights into the ideas behind Gnip and its name. Gnip is ‘ping’ spelled backwards, and Feld recounts how Marcoullier was “originally calling the idea Pingery but somewhere along the way Gnip popped out and it stuck (“meta-ping server” was a little awkward)” (Feld 2008). Ping is a central technique in the blogosphere that allows (blog) search engines and other aggregators to know when a blog has been updated. This notification system is built into blog software so that when you publish a new blog post, it automatically sends out a ping (an XML-RPC signal) that notifies a number of ping services that your blog has been updated. Search engines then poll these services to detect blog updates so that they can index new blog posts. This means that search engines don’t have to poll the millions of blogs out there for updates; they only have to poll these central ping services. Ping solved a scalability issue of update notifications in the blogosphere because polling a very large number of blogs on a very frequent basis is not feasible. Ping servers established themselves as “the backbone of the blogosphere infrastructure and are a crucially important piece of the real-time web” (Arrington 2005). In my MA thesis on the symbiotic relationship between blog software and search engines I describe how ping servers form an essential part of the blogosphere’s infrastructure because they act as centralizing forces in the distributed network of blogs that notify subscribers, aggregators and search engines of new content (Helmond 2008, 70). Blog aggregators and blog search engines could get fresh content from updated blogs by polling central ping servers instead of individual blogs.
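To make the ping mechanism concrete, here is a minimal sketch, in Python, of the kind of notification a blog engine sends out on publish. It uses the conventional weblogUpdates.ping XML-RPC method; the ping-server URL and blog details are illustrative placeholders rather than anything taken from this post.

```python
import xmlrpc.client

# Minimal sketch of a blog "ping": one small XML-RPC call announcing that a
# blog has new content. Ping services relay this to subscribers, aggregators
# and search engines, which then fetch the updated blog.
PING_SERVER = "http://rpc.pingomatic.com/"  # placeholder ping service URL

def ping(blog_name: str, blog_url: str) -> dict:
    server = xmlrpc.client.ServerProxy(PING_SERVER)
    # weblogUpdates.ping(name, url) is the conventional method name
    return server.weblogUpdates.ping(blog_name, blog_url)

if __name__ == "__main__":
    print(ping("Example Blog", "http://example.com/blog"))
```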

APIs as the glue of the social web

Gnip sought to solve a similar scalability issue of the social web—third parties constantly polling social media platform APIs for new data—by becoming a central access point for new content from social media platforms. Traditionally, social media platforms have offered (partial) access to their data to outsiders through APIs, application programming interfaces. APIs can be seen as the industry-preferred method to gain access to platform data—in contrast to screen scraping as an early method to repurpose social media data (Helmond & Sandvig, 2010). Social media platforms can regulate data access through their APIs, for example by limiting which data is available, how much of it can be requested and by whom. APIs allow external developers to build new applications on top of social media platforms and they have enabled the development of an ecosystem of services and apps that make use of social media platform data and functionality (see also Bucher 2013). Think for example of Tinder, the dating app, which is built on top of the Facebook platform. When you install Tinder you have to log in with your Facebook account, after which the dating app finds matches based on proximity but also on shared Facebook friends and shared Facebook likes. Another example of how APIs are used is the practice of sharing content across various social media platforms using social buttons (Helmond 2013). APIs can be seen as the glue of the social web, connecting social media platforms and creating a social media ecosystem.

APIs overload

But the birth of this new “ecosystem of connective media” (van Dijck 2013) and its reliance on APIs (Langlois et al. 2009) came with technical growing pains:

Web services that became popular overnight had performance issues, especially when their APIs were getting hammered. The solution for some was to simply turn off specific services when the load got high, or throttle (limit) the number of API calls in a certain time period from each individual IP address (Feld 2008).

With the increasing number of third-party applications constantly requesting data, some platforms started to limit access or completely shut down API access. This did not only have implications for developers building apps on top of platforms but also for the users of these platforms. Twitter implemented a limit of 70 API requests per hour, which also affected users. If you exceeded the 70 requests per hour—which included tweeting, replying and retweeting—you were simply cut off. Actively live tweeting an event could easily exceed the imposed limit. In the words of Nate Tkacz, commenting on another user being barred from posting during a conference: “in this world, to be prolific, is to be a spammer.”

Collection of Twitter users commenting on Twitter’s rate limits. Slide from my 2012 API critiques lecture.
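To get a feel for what such an hourly cap means on the client side, here is a hedged sketch (not Twitter’s or any real client’s code) of a polling loop that respects a fixed budget of 70 calls per hour and backs off once the budget is spent; fetch_updates is a hypothetical stand-in for any rate-limited API call.

```python
import time

HOURLY_BUDGET = 70  # the per-hour request cap described above

def fetch_updates():
    """Hypothetical stand-in for any rate-limited API call."""
    ...

def polite_polling_loop():
    window_start = time.time()
    calls_made = 0
    while True:
        # Reset the budget when a new one-hour window begins.
        if time.time() - window_start >= 3600:
            window_start, calls_made = time.time(), 0
        if calls_made >= HOURLY_BUDGET:
            # Budget spent: wait for the window to reset instead of
            # being cut off by the platform.
            time.sleep(3600 - (time.time() - window_start))
            continue
        fetch_updates()
        calls_made += 1
        time.sleep(30)  # poll at a modest, fixed interval
```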

However, limiting the number of API calls or shutting down API access did not fix the actual problem and affected users too. Gnip was created to address the issue of third parties constantly polling social media platform APIs for new data by bringing these different APIs together into one system (Feld 2008). Similar to central ping services in the blogosphere, Gnip would become the central service to call social media APIs and to poll for new data: “Gnip plans to sit in the middle of this and transform all of these interactions back to many-to-one where there are many web services talking to one centralized service – Gnip” (Feld 2008). Instead of thousands of applications frequently calling individual social media platform APIs, they could now call a single API, the Gnip API, thereby easing the API load for these platforms. Since its inception Gnip has acted as an intermediary of social data and it was specifically designed “to sit in between social networks and other web services that produce a lot of user content and data (like Digg, Delicious, Flickr, etc.) and data consumers (like Plaxo, SocialThing, MyBlogLog, etc.) with the express goal of reducing API load and making the services more efficient” (Arrington 2008). In a blog post on TechCrunch covering the launch of Gnip, author Nik Cubrilovic explains in detail how Gnip functions as “a web services proxy to enable consuming services to easily access user data from a variety of sources:”

A publisher can either push data to Gnip using their API’s, or Gnip can poll the latest user data. For consumers, Gnip offers a standards-based API to access all the data across the different publishers. A key advantage of Gnip is that new events are pushed to the consumer, rather than relying on the consuming application to poll the publishers multiple times as a way of finding new events. For example, instead of polling Digg every few seconds for a new event for a particular user, Gnip can ping the consuming service – saving multiple round-trip API requests and resolving a large-scale problem that exists with current web services infrastructure. With a ping-based notification mechanism for new events via Gnip the publisher can be spared the load of multiple polling requests from multiple consuming applications (Cubrilovic 2008).
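The contrast Cubrilovic describes, many consumers each polling many publishers versus one intermediary pushing new events to a registered callback, can be sketched roughly as follows; the publisher names and functions are illustrative, not Gnip’s actual API.

```python
# Hedged sketch of the poll-vs-push contrast described above; the publisher
# list and functions are illustrative, not Gnip's actual API.
PUBLISHERS = ["digg", "delicious", "flickr", "mybloglog"]  # illustrative

def fetch_new_events(publisher):
    """Stand-in for one round-trip request to a publisher's API."""
    return []  # pretend nothing new this time

def poll_everything():
    # Without an intermediary: every consumer repeatedly asks every
    # publisher "anything new?", multiplying the load on each API.
    return [event for publisher in PUBLISHERS
            for event in fetch_new_events(publisher)]

def on_pushed_event(event):
    # With an intermediary: the consumer registers one callback and the
    # central service pushes new events as they arrive, sparing the
    # publishers the repeated polling requests.
    print("received:", event)
```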

Gnip launched as a central service offering access to a great number of popular APIs from platforms including Digg, Flickr, del.icio.us, MyBlogLog, Six Apart and more. At launch, technology blog ReadWrite described the new service as “the grand central station and universal translation service for the new social web” (Kirkpatrick 2008).

Gnip’s business model as data proxy

Gnip regulates the data flows between various social media platforms and social media data consumers by licensing access to these data flows. In September 2008, a few months after the initial launch, Gnip launched its “2.0” version, which no longer required data consumers to poll Gnip for new data; instead, new data would be pushed to them in real-time (Arrington 2008). While Gnip initially launched as a free service, the new version also came with a freemium business model:

Gnip’s business model is freemium – lots of data for free and commercial data consumers pay when they go over certain thresholds (non commercial use is free). The model is based on the number of users and the number of filters tracked. Basically, any time a service is tracking more than 10,000 people and/or rules for a certain data provider, they’ll start paying at a rate of $0.01 per user or rule per month, with a maximum payment of $1,000 per month for each data provider tracked (Arrington 2008).
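As a rough worked example of these thresholds, under one reading of the quoted terms (only items above the free threshold are billed, capped per provider per month), the monthly cost would come out as follows; the billing details are my interpretation of the quote, not Gnip’s published formula.

```python
FREE_THRESHOLD = 10_000   # users/rules tracked for free per data provider
RATE_PER_ITEM = 0.01      # dollars per tracked user or rule per month
MONTHLY_CAP = 1_000.00    # maximum charge per data provider per month

def monthly_cost(tracked_items: int) -> float:
    """One reading of the quoted terms: only items above the free
    threshold are billed, capped per data provider per month."""
    billable = max(0, tracked_items - FREE_THRESHOLD)
    return min(billable * RATE_PER_ITEM, MONTHLY_CAP)

# e.g. tracking 25,000 rules on one provider -> $150/month; from
# 110,000 tracked items onward the $1,000 monthly cap applies.
print(monthly_cost(25_000), monthly_cost(500_000))
```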

Gnip connects to various social media platform APIs and then licenses access to this data through the single Gnip API. In doing so Gnip has turned data reselling—besides advertising—into a profitable business model for the social web, not only for Gnip itself but also for social media platforms that make use of Gnip. I will continue by briefly discussing Gnip and Twitter’s relationship before discussing the implications of this emerging business model for social media researchers.

Gnip and Twitter

Gnip and Twitter’s relationship goes back to 2008 when Twitter decided to open up its data stream by giving Gnip access to the Twitter XMPP “firehose,” which sent out all of Twitter’s data in a real-time stream (Arrington 2008). At Gnip’s launch Twitter was not part of the group of platforms offering access to their data. A week after the launch Eric Marcoullier addressed “That Twitter Thing” for Gnip’s users—who were asking for Twitter data—explaining that Gnip was still waiting for access to Twitter’s data and outlining how Twitter could benefit from providing it. Only a week later Twitter gave Gnip access to its resource-intensive XMPP “firehose,” thereby shifting the infrastructural load it was suffering from to Gnip. With this data access deal Gnip and Twitter became unofficial partners. In October 2008 Twitter outlined for developers the different ways to get data into and out of Twitter and hinted at giving Gnip access to its full data, including metadata, which until then had been available on an experimental basis. It wasn’t until 2010 that their partnership with experimental perks became official.

In 2010 Gnip became Twitter’s first authorized data reseller, offering access to “the Halfhose (50 percent of Tweets at a cost of $30,000 per month), the Decahose (10 percent of Tweets for $5,000 per month) and the Mentionhose (all mentions of a user including @replies and re-Tweets for $20,000 per month)” (Gannes 2010). Notably absent is the so-called ‘firehose,’ the real-time stream of all tweets. Twitter had previously sold access to the firehose to Google ($15 million) and Microsoft ($10 million) in 2009. Before the official partnership announcement with Gnip, Twitter’s pricing model for granting access to data had been rather arbitrary: “Twitter is focused on creating consumer products and we’re not built to license data,” Williams said, adding, “Twitter has always invested in the ecosystem and startups and we believe that a lot of innovation can happen on top of the data. Pricing and terms definitely vary by where you are from a corporate perspective” (Gannes 2010). In this interview Evan Williams states that Twitter was never built for licensing data, which may be a reason it entered into a relationship with Gnip in the first place. In contrast to Twitter, Gnip’s infrastructure was built to regulate API traffic, which at the same time enables the monetization of licensing access to the data available through APIs. This became even clearer in August 2012 when Twitter announced a new version of its API which came with new and stricter rate limiting (Sippey 2012). The restrictions imposed through the Twitter API version 1.1 meant that developers could request less data, which affected third-party Twitter clients (Warren 2012).

Two weeks later Twitter launched its “Certified Products Program,” which focused on three product categories: engagement, analytics and data resellers—including Gnip (Lardinois 2012). With the introduction of Certified Products shortly after the new API restrictions, Twitter made clear that large-scale access to Twitter data had to be bought. In a blog post addressing the changes in the new Twitter API v1.1, Gnip’s product manager Adam Tornes calculates that the new restrictions amount to 80% less data (Tornes 2013). In the same post he also promotes Gnip as the paid-for solution:

Combined with the existing limits to the number of results returned per request, it will be much more difficult to consume the volume or levels of data coverage you could previously through the Twitter API. If the new rate limit is an issue, you can get full coverage commercial grade Twitter access through Gnip which isn’t subject to rate limits (Tornes 2013).

In February 2012 Gnip announced that it would become the first authorized reseller of “historical” Twitter data (covering the past 30 days). This marked another important moment in Gnip and Twitter’s business relationship and was followed by the announcement of Gnip offering full access to historical Twitter data in October.

Twitter’s business model: Advertising & data licensing

The new API and the Certified Products Program point towards a shift in Twitter’s business model by introducing intermediaries such as analytics companies and data resellers for access to large scale Twitter data.

Despite Williams’ statement that Twitter wasn’t built for licensing data, it had already been making some money by selling access to its firehose, as described above. However, the main source of income for Twitter has always come from selling advertisements: “Twitter is an advertising business, and ads make up nearly 90% of the company’s revenue” (Edwards 2014). While Twitter’s current business model relies on advertising, data licensing as a source of income is growing steadily: “In 2013, Twitter got $70 million in data licensing payments, up 48% from the year before” (Edwards 2014).

Using social media data for research

If we are moving towards the licensing of API access as a business model, then what does this mean for researchers working with social media data? Gnip is only one of four data intermediaries—together with DataSift, Dataminr and Topsy (now owned by Apple, an indicator of big players buying up the middleman market of data)—offering access to Twitter’s firehose. Additionally, Gnip (now owned by Twitter) and Topsy also offer access to the historical archive of all tweets. What are the consequences of intermediaries for researchers working with Twitter data? boyd & Crawford (2011) and Bruns & Stieglitz (2013) have previously addressed the issues that researchers face when working with APIs. With the introduction of data intermediaries, data access has become increasingly hard to come by since ‘full’ access is often no longer available from the original source (the social media platform) but only through intermediaries, at a hefty price.

Two months before the acquisition, Twitter and Gnip announced a partnership in a new Data Grants program that would give a small selection of academic researchers access to all Twitter data. However, to apply for the grants program you had to accept their “Data Grant Submission Agreement v1.0.” Researcher Eszter Hargittai critically investigated the conditions of getting access to data for research and raised some important questions about the relationship between Twitter and researchers in her blog post ‘Wait, so what do you still own?’

Even if we gain access to an expensive resource such as Gnip, the intermediaries also point to a further obfuscation of the data we are working with. The application programming interface (API), as the name already indicates, provides an interface to the data, which makes explicit that we are always “interfacing” with the data and never have access to the “raw” data. In “Raw Data Is an Oxymoron,” edited by Lisa Gitelman, Bowker reminds us that data is never “raw” but always “cooked” (2013, p. 2). Social media intermediaries play an important role in “cooking” data. Gnip “cooks” its data by “Adding the Bling,” referring to the addition of extra metadata to Twitter data. These so-called “Enrichments” include geo-data enrichments which “adds a new kind of Twitter geodata from what may be natively available from social sources.” In other words, Twitter data is enriched with data from other sources such as Foursquare logins.
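To make this “cooking” tangible, here is a purely illustrative sketch of what an enriched activity might look like next to the original record; every field name below is a hypothetical stand-in, not Gnip’s actual payload schema.

```python
# Purely illustrative: hypothetical field names, not Gnip's real schema.
raw_tweet = {
    "id": "123",
    "text": "Great coffee this morning",
    "user": {"screen_name": "example_user", "location": "NYC"},
}

def add_bling(tweet):
    """Sketch of an 'enrichment' step: the intermediary returns the
    original activity plus extra, inferred metadata (here, geo data
    resolved from the free-text profile location)."""
    enriched = dict(tweet)
    enriched["enrichments"] = {
        "profile_geo": {  # hypothetical enrichment block
            "locality": "New York",
            "country": "United States",
            "coordinates": [-74.0060, 40.7128],
        }
    }
    return enriched

print(add_bling(raw_tweet))
```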

For researchers, working with social media data intermediaries also requires new skills and new ways of thinking through data by seeing social media data as relational. Social media data are not only aggregated and combined but also instantly cooked through the addition of “bling.”

Acknowledgements

I would like to thank the Social Media Collective and visiting researchers for providing feedback on my initial thoughts behind this blogpost during my visit from April 14-18 at Microsoft Research New England. Thank you Kate Crawford, Nancy Baym, Mary Gray, Kate Miltner, Tarleton Gillespie, Megan Finn, Jonathan Sterne, Li Cornfeld as well as my colleague Thomas Poell from the University of Amsterdam.

Cross-posted from my own blog

SMC is hiring a Research Assistant!

May 1, 2014

UPDATE: At this time we have a great pool for 2014 and are no longer accepting applications.

—-
Microsoft Research (MSR) is looking for a Research Assistant for its Social Media Collective in the New England lab, based in Cambridge, Massachusetts. The Social Media Collective consists of Nancy Baym, Mary Gray, Jessa Lingel, and Kevin Driscoll in Cambridge, and Kate Crawford and danah boyd in New York City, as well as faculty visitors and Ph.D. interns. The RA will be working directly with Nancy Baym, Kate Crawford and Mary Gray.

An appropriate candidate will be a self-starter who is passionate and knowledgeable about the social and cultural implications of technology. Strong skills in writing, organisation and academic research are essential, as are time-management and multi-tasking. Minimal qualifications are a BA or equivalent degree in a humanities or social science discipline and some qualitative research training.

Job responsibilities will include:
– Sourcing and curating relevant literature and research materials
– Producing literature reviews and/or annotated bibliographies
– Coding ethnographic and interview data
– Editing manuscripts
– Working with academic journals on themed sections
– Assisting with research project and event organization

The RA will also get to collaborate on ongoing research and, while publication is not a guarantee, the RA will be encouraged to co-author papers while at MSR. The RAship will require 40 hours per week on site in Cambridge, MA, and remote collaboration with the researchers in the New York City lab. It is a 1-year only contractor position, paid hourly with flexible daytime hours. The start date will ideally be in late June, although flexibility is possible for the right candidate.

This position is ideal for junior scholars who will be applying to PhD programs in Communication, Media Studies, Sociology, Anthropology, Information Studies, and related fields and want to develop and hone their research skills before entering a graduate program. Current New England-based MA/PhD students are welcome to apply provided they can commit to 40 hours of on-site work per week.

To apply, please send an email to Nancy Baym (baym@microsoft.com) with the subject “RA Application” and include the following attachments:

– One-page (single-spaced) personal statement, including a description of research experience, interests, and professional goals
– CV or resume
– Writing sample (preferably a literature review or a scholarly-styled article)
– Links to online presence (e.g., blog, homepage, Twitter, journalistic endeavors, etc.)
– The names and emails of two recommenders

We will begin reviewing applications on May 12 and will continue to do so until we find an appropriate candidate.

Please feel free to ask questions about the position in the comments! I have answered a couple of the most common ones there already.

Call For Papers: Studying Selfies: Evidence, Affect, Ethics, and the Internet’s Visual Turn

April 15, 2014

Studying Selfies: Evidence, Affect, Ethics, and the Internet’s Visual Turn
A special section of the International Journal of Communication (IJoC)

Guest-edited by:

Dr. Theresa Senft
Master Teacher in Global Liberal Studies
New York University
Terri.senft@nyu.edu

Dr. Nancy Baym
Principal Researcher
Microsoft Research
baym@microsoft.com

 

Overview

The fact that “selfie” was Oxford Dictionaries’ word of the year for 2013 indicates that the selfie is a topic of popular interest. Yet for scholars, the selfie phenomenon represents a paradox. As an object, the selfie lends itself to cultural scorn and shaming. As a cultural practice, however, selfie circulation grows by the moment, moving far beyond the clichéd province of bored teenagers online. The rapid spread of camera-enabled mobile phones worldwide means that selfies have become a global phenomenon. Yet dominant discourses about what selfies are, and what they mean, tend to be extremely U.S. focused.

In this special section, we aim to provide international perspectives on selfies.  As an act of production, we are interested in why selfie-making lends itself to discussions featuring words like “narcissistic” or “empowering.” As a media genre, we are interested in the relationship of the selfie to documentary, autobiography, advertising, and celebrity. As a cultural signifier, we ask:  What social work does a selfie do in communities where it was intended to circulate, and what happens when it circulates beyond those communities?

As an emblematic part of social media’s increased “visual turn,” selfies provide opportunities for scholars to develop best practices for interpreting images online in rigorous ways. Case studies of selfie production, consumption and circulation can also provide much-needed insight into the social dynamics at play on popular social media platforms like Facebook, Instagram, Reddit, WeChat and Tumblr.

We are seeking scholarly articles from diverse fields, and a wide range of theoretical and methodological approaches, including: media studies, communication, anthropology, digital humanities, computational and social sciences, cultural geography, history, and critical cultural studies.

 

Possible topics include, but are not limited to:

Selfie as discourse: What is the history (or histories) of the selfie? How do these histories map to contemporary media and scholarly discourses regarding self-representation, autobiography, photography, amateurism, branding, and/or celebrity?

Selfie as evidence: What are the epistemological ramifications of the selfie? How do selfies function as evidence that one attended an event, feels intimate with a partner, was battered in a parking lot, is willing to be “authentic” with fans, or claims particular standing in a social or political community? Once uploaded, how do selfies become evidence of a different sort, subject to possibilities like “revenge porn,” data mining, or state surveillance?

Selfie as affect: What feelings do selfies elicit for those who produce, view, and/or circulate them? What are we to make of controversial genres like infant selfies, soldier selfies, selfies with homeless people, or selfies at funerals? How do these discourses about controversial selfies map to larger conversations about “audience numbness” and “empathy deficit” in media?

Selfie as ethics: Who practices “empowering” selfie generation? Who does not? Who cannot? How do these questions map to larger issues of class, race, gender, sexuality, religion and geography? What responsibilities do those who circulate selfies of others have toward the original creator of the photo? What is the relationship between selfies and other forms of documentary photography, with regard to ethics?

Selfie as pedagogy: How can selfies be used as case studies to better understand the visual turn in social media use? How do selfies “speak,” and what methods might we develop to better understand what is being said?

 

Formatting and Requirements

To be considered for this collection, a paper of at most 5,000 words (including images with captions, footnotes, references and appendices, if any) must be submitted by June 15, 2014. All submissions should be accompanied by two to three suggested reviewers, including their e-mail addresses, titles, affiliations and research interests. Submissions will fall under the category of “Features,” which are typically shorter than full research articles.

All submissions must adhere strictly to the most recent version of the APA style guide (including in-text citations and references). Papers must include the author(s)’ name(s), title, affiliation and e-mail address. Any papers that do not follow these guidelines will not be submitted for peer review.

 The International Journal of Communication is an open access journal (ijoc.org). All articles will be available online at the point of publication. The anticipated publication timeframe for this special section is March 2015.

 

Contact Information

All submissions should be emailed to ijocselfieissue@outlook.com by June 15, 2014. Late submissions will not be included for consideration. 

404 Day: A Day of Action Against Censorship in Libraries and Public Schools

April 3, 2014

(Cross-posted from Radical Reference and jessalingel.tumblr.com)

Tomorrow is 404 Day, an effort from the Electronic Frontier Foundation to raise awareness of online censorship in libraries and public schools.  They’re running an online info session today at noon, PST, and they’ve reached out to librarians and information professionals to share experiences with online censorship.

My encounters with 404 pages in libraries have mostly stemmed from my academic rather than librarian life.  While in graduate school, I undertook a project looking at practices of secrecy in the extreme body modification (EBM) community.  I wanted to know how the community circulated information about illegal and quasi-legal procedures among insiders, without exposing the same information to outsiders and the authorities.  As a researcher, getting a 404 message (which happened mostly when trying to access a social network platform geared specifically to the body modification community) was mostly exasperating, but it also gave me pause to consider other contexts in which people look up this type of information.  As a teenager, body modification fascinated me, and I spent many hours online researching procedures related to piercings, tattoos, scarification and suspension.  Eventually, I came to feel very much a part of the body modification community, and the internet was vital to that happening.  When I imagine what would have happened if I’d been confronted with 404 pages early on in those searches, it’s possible that my body would look very different, and so would my early twenties – in both cases, I believe, for the worse.  My experiences were by no means singular; while conducting research on EBM, I encountered many folks who were still struggling to locate information about procedures they wanted done, to get answers to questions about health and well-being, to find a community that wouldn’t find their interests weird or freakish.  EBM is just one example of a stigmatized topic that provokes censorship at the cost of denying people information that can be deeply tied to their physical, mental and social well-being.

I’m grateful to EFF for drawing attention to 404s and monitoring policies, and am happy to join the array of information activists speaking out against censorship in public libraries and schools.

Matrix Algebra: how to be human in a digital economy

March 31, 2014

By Sara C. Kingsley and Dr. Mary L. Gray

(cross-posted to CultureDigitally and The Center for Popular Economics)

 

ExhibitionMathamatica

Ray and Charles Working on a Conceptual Model for the Exhibition Mathematica, 1960, photograph. Prints & Photographs Division, Library of Congress (A-22a)

“Certainly the cost of living has increased, but the cost of everything else has likewise increased,”[1] H.G. Burt, the President of the Union Pacific Railroad, asserted to railroad company machinists and boilermakers. For Burt, the “cost of everything else” included the cost of labor. His remedy: place “each workman on his [own] merit.” In 1902, “workman merit” to a tycoon like H.G. Burt squarely meant equating the value of labor, or the worth of a person, to the amount of output each individual produced. Union Pacific Railroad eventually made use of this logic by replacing the hourly wages of workers with a piece rate system. Employers switched to piecework systems around the turn of the twentieth century largely to reduce labor costs by weeding out lower-skilled workers and cutting the wages of workers unable to keep pace with the “speeding up” of factory production.

Employers historically leveraged piecework as a managerial tool, reconfiguring labor markets to the employers’ advantage by allowing production rates, rather than time on the job, to measure productivity. Whatever a person produced that was not quantifiable as a commodity, in other words, did not constitute work. We’ve seen other examples of discounted labor in spaces outside the factory. Feminist economists fight to this day, for example, for the work of caregivers and housewives, largely ignored by mainstream economic theory, to gain recognition as “real” forms of labor. Real benefits and income are lost to those whose work goes unaccounted.

As the historical record shows, workers do not typically accept arbitrary changes to their terms of employment handed down by management. In fact, the Union Pacific Railroad machinists protested Burt’s decision to set their wages through a piecework system. H.G. Burt met their resistance with this question: is it “right for any man to ask for more money than he is actually worth or can earn?”

But what is a person truly worth in terms of earning power? And what societal, cultural, and economic factors limit a person from earning more?

In 2014, the question of a person’s worth in relation to their work, or the value of labor itself, is no less pressing. The rhetoric surrounding workers’ rights compared to those of business differs little whether one browses the archives of a twentieth-century newspaper or scrolls Facebook posts. Ironically enough, in the age of social media and citizen reporting, the utter lack of visibility and adequate representation of today’s workers stands in stark contrast to the piece rate workers of H.G. Burt’s day. Few soundbites or talking points, let alone byline articles, focus on the invisible labor foundational to today’s information economies. Nowhere is this more clearly illustrated than with crowdwork.

Legal scholar Alek L. Felstiner defines crowdworking as “the process of taking tasks that would normally be delegated to an employee and distributing them to a large pool of online workers, the ‘crowd’” (2011). Hundreds of thousands of people regularly do piecework tasks online for commercial crowdsourcing sites like Amazon.com’s Mechanical Turk (“AMT”).

Over the last year, we’ve worked with Dr. Siddharth Suri and an international team of researchers to uncover the invisible forms of labor online and the people who rely upon digital piecework for a significant portion of their income. Crowdwork is, arguably, the most economically valuable, yet invisible, form of labor that the Internet has ever produced. Take Google’s search engine for instance. Each time you search for an image online (to create the next most hilarious meme, or find an infographic for a conference presentation) you’re benefitting from the labor of thousands of crowdworkers who have identified or ranked the images your search returns. While this service may be valuable to you, the workers doing it only receive a few cents for their contributions to your meme or slideshow presentation. Additionally, a typical crowdworker living in the United States makes, on average, 2 to 3 dollars an hour. We need to ask ourselves: what is fair compensation for the value that workers bring to our lives? How would you feel if tomorrow, all your favorite, seemingly free, online services that depend on these digital pieceworkers disappeared?

Last fall, we spent four months in South India talking with crowdworkers and learning about their motivations for doing this type of work. In the process we met people with far-ranging life experiences, but a common story to tell – perhaps familiar to all of us who’ve earned a wage for our keep: work is not all we are, but most of what we do is work. And increasingly, the capacity to maintain a living above the poverty line is elusive, and complicated by what “being poor” means in a global economy. Our hope of finding more satisfying work – a life valued for what it is rather than what it is not – is no less, even as we confront the realities of today.

Moshe Marvit spoke to the complexities of crowdwork as a form of viable employment in a compelling account of U.S. workers’ experience with Amazon Mechanical Turk. He describes this popular crowdsourcing platform as “one of the most exploited workforces no one has ever seen.” Marvit emphasizes how crowdwork remains a thing universally unacknowledged, even as more and more tasks, from researchers’ web-based surveys to Twitter’s real-time deciphering of trending topics, depend on crowdwork. However, most people still don’t know that behind their screen is an army of click workers. Anyone who has ever browsed an online catalogue or searched the web for a restaurant’s physical address has benefited from a person completing small crowdworked tasks online. Pointedly, our web experience is better because of the thousands of unknown workers who labor to optimize the online spaces we employ.

As Marvit points out, and our research also notes, people only earn pennies at a time for doing the small crowd tasks not yet fully automatable by computer algorithms. These crowd tasks, however, add up to global systems whose monetary worth sometimes trumps that of small nations. Yet when we ask our peers and colleagues, “do you know who the thousands of low-income workers are behind your web browser?” we receive mystified stares, and many reply “I don’t know.”

The hundreds of thousands of people who regularly work in your web browser are not the youth of Silicon Valley’s tech industry. They likely cannot afford Google Glass, or ride to work in corporate buses. Some are college educated, but, like many people today, they are stuck in careers that undervalue their real worth, in addition to discounting the investments they’ve already made in their education, skills, and the unique set of values they’ve gained from their own life experiences.

Yet the more our research team learns about crowdworkers’ lives, the more we realize how little we know about the economic value of crowdwork and the makeup of the crowdworking labor force. And as Marvit notes, we still don’t have a good grasp of what someone is doing, legally speaking, when they do crowdwork. Should we categorize crowdwork as freelance work? Contract labor? Temporary or part-time work?

In the absence of answers to these questions, some have called for policy solutions to mitigate the noted and sometimes glaring inequities in power distributed between those posting tasks (or, jobs) to crowdwork platforms, and those seeking to do crowdwork online. But, we argue, good labor policy that makes sense of crowdwork, from a legal or technical point of view, can’t be adequately drafted until we understand what people expect and experience doing task-based work online. Who does crowdwork? Where, how, and why do they do it? And how does crowdworking fit into the rest of their lives, not to mention our global workflows? When we can answer these questions, we’ll be ready to talk about how to define crowdwork in more meaningful ways. Until then, we resist dubbing crowdwork “exploitative” or “ideal,” because doing so is meaningless to the millions of people who crowdwork, and ignores the builders and programmers out there trying to improve these technologies.

We are all implicated in the environments we rely on and utilize in our daily lives, including the Internet. Those who mindlessly request and outsource tasks to the crowd without regard to crowdworkers’ rights, are perhaps, no more at fault than the rest of us who expect instant, high quality web services every time we search or do other activities online. An important lesson from Union Pacific Railroad still holds true: workers are not expendable.

[1]Omaha daily bee. (Omaha [Neb.]), 01 July 1902. Chronicling America: Historic American Newspapers. Lib. of Congress. <http://chroniclingamerica.loc.gov/lccn/sn99021999/1902-07-01/ed-1/seq-1/>

Show-and-Tell: Algorithmic Culture

March 25, 2014

Last week I tried to get a group of random sophomores to care about algorithmic culture. I argued that software algorithms are transforming communication and knowledge. The jury is still out on my success at that, but in this post I’ll continue the theme by reviewing the interactive examples I used to make my point. I’m sharing them because they are fun to try. I’m also hoping the excellent readers of this blog can think of a few more.

I’ll call my three examples “puppy dog hate,” “top stories fail,” and “your DoubleClick cookie filling.”  They should highlight the ways in which algorithms online are selecting content for your attention. And ideally they will be good fodder for discussion. Let’s begin:

Three Ways to Demonstrate Algorithmic Culture

(1.) puppy dog hate (Google Instant)

You’ll want to read the instructions fully before trying this. Go to http://www.google.com/ and type “puppy”, then [space], then “dog”, then [space], but don’t hit [Enter].  That means you should have typed “puppy dog ” (with a trailing space). Results should appear without the need to press [Enter]. I got this:

Now repeat the above instructions but instead of “puppy” use the word “bitch” (so: “bitch dog “).  Right now you’ll get nothing. I got nothing. (The blank area below is intentionally blank.) No matter how many words you type, if one of the words is “bitch” you’ll get no instant results.

What’s happening? Google Instant is the Google service that displays results while you are still typing your query. In the algorithm for Google Instant, it appears that your query is checked against a list of forbidden words. If the query contains one of the forbidden words (like “bitch”) no “instant” results will be shown, but you can still search Google the old-fashioned way by pressing [Enter].
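As a toy illustration of the behaviour described above (emphatically not Google’s code), a word-level blacklist check might look something like this:

```python
# Toy illustration of the behaviour described in this post, not Google's code.
BLOCKED_WORDS = {"bitch", "hate"}  # examples discussed in this post

def show_instant_results(query: str) -> bool:
    """Return False (suppress instant results) if any word in the query
    matches the blacklist; pressing [Enter] still searches as usual."""
    return not any(word.lower() in BLOCKED_WORDS for word in query.split())

print(show_instant_results("puppy dog"))       # True  -> instant results shown
print(show_instant_results("puppy dog hate"))  # False -> instant results hidden
```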

This is an interesting example because it is incredibly mild censorship, and that is typical of algorithmic sorting on the Internet. Things aren’t made to be impossible; some things are just a little harder than others. We can discuss whether or not this actually matters to anyone. After all, you could still search for anything you wanted to, but some searches are made slightly more time-consuming because you have to press [Enter] and you do not receive real-time feedback as you construct your search query.

It’s also a good example that makes clear how problematic algorithmic censorship can be. The hackers over at 2600 reverse engineered Google Instant’s blacklist (NSFW) and it makes absolutely no sense. The blocked words I tried (like “bitch”) produce perfectly inoffensive search results (sometimes because of other censorship algorithms, like Google SafeSearch). It is not clear to me why they should be blocked. For instance, anatomical terms for some parts of the female anatomy are blocked while other parts of the female anatomy are not blocked.

Some of the blocking is just silly. For instance, “hate” is blocked. This means you can make the Google Instant results disappear by adding “hate” to the end of an otherwise acceptable query. e.g., “puppy dog hate ” will make the search results I got earlier disappear as soon as I type the trailing space. (Remember not to press [Enter].)

This is such a simple implementation that it barely qualifies as an algorithm. It also differs from my other examples because it appears that an actual human compiled this list of blocked words. That might be useful to highlight because we typically think that companies like Google do everything with complicated math and not site-by-site or word-by-word rules–they have claimed as much, but this example shows that in fact this crude sort of blacklist censorship still goes on.

Google does censor actual search results (what you get after pressing [Enter]) in a variety of ways but that is a topic for another time. This exercise with Google Instant at least gets us started thinking about algorithms, whose interests they are serving, and whether or not they are doing their job well.

(2.) Top Stories Fail (Facebook)

In this example, you’ll need a Facebook account.  Go to http://www.facebook.com/ and look for the tiny little toggle that appears under the text “News Feed.” This allows you to switch between two different sorting algorithms: the Facebook proprietary EdgeRank algorithm (this is the default), and “most recent.” (On my interface this toggle is in the upper left, but Facebook has multiple user interfaces at any given time and for some people it appears in the center of the page at the top.)

Switch this toggle back and forth and look at how your feed changes.

What’s happening? Okay, we know that among 18-29 year-old Facebook users the median number of friends is now 300. Even given that most people are not over-sharers, with some simple arithmetic it is clear that some of the things posted to Facebook may never be seen by anyone. A status update is certainly unlikely to be seen by anywhere near your entire friend network. Facebook’s “Top Stories” (EdgeRank) algorithm is the solution to the oversupply of status updates and the undersupply of attention to them: it determines what appears on your news feed and how it is sorted.
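For discussion purposes, here is a hedged sketch of the kind of scoring EdgeRank was publicly described as doing, summing affinity times interaction weight times a recency decay over a story’s interactions; the weights and numbers below are made up for illustration, and this is not Facebook’s implementation.

```python
import math
import time

def edgerank_style_score(edges, now=None):
    """Illustrative scoring in the spirit of how EdgeRank was publicly
    described: each interaction ("edge") on a story contributes
    affinity * type_weight * time_decay, and the story's score is the sum.
    All weights here are invented for the sake of the example."""
    now = now or time.time()
    score = 0.0
    for edge in edges:
        decay = math.exp(-(now - edge["created_at"]) / 86_400)  # ~1-day decay
        score += edge["affinity"] * edge["type_weight"] * decay
    return score

# A recent comment from a close friend outweighs a days-old like from an
# acquaintance under these (made-up) weights.
story = [
    {"affinity": 0.9, "type_weight": 4.0, "created_at": time.time() - 3_600},
    {"affinity": 0.2, "type_weight": 1.0, "created_at": time.time() - 259_200},
]
print(edgerank_style_score(story))
```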

We know that Facebook’s “Top Stories” sorting algorithm uses a heavy hand. It is quite likely that you have people in your friend network that post to Facebook A LOT but that Facebook has decided to filter out ALL of their posts. These might be called your “silenced Facebook friends.” Sometimes when people do this toggling-the-algorithm exercise they exclaim: “Oh, I forgot that so-and-so was even on Facebook.”

Since we don’t know the exact details of EdgeRank, it isn’t clear exactly how Facebook is deciding which of your friends you should hear from and which should be ignored. Even though the algorithm might be well-constructed, it’s interesting that when I’ve done this toggling exercise with a large group a significant number of people say that Facebook’s algorithm produces a much more interesting list of posts than “Most Recent,” while a significant number of people say the opposite — that Facebook’s algorithm makes their news feed worse. (Personally, I find “Most Recent” produces a far more interesting news feed than “Top Stories.”)

It is an interesting intellectual exercise to try and reverse-engineer Facebook’s EdgeRank on your own by doing this toggling. Why is so-and-so hidden from you? What is it they are doing that Facebook thinks you wouldn’t like? For example, I think that EdgeRank doesn’t work well for me because I select my friends carefully, then I don’t provide much feedback that counts toward EdgeRank after that. So my initial decision about who to friend works better as a sort without further filtering (“most recent”) than Facebook’s decision about what to hide. (In contrast, some people I spoke with will friend anyone, and they do a lot more “liking” than I do.)

What does it mean that your relationship to your friends is mediated by this secret algorithm? A minor note: If you switch to “most recent” some people have reported that after a while Facebook will switch you back to Facebook’s “Top Stories” algorithm without asking.

There are deeper things to say about Facebook, but this is enough to start with. Onward.

(3.) Your DoubleClick Cookie Filling (DoubleClick)

This example will only work if you browse the Web regularly from the same Web browser on the same computer and you have cookies turned on. (That describes most people.) Go to the Google Ads settings page — the URL is a mess so here’s a shortcut: http://bit.ly/uc256google

Look at the right column, headed “Google Ads Across The Web,” then scroll down and look for the section marked “Interests.” The other parts may be interesting too, such as Google’s estimate of your Gender, Age, and the language you speak — all of which may or may not be correct.  Here’s a screen shot:

If you have “interests” listed, click on “Edit” to see a list of topics.

What’s Happening? Google is the largest advertising clearinghouse on the Web. (It bought DoubleClick in 2007 for over $3 billion.) When you visit a Web site that runs Google Ads — this is likely quite common — your visit is noted and a pattern of all of your Web site visits is then compiled and aggregated with other personal information that Google may know about you.

What a big departure from some old media! In comparison, in most states it is illegal to gather a list of books you’ve read at the library because this would reveal too much information about you. Yet for Web sites this data collection is the norm.

This settings page won’t reveal Google’s ad placement algorithm, but it shows you part of the result: a list of the categories that the algorithm is currently using to choose advertising content to display to you. Your attention will be sold to advertisers in these categories and you will see ads that match these categories.

This list is quite volatile and this is linked to the way Google hopes to connect advertisers with people who are interested in a particular topic RIGHT NOW. Unlike demographics that are presumed to change slowly (age) or not to change at all (gender), Google appears to base a lot of its algorithm on your recent browsing history. That means if you browse the Web differently you can change this list fairly quickly (in a matter of days, at least).

Many people find the list uncannily accurate, while some are surprised at how inaccurate it is. Usually it is a mixture. Note that some categories are very specific (“Currency Exchange”), while others are very broad (“Humor”). Right now it thinks I am interested in 27 things; some of them are:

  • Standardized & Admissions Tests (Yes.)
  • Roleplaying Games (Yes.)
  • Dishwashers (No.)
  • Dresses (No.)

You can also type in your own interests to save Google the trouble of profiling you.

Again this is an interesting algorithm to speculate about. I’ve been checking this for a few years and I persistently get “Hygiene & Toiletries.” I am insulted by this. It’s not that I’m uninterested in hygiene but I think I am no more interested in hygiene than the average person. I don’t visit any Web sites about hygiene or toiletries. So I’d guess this means… what exactly? I must visit Web sites that are visited by other people who visit sites about hygiene and toiletries. Not a group I really want to be a part of, to be honest.

These were three examples of algorithm-ish activities that I’ve used. Any other ideas? I was thinking of trying something with an item-to-item recommender system but I could not come up with a great example. I tried anonymized vs. normal Web searching to highlight location-specific results but I could not think of a search term that did a great job showing a contrast.  I also tried personalized twitter trends vs. location-based twitter trends but the differences were quite subtle. Maybe you can do better.

In my next post I’ll write about how the students reacted to all this.

(This was also cross-posted to multicast.)

Why Snapchat is Valuable: It’s All About Attention

March 21, 2014

Most people who encounter a link to this post will never read beyond this paragraph. Heck, most people who encountered a link to this post didn’t click on the link to begin with. They simply saw the headline, took note that someone over 30 thinks that maybe Snapchat is important, and moved onto the next item in their Facebook/Twitter/RSS/you-name-it stream of media. And even if they did read it, I’ll never know it because they won’t comment or retweet or favorite this in any way.

We’ve all gotten used to wading in streams of social media content. Open up Instagram or Secret on your phone and you’ll flick on through the posts in your stream, looking for a piece of content that’ll catch your eye. Maybe you don’t even bother looking at the raw stream on Twitter. You don’t have to because countless curatorial services like digg are available to tell you what was most important in your network. Facebook doesn’t even bother letting you see your raw stream; their algorithms determine what you get access to in the first place (unless, of course, someone pays to make sure their friends see their content).

Snapchat offers a different proposition. Everyone gets hung up on how the disappearance of images may (or may not) afford a new kind of privacy. Adults fret about how teens might be using this affordance to share inappropriate (read: sexy) pictures, projecting their own bad habits onto youth. But this isn’t what makes Snapchat utterly intriguing. What makes Snapchat matter has to do with how it treats attention.

When someone sends you an image/video via Snapchat, they choose how long you get to view the image/video. The underlying message is simple: You’ve got 7 seconds. PAY ATTENTION. And when people do choose to open a Snap, they actually stop what they’re doing and look.

In a digital world where everyone’s flicking through headshots, images, and text without processing any of it, Snapchat asks you to stand still and pay attention to the gift that someone in your network just gave you. As a result, I watch teens choose not to open a Snap the moment they get it because they want to wait for the moment when they can appreciate whatever is behind that closed door. And when they do, I watch them tune out everything else and just concentrate on what’s in front of them. Rather than serving as yet-another distraction, Snapchat invites focus.

Furthermore, in an ecosystem where people “favorite” or “like” content that is inherently unlikeable just to acknowledge that they’ve consumed it, Snapchat simply notifies the creator when the receiver opens it up. This is such a subtle but beautiful way of embedding recognition into the system. Sometimes, a direct response is necessary. Sometimes, we need nothing more than a simple nod, a way of signaling acknowledgement. And that’s precisely why the small little “opened” note will bring a smile to someone’s face even if the recipient never said a word.

Snapchat is a reminder that constraints have a social purpose, that there is beauty in simplicity, and that the ephemeral is valuable. There aren’t many services out there that fundamentally question the default logic of social media and, for that, I think that we all need to pay attention to and acknowledge Snapchat’s moves in this ecosystem.

(This post was originally published on LinkedIn. More comments can be found there.)
