Studying Selfies: Evidence, Affect, Ethics, and the Internet’s Visual Turn
A special section of the International Journal of Communication (IJoC)
Dr. Theresa Senft
Master Teacher in Global Liberal Studies
New York University
Dr. Nancy Baym
The fact that “selfie” was Oxford Dictionaries’ Word of the Year for 2013 indicates that the selfie is a topic of popular interest. Yet for scholars, the selfie phenomenon represents a paradox. As an object, the selfie lends itself to cultural scorn and shaming. As a cultural practice, however, selfie circulation grows by the moment, moving far beyond the clichéd province of bored teenagers online. The rapid spread of camera-enabled mobile phones worldwide means that selfies have become a global phenomenon. Yet dominant discourses about what selfies are, and what they mean, tend to be extremely U.S. focused.
In this special section, we aim to provide international perspectives on selfies. Regarding selfie-making as an act of production, we are interested in why it lends itself to discussions featuring words like “narcissistic” or “empowering.” Regarding the selfie as a media genre, we are interested in its relationship to documentary, autobiography, advertising, and celebrity. Regarding the selfie as a cultural signifier, we ask: What social work does a selfie do in communities where it was intended to circulate, and what happens when it circulates beyond those communities?
As an emblematic part of social media’s increasing “visual turn,” selfies provide opportunities for scholars to develop best practices for rigorously interpreting images online. Case studies of selfie production, consumption, and circulation can also provide much-needed insight into the social dynamics at play on popular social media platforms like Facebook, Instagram, Reddit, WeChat, and Tumblr.
We are seeking scholarly articles from diverse fields, including media studies, communication, anthropology, digital humanities, computational and social sciences, cultural geography, history, and critical cultural studies, and drawing on a wide range of theoretical and methodological approaches.
Possible topics include, but are not limited to:
Selfie as discourse: What is the history (or histories) of the selfie? How do these histories map to contemporary media and scholarly discourses regarding self-representation, autobiography, photography, amateurism, branding, and/or celebrity?
Selfie as evidence: What are the epistemological ramifications of the selfie? How do selfies function as evidence that one attended an event, feels intimate with a partner, was battered in a parking lot, is willing to be “authentic” with fans, or claims particular standing in a social or political community? Once uploaded, how do selfies become evidence of a different sort, subject to possibilities like “revenge porn,” data mining, or state surveillance?
Selfie as affect: What feelings do selfies elicit for those who produce, view, and/or circulate them? What are we to make of controversial genres like infant selfies, soldier selfies, selfies with homeless people, or selfies at funerals? How do these discourses about controversial selfies map to larger conversations about “audience numbness” and “empathy deficit” in media?
Selfie as ethics: Who practices “empowering” selfie generation? Who does not? Who cannot? How do these questions map to larger issues of class, race, gender, sexuality, religion and geography? What responsibilities do those who circulate selfies of others have toward the original creator of the photo? What is the relationship between selfies and other forms of documentary photography, with regard to ethics?
Selfie as pedagogy: How can selfies be used as case studies to better understand the visual turn in social media use? How do selfies “speak,” and what methods might we develop to better understand what is being said?
Formatting and Requirements
To be considered for this collection, a paper of no more than 5,000 words (including images with captions, footnotes, references and appendices, if any) must be submitted by June 15, 2014. All submissions should be accompanied by two to three suggested reviewers, including their e-mail addresses, titles, affiliations and research interests. Submissions will fall under the category of “Features,” which are typically shorter than full research articles.
All submissions must adhere strictly to the most recent version of the APA style guide (including in-text citations and references). Papers must include each author’s name, title, affiliation and e-mail address. Papers that do not follow these guidelines will not be sent out for peer review.
The International Journal of Communication is an open access journal (ijoc.org). All articles will be available online at the point of publication. The anticipated publication timeframe for this special section is March 2015.
All submissions should be emailed to firstname.lastname@example.org by June 15, 2014. Late submissions will not be included for consideration.
Tomorrow is 404 Day, an effort from the Electronic Frontier Foundation to raise awareness of online censorship in libraries and public schools. They’re running an online info session today at noon, PST, and they’ve reached out to librarians and information professionals to share experiences with online censorship.
My encounters with 404 pages in libraries have mostly stemmed from my academic rather than librarian life. While in graduate school, I undertook a project looking at practices of secrecy in the extreme body modification (EBM) community. I wanted to know how the community circulated information about illegal and quasi-legal procedures among insiders without exposing the same information to outsiders and the authorities. As a researcher, I found getting a 404 message (which happened mostly when I tried to access a social network platform geared specifically to the body modification community) mostly exasperating, but it also gave me pause when I thought about other contexts in which people look up this type of information. As a teenager, body modification fascinated me, and I spent many hours online researching procedures related to piercings, tattoos, scarification and suspension. Eventually, I came to feel very much a part of the body modification community, and the internet was vital to that happening. When I imagine what would have happened if I’d been confronted with 404 pages early on in those searches, it’s possible that my body would look very different, and so would my early twenties: in both cases, I believe, for the worse. My experiences were by no means singular; while conducting research on EBM, I encountered many folks who were still struggling to locate information about procedures they wanted done, to get answers to questions about health and well-being, or to find a community that wouldn’t find their interests weird or freakish. EBM is just one example of a stigmatized topic that provokes censorship at the cost of denying people information that can be deeply tied to their physical, mental and social well-being.
I’m grateful to EFF for drawing attention to 404s and monitoring policies, and am happy to join the array of information activists speaking out against censorship in public libraries and schools.
By Sara C. Kingsley and Dr. Mary L. Gray
(cross-posted to CultureDigitally and The Center for Popular Economics)
Ray and Charles Working on a Conceptual Model for the Exhibition Mathematica, 1960, photograph. Prints & Photographs Division, Library of Congress (A-22a)
“Certainly the cost of living has increased, but the cost of everything else has likewise increased,” H.G. Burt, the President of the Union Pacific Railroad, asserted to railroad company machinists and boilermakers. For Burt, the “cost of everything else” included the cost of labor. His remedy: place “each workman on his [own] merit.” In 1902, “workman merit” to a tycoon like H.G. Burt squarely meant equating the value of labor, or the worth of a person, to the amount of output each individual produced. Union Pacific Railroad eventually made use of this logic by replacing the hourly wages of workers with a piece rate system. Employers switched to piecework systems around the turn of the 20th century largely to reduce labor costs by weeding out lower-skilled workers and cutting the wages of workers unable to keep pace with the “speeding up” of factory production.
Employers historically leveraged piecework as a managerial tool, reconfiguring labor markets to the employers’ advantage by allowing production rates, rather than time on the job, to measure productivity. Whatever a person produced that was not quantifiable as a commodity, in other words, did not constitute work. We’ve seen other examples of discounted labor in spaces outside the factory. Feminist economists fight to this day, for example, for the work of caregivers and housewives, largely ignored by mainstream economic theory, to gain recognition as “real” forms of labor. Real benefits and income are lost to those whose work goes unaccounted for.
As the historical record shows, workers do not typically accept arbitrary changes to their terms of employment handed down by management. In fact, the Union Pacific Railroad machinists protested Burt’s decision to set their wages through a piecework system. H.G. Burt met their resistance with this question: is it “right for any man to ask for more money than he is actually worth or can earn?”
But what is a person truly worth in terms of earning power? And what societal, cultural, and economic factors limit a person from earning more?
In 2014, the question of a person’s worth in relation to their work, or the value of labor itself, is no less pertinent. The rhetoric surrounding workers’ rights compared to those of business differs little whether one browses the archives of a twentieth century newspaper or scrolls through Facebook posts. Ironically, though, in the age of social media and citizen reporting, the utter lack of visibility and adequate representation of today’s workers stands in stark contrast to the visibility of the piece rate workers of H.G. Burt’s day. Few soundbites or talking points, let alone bylined articles, focus on the invisible labor foundational to today’s information economies. Nowhere is this more clearly illustrated than with crowdwork.
Legal scholar Alek L. Felstiner defines crowdworking as “the process of taking tasks that would normally be delegated to an employee and distributing them to a large pool of online workers, the ‘crowd’” (2011). Hundreds of thousands of people regularly do piecework tasks online for commercial crowdsourcing sites like Amazon.com’s Mechanical Turk (“AMT”).
Over the last year, we’ve worked with Dr. Siddharth Suri and an international team of researchers to uncover the invisible forms of labor online and the people who rely upon digital piecework for a significant portion of their income. Crowdwork is, arguably, the most economically valuable, yet invisible, form of labor that the Internet has ever produced. Take Google’s search engine, for instance. Each time you search for an image online (to create the next most hilarious meme, or find an infographic for a conference presentation), you’re benefiting from the labor of thousands of crowdworkers who have identified or ranked the images your search returns. While this service may be valuable to you, the workers doing it receive only a few cents for their contributions to your meme or slideshow presentation. A typical crowdworker living in the United States makes, on average, $2 to $3 an hour. We need to ask ourselves: what is fair compensation for the value that workers bring to our lives? How would you feel if tomorrow all your favorite, seemingly free online services that depend on these digital pieceworkers disappeared?
Last fall, we spent four months in South India talking with crowdworkers and learning about their motivations for doing this type of work. In the process we met people with far-ranging life experiences but a common story to tell, one perhaps familiar to all of us who’ve earned a wage for our keep: work is not all we are, but most of what we do is work. And increasingly, the capacity to maintain a living above the poverty line is elusive, and complicated by what “being poor” means in a global economy. Our hopes for more satisfying work, and for a life valued for what it is rather than what it is not, remain undiminished, even as we confront the realities of today.
Moshe Marvit spoke to the complexities of crowdwork as a form of viable employment in a compelling account of U.S. workers’ experience with Amazon Mechanical Turk. He describes this popular crowdsourcing platform as “one of the most exploited workforces no one has ever seen.” Marvit emphasizes how crowdwork remains a thing universally unacknowledged: more and more tasks, from researchers’ web-based surveys to Twitter’s real-time deciphering of trending topics, depend on crowdwork, yet most people still don’t know that behind their screen is an army of click workers. Anyone who has ever browsed an online catalogue or searched the web for a restaurant’s physical address has benefited from a person completing a small, crowdworked task online. Pointedly, our web experience is better because of the thousands of unknown workers who labor to optimize the online spaces we use.
As Marvit points out, and our research also notes, people earn only pennies at a time for doing the small crowd tasks not yet fully automatable by computer algorithms. These crowd tasks, however, add up to global systems whose monetary worth sometimes trumps that of small nations. Yet when we ask our peers and colleagues, “Do you know who the thousands of low-income workers behind your web browser are?” we receive mystified stares, and many reply, “I don’t know.”
The hundreds of thousands of people who regularly work in your web browser are not the youth of Silicon Valley’s tech industry. They likely cannot afford Google Glass or ride to work in corporate buses. Some are college educated, but, like many people today, they are stuck in careers that undervalue their real worth and discount the investments they’ve already made in their education, skills, and the unique set of values they’ve gained from their own life experiences.
Yet the more our research team learns about crowdworkers’ lives, the more we realize how little we know about the economic value of crowdwork and the makeup of the crowdworking labor force. And as Marvit notes, we still don’t have a good grasp of what someone is doing, legally speaking, when they do crowdwork. Should we categorize crowdwork as freelance work? Contract labor? Temporary or part-time work?
In the absence of answers to these questions, some have called for policy solutions to mitigate the noted and sometimes glaring inequities in power distributed between those posting tasks (or, jobs) to crowdwork platforms, and those seeking to do crowdwork online. But, we argue, good labor policy that makes sense of crowdwork, from a legal or technical point of view, can’t be adequately drafted until we understand what people expect and experience doing task-based work online. Who does crowdwork? Where, how, and why do they do it? And how does crowdworking fit into the rest of their lives, not to mention our global workflows? When we can answer these questions, we’ll be ready to talk about how to define crowdwork in more meaningful ways. Until then, we resist dubbing crowdwork “exploitative” or “ideal,” because doing so is meaningless to the millions of people who crowdwork, and ignores the builders and programmers out there trying to improve these technologies.
We are all implicated in the environments we rely on and utilize in our daily lives, including the Internet. Those who mindlessly request and outsource tasks to the crowd without regard to crowdworkers’ rights are, perhaps, no more at fault than the rest of us who expect instant, high-quality web services every time we search or do other activities online. An important lesson from the Union Pacific Railroad still holds true: workers are not expendable.
Omaha daily bee. (Omaha [Neb.]), 01 July 1902. Chronicling America: Historic American Newspapers. Lib. of Congress. <http://chroniclingamerica.loc.gov/lccn/sn99021999/1902-07-01/ed-1/seq-1/>
Last week I tried to get a group of random sophomores to care about algorithmic culture. I argued that software algorithms are transforming communication and knowledge. The jury is still out on my success at that, but in this post I’ll continue the theme by reviewing the interactive examples I used to make my point. I’m sharing them because they are fun to try. I’m also hoping the excellent readers of this blog can think of a few more.
I’ll call my three examples “puppy dog hate,” “top stories fail,” and “your DoubleClick cookie filling.” They should highlight the ways in which algorithms online are selecting content for your attention. And ideally they will be good fodder for discussion. Let’s begin:
Three Ways to Demonstrate Algorithmic Culture
(1.) puppy dog hate (Google Instant)
You’ll want to read the instructions fully before trying this. Go to http://www.google.com/ and type “puppy”, then [space], then “dog”, then [space], but don’t hit [Enter]. That means you should have typed “puppy dog ” (with a trailing space). Results should appear without the need to press [Enter]. I got this:
Now repeat the above instructions but instead of “puppy” use the word “bitch” (so: “bitch dog ”). Right now you’ll get nothing. I got nothing. (The blank area below is intentionally blank.) No matter how many words you type, if one of the words is “bitch” you’ll get no instant results.
What’s happening? Google Instant is the Google service that displays results while you are still typing your query. In the algorithm for Google Instant, it appears that your query is checked against a list of forbidden words. If the query contains one of the forbidden words (like “bitch”) no “instant” results will be shown, but you can still search Google the old-fashioned way by pressing [Enter].
This is an interesting example because it is incredibly mild censorship, and that is typical of algorithmic sorting on the Internet. Things aren’t made to be impossible, some things are just a little harder than others. We can discuss whether or not this actually matters to anyone. After all, you could still search for anything you wanted to, but some searches are made slightly more time-consuming because you will have to press [Enter] and you do not receive real-time feedback as you construct your search query.
It’s also a good example that makes clear how problematic algorithmic censorship can be. The hackers over at 2600 reverse engineered Google Instant’s blacklist (NSFW), and it makes absolutely no sense. The blocked words I tried (like “bitch”) produce perfectly inoffensive search results (sometimes because of other censorship algorithms, like Google SafeSearch). It is not clear to me why they should be blocked. For instance, terms for some parts of the female anatomy are blocked while terms for other parts are not.
Some of the blocking is just silly. For instance, “hate” is blocked. This means you can make the Google Instant results disappear by adding “hate” to the end of an otherwise acceptable query. For example, typing “puppy dog hate ” will make the search results I got earlier disappear as soon as I type the trailing space. (Remember not to press [Enter].)
This is such a simple implementation that it barely qualifies as an algorithm. It also differs from my other examples because it appears that an actual human compiled this list of blocked words. That might be useful to highlight because we typically think that companies like Google do everything with complicated math and not site-by-site or word-by-word rules–they have claimed as much, but this example shows that in fact this crude sort of blacklist censorship still goes on.
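To make the point about simplicity concrete, here is a minimal Python sketch of the kind of check described above. It assumes that Google Instant compares each word you have finished typing (a word followed by a space) against a fixed blocklist. The names `BLOCKED_WORDS`, `completed_words`, and `show_instant_results` are hypothetical, only “bitch” and “hate” come from the experiments in this post, and this is not Google’s code.

```python
# Hypothetical blocklist: only the two words reported in this post.
# The real (reverse-engineered) list is much longer.
BLOCKED_WORDS = {"bitch", "hate"}


def completed_words(partial_query: str) -> list[str]:
    """Return the words the user has finished typing (those followed by a space)."""
    words = partial_query.lower().split(" ")
    # The last element is still being typed, or is "" if the query ends in a space.
    return [w for w in words[:-1] if w]


def show_instant_results(partial_query: str) -> bool:
    """Assumed rule: suppress instant results if any completed word is blocked."""
    return not any(w in BLOCKED_WORDS for w in completed_words(partial_query))


print(show_instant_results("puppy dog "))       # True
print(show_instant_results("bitch dog "))       # False
print(show_instant_results("puppy dog hate"))   # True: "hate" is not finished yet
print(show_instant_results("puppy dog hate "))  # False: the trailing space completes it
```

Treating only completed words as checkable reproduces the trailing-space behavior seen in the “puppy dog hate ” experiment, though Google’s actual rule may well differ.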
Google does censor actual search results (what you get after pressing [Enter]) in a variety of ways but that is a topic for another time. This exercise with Google Instant at least gets us started thinking about algorithms, whose interests they are serving, and whether or not they are doing their job well.
(2.) Top Stories Fail (Facebook)
In this example, you’ll need a Facebook account. Go to http://www.facebook.com/ and look for the tiny little toggle that appears under the text “News Feed.” This allows you to switch between two different sorting algorithms: the Facebook proprietary EdgeRank algorithm (this is the default), and “most recent.” (On my interface this toggle is in the upper left, but Facebook has multiple user interfaces at any given time and for some people it appears in the center of the page at the top.)
Switch this toggle back and forth and look at how your feed changes.
What’s happening? Okay, we know that among 18-29 year-old Facebook users the median number of friends is now 300. Even given that most people are not over-sharers, some simple arithmetic makes it clear that some of the things posted to Facebook may never be seen by anyone. A status update is certainly unlikely to be seen by anywhere near your entire friend network. Facebook’s “Top Stories” (EdgeRank) algorithm is the solution to the oversupply of status updates and the undersupply of attention to them; it determines what appears on your news feed and how it is sorted.
We know that Facebook’s “Top Stories” sorting algorithm uses a heavy hand. It is quite likely that you have people in your friend network that post to Facebook A LOT but that Facebook has decided to filter out ALL of their posts. These might be called your “silenced Facebook friends.” Sometimes when people do this toggling-the-algorithm exercise they exclaim: “Oh, I forgot that so-and-so was even on Facebook.”
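The toggle can be imitated with a toy model. The Python sketch below assumes an EdgeRank-like score of the form affinity × edge weight × time decay, the simplified description that circulated publicly around 2010; the real formula is secret, and the friends, numbers, and function names here are invented for illustration. With only two feed slots, a friend who posts often but with whom you rarely interact gets none of them under “Top Stories,” while “Most Recent” shows her first.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    age_hours: float   # how long ago the post was made
    interactions: int  # likes and comments the post has received


# Hypothetical affinities: how much *you* have interacted with each friend.
AFFINITY = {"alice": 5.0, "bob": 3.0, "carol": 0.1}


def top_stories_score(post: Post, affinity: dict[str, float]) -> float:
    """Assumed EdgeRank-style score: affinity x edge weight x time decay."""
    weight = 1.0 + post.interactions      # more engagement, more weight
    decay = 1.0 / (1.0 + post.age_hours)  # older posts count for less
    return affinity.get(post.author, 0.0) * weight * decay


def most_recent(posts: list[Post], slots: int) -> list[str]:
    """The "Most Recent" setting: newest first, no filtering by author."""
    return [p.author for p in sorted(posts, key=lambda p: p.age_hours)[:slots]]


def top_stories(posts: list[Post], affinity: dict[str, float], slots: int) -> list[str]:
    """The "Top Stories" setting: highest score first."""
    ranked = sorted(posts, key=lambda p: top_stories_score(p, affinity), reverse=True)
    return [p.author for p in ranked[:slots]]


# Carol posts a lot, but you rarely interact with her.
posts = [Post("carol", h, 2) for h in (0.5, 1.0, 1.5, 2.0, 2.5)]
posts += [Post("alice", 6.0, 4), Post("bob", 10.0, 1)]

print(most_recent(posts, 2))            # ['carol', 'carol']
print(top_stories(posts, AFFINITY, 2))  # ['alice', 'bob']
```

In this toy version Carol becomes a “silenced” friend purely because her low affinity outweighs her recency; nothing about her posts themselves is judged.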
Since we don’t know the exact details of EdgeRank, it isn’t clear exactly how Facebook is deciding which of your friends you should hear from and which should be ignored. Even though the algorithm might be well-constructed, it’s interesting that when I’ve done this toggling exercise with a large group a significant number of people say that Facebook’s algorithm produces a much more interesting list of posts than “Most Recent,” while a significant number of people say the opposite — that Facebook’s algorithm makes their news feed worse. (Personally, I find “Most Recent” produces a far more interesting news feed than “Top Stories.”)
It is an interesting intellectual exercise to try and reverse-engineer Facebook’s EdgeRank on your own by doing this toggling. Why is so-and-so hidden from you? What is it they are doing that Facebook thinks you wouldn’t like? For example, I think that EdgeRank doesn’t work well for me because I select my friends carefully and then don’t provide much feedback that counts toward EdgeRank. So my initial decision about whom to friend works better as a sort without further filtering (“most recent”) than Facebook’s decision about what to hide. (In contrast, some people I spoke with will friend anyone, and they do a lot more “liking” than I do.)
What does it mean that your relationship to your friends is mediated by this secret algorithm? A minor note: If you switch to “most recent” some people have reported that after a while Facebook will switch you back to Facebook’s “Top Stories” algorithm without asking.
There are deeper things to say about Facebook, but this is enough to start with. Onward.
(3.) Your DoubleClick Cookie Filling (DoubleClick)
This example will only work if you browse the Web regularly from the same Web browser on the same computer and you have cookies turned on. (That describes most people.) Go to the Google Ads settings page — the URL is a mess so here’s a shortcut: http://bit.ly/uc256google
Look at the right column, headed “Google Ads Across The Web,” then scroll down and look for the section marked “Interests.” The other parts may be interesting too, such as Google’s estimate of your Gender, Age, and the language you speak — all of which may or may not be correct. Here’s a screen shot:
If you have “interests” listed, click on “Edit” to see a list of topics.
What’s Happening? Google is the largest advertising clearinghouse on the Web. (It bought DoubleClick in 2007 for over $3 billion.) When you visit a Web site that runs Google Ads — this is likely quite common — your visit is noted and a pattern of all of your Web site visits is then compiled and aggregated with other personal information that Google may know about you.
What a big departure from some old media! In comparison, in most states it is illegal to gather a list of books you’ve read at the library because this would reveal too much information about you. Yet for Web sites this data collection is the norm.
This settings page won’t reveal Google’s ad placement algorithm, but it shows you part of the result: a list of the categories that the algorithm is currently using to choose advertising content to display to you. Your attention will be sold to advertisers in these categories and you will see ads that match these categories.
This list is quite volatile and this is linked to the way Google hopes to connect advertisers with people who are interested in a particular topic RIGHT NOW. Unlike demographics that are presumed to change slowly (age) or not to change at all (gender), Google appears to base a lot of its algorithm on your recent browsing history. That means if you browse the Web differently you can change this list fairly quickly (in a matter of days, at least).
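One way to picture a volatile, recency-driven interest list is as a decayed tally of categorized site visits. The Python sketch below assumes a hypothetical site-to-category table, a three-day half-life, and a listing cutoff; none of these reflect Google’s actual ad placement algorithm, which this settings page does not reveal.

```python
# Hypothetical mapping from sites that carry the ad network's code to interest categories.
SITE_CATEGORIES = {
    "testprep.example": "Standardized & Admissions Tests",
    "dice.example": "Roleplaying Games",
    "appliances.example": "Dishwashers",
}

HALF_LIFE_DAYS = 3.0  # assumed: a visit's influence halves every three days
THRESHOLD = 0.5       # assumed: minimum decayed weight for a category to be listed


def interest_profile(visits: list[tuple[str, float]]) -> list[str]:
    """Build a category list from (site, days_ago) visits seen via the ad cookie."""
    weights: dict[str, float] = {}
    for site, days_ago in visits:
        category = SITE_CATEGORIES.get(site)
        if category is None:
            continue  # the site is not categorized (or does not run the ads)
        # Exponential decay: recent visits count far more than old ones.
        weights[category] = weights.get(category, 0.0) + 0.5 ** (days_ago / HALF_LIFE_DAYS)
    return sorted(c for c, w in weights.items() if w >= THRESHOLD)


history = [("testprep.example", 1), ("dice.example", 2), ("appliances.example", 30)]
print(interest_profile(history))
# ['Roleplaying Games', 'Standardized & Admissions Tests']

# A few days later, with no new test-prep visits, that category drops off.
print(interest_profile([("testprep.example", 4), ("dice.example", 2)]))
# ['Roleplaying Games']
```

With a short half-life, a few days of different browsing is enough to reshape the list, which is consistent with the volatility described above.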
Many people find the list uncannily accurate, while some are surprised at how inaccurate it is. Usually it is a mixture. Note that some categories are very specific (“Currency Exchange”), while others are very broad (“Humor”). Right now it thinks I am interested in 27 things, some of which are:
- Standardized & Admissions Tests (Yes.)
- Roleplaying Games (Yes.)
- Dishwashers (No.)
- Dresses (No.)
You can also type in your own interests to save Google the trouble of profiling you.
Again this is an interesting algorithm to speculate about. I’ve been checking this for a few years and I persistently get “Hygiene & Toiletries.” I am insulted by this. It’s not that I’m uninterested in hygiene but I think I am no more interested in hygiene than the average person. I don’t visit any Web sites about hygiene or toiletries. So I’d guess this means… what exactly? I must visit Web sites that are visited by other people who visit sites about hygiene and toiletries. Not a group I really want to be a part of, to be honest.
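The guess above amounts to a neighbor-based inference: you inherit categories from people whose browsing resembles yours. Here is a minimal Python sketch of that idea using Jaccard overlap between sets of visited sites. The users, sites, function names, and the 0.5 similarity cutoff are all hypothetical, and nothing here is known to be how Google actually assigns “Hygiene & Toiletries.”

```python
# Hypothetical histories of other users, with the categories already inferred for them.
OTHER_USERS = {
    "u1": ({"news.example", "recipes.example", "soap.example"}, {"Hygiene & Toiletries"}),
    "u2": ({"news.example", "recipes.example"}, {"Hygiene & Toiletries"}),
    "u3": ({"games.example"}, {"Roleplaying Games"}),
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of visited sites (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def borrowed_categories(my_sites: set[str], min_similarity: float = 0.5) -> set[str]:
    """Copy categories from users whose browsing overlaps enough with mine."""
    found: set[str] = set()
    for sites, categories in OTHER_USERS.values():
        if jaccard(my_sites, sites) >= min_similarity:
            found |= categories
    return found


me = {"news.example", "recipes.example", "games.example"}
print(borrowed_categories(me))  # {'Hygiene & Toiletries'}
```

The profile picks up the hygiene category even though `me` never visited `soap.example`; it arrives only through overlap with other visitors.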
These were three examples of algorithm-ish activities that I’ve used. Any other ideas? I was thinking of trying something with an item-to-item recommender system but I could not come up with a great example. I tried anonymized vs. normal Web searching to highlight location-specific results but I could not think of a search term that did a great job showing a contrast. I also tried personalized Twitter trends vs. location-based Twitter trends but the differences were quite subtle. Maybe you can do better.
In my next post I’ll write about how the students reacted to all this.
(This was also cross-posted to multicast.)
A “pay it back tax” on data brokers: a modest (and also politically untenable and impossibly naïve) policy proposal
I’ve just returned from the “Social, Cultural, and Ethical Dimensions of Big Data” event, held by the Data & Society Initiative (led by danah boyd) and spurred by the efforts of the White House Office of Science and Technology Policy to develop a comprehensive report on issues of privacy, discrimination, and rights around big data. And my head is buzzing. (Oh boy. Here he goes.) There must be something about me and workshops aimed at policy issues. Even though this event was designed to be wide-ranging and academic, I always get this sense of urgency or pressure that we should be working towards concrete policy recommendations. It’s something I rarely do in my scholarly work (to its detriment, I’d say, wouldn’t you?). But I don’t tend to come up with reasonable, incremental, or politically viable policy recommendations anyway. I get frustrated that the range of possible interventions feels so narrow, with so many players treated as untouchable, so many underlying presumptions left unchallenged. I don’t want to suggest some progressive but narrow intervention, and in the process confirm and reify the way things are, though believe me, I admire the people who can do this. I long for there to be a robust vocabulary for saying what we want as a society and what we’re willing to change, reject, regulate, or transform to get it. (But at some point, if it’s too pie in the sky, it ceases being a policy recommendation, doesn’t it?) And this is especially true when it comes to daring to restrain commercial actors who are doing something that can be seen as publicly detrimental, but somehow have this presumed right to engage in this activity because they have the right to profit. I want to be able to say, in some instances, “sorry, no, this simply isn’t a thing you get to profit on.”
All that said, I’m going to propose a policy recommendation. (It’s going to be a politically unreasonable one, you watch.)
I find myself concerned about this hazy category of stakeholders that, at our event, were generally called “data brokers.” There are probably different kinds of data brokers that we might think about: companies that buy up and combine data about consumers; companies that scrape public data from wherever it is available and create troves of consumer profiles. I’m particularly troubled by the kind of companies that Kate Crawford discussed in her excellent editorial for Scientific American a few weeks ago — like Turnstyle, a company that has set up dummy wifi transponders in major cities to pick up all those little pings your smartphone gives off when its looking for networks. Turnstyle coordinates those pings into a profile of how you navigated the city (i.e. you and your phone walked down Broadway, spent twenty minutes in the bakery, then drove to the south side), then aggregates those navigation profiles into data about consumers and their movements through the city and sells them to marketers. (OK, that is particularly infuriating.) What defines this category for me is that data brokers do not gather data as part of a direct service they provide to those individuals. Instead they gather at a point once removed from the data subjects: such as purchasing the data gathered by others, scraping our public utterances or traces, or tracking the evidence of our activity we give off. I don’t know that I can be much more specific than that, or that I’ve captured all the flavors, in part because I’ve only begun to think about them (oh good, then this is certain to be a well-informed suggestion!) and because they are a shadowy part of the data industry, relatively far with consumers, with little need to advertise or maintain a particularly public profile.
I think these stakeholders are in a special category, in terms of policy, for a number of reasons. First, they are important to questions of privacy and discrimination in data, as they help to move data beyond the settings in which we authorized its collection and use. Second, they are outside of traditional regulations that are framed around specific industries and their data use (like HIPAA provisions that regulate hospitals and medical record keepers, but not data brokers who might nevertheless traffic in health data). Third, they’re a newly emergent part of the data ecosystem, so they have not been thought about in the development of older legislation. But most importantly, they are a business that offers no social value to the individual or society whose data is being gathered. (Uh oh.) In all of the more traditional instances in which data is collected about individuals, there is some social benefit or service presumed to be offered in exchange. The government conducts a census, but we authorized that, because it is essential to the provision of government services: proportional representation of elected officials, fair imposition of taxation, etc. Verizon collects data on us, but they do so as a fundamental element of the provision of telephone service. Facebook collect all of our traces, and while that data is immensely valuable in its own right and to advertisers it is also an important component in providing their social media platform. I am by no means saying that there are no possible harms in such data arrangements (I should hope not) but at the very least, the collection of data comes with the provision of service, and there is a relationship (citizen, customer) that provides a legally structured and sanctioned space for challenging the use and misuse of that data — class action lawsuit, regulatory oversight, protest, or just switching to another phone company. (Have you tried switching phone companies lately?) 
Some services that collect data have even voluntarily sought to do additional, socially progressive things with that data: Google looking for signs of flu outbreaks, Facebook partnering with researchers looking to encourage voting behavior, even OkCupid giving us curious insights about the aggregate dating habits of their customers. (You just love infographics, don’t you.) But the third-party data broker who buys data from an e-commerce site I frequent, or scrapes my publicly available hospital discharge record, or grabs up the pings my phone emits as I walk through town, is building commercial value on my data but offers no value to me, my community, or society in exchange.
So what I propose is a “pay it back tax” on data brokers. (Huh?! Does such a thing exist, anywhere?) If a company collects, aggregates, or scrapes data on people, and does so not as part of a service back to those people (but is that distinction even a tenable one? who would decide and patrol which companies are subject to this requirement?), then it must grant access to its data and devote 10% of its revenue to non-profit, socially progressive uses of that data. This could mean partnering with a non-profit and providing it with funds and access to data to conduct research. Or, the company could make the data and dollars available as a research fund that non-profits and researchers could apply for. Or, as a nuclear option, it could avoid the financial requirement by providing an open API to its data. (I thought your concern about these brokers is that they aggravate the privacy problems of big data, but you’re making them spread that collected data further?) I think there could be valuable partnerships: Turnstyle’s data might be particularly useful for community organizations concerned about neighborhood flow or access for the disabled; health data could be used by researchers or activists concerned with discrimination in health insurance. There would need to be parameters for how that data was used and protected by the non-profits who received it, and perhaps an open access requirement for any published research or reports.
This may seem extreme. (I should say so. Does this mean any commercial entity in any industry that doesn’t provide a service to customers should get a similar tax?) Or, from another vantage point, it could be seen as quite reasonable: companies that collect data directly have to spend an overwhelming amount of their revenue providing whatever service justifies that collection, and governments that collect data on us are in our service and make no profit. This is merely 10%, plus sharing their valuable resource. (No, it still seems extreme.) And, if I were aiming more squarely at the concerns about privacy, I’d be tempted to say that data aggregation and scraping could simply be outlawed. (Somebody stop him!) In my mind, it at the very least restores the idea that collecting data on individuals and using that as a primary resource upon which to make profit must, on balance, provide some service in return, be it customer service, social service, or public benefit.
This is cross-posted at Culture Digitally.
This was an incredible, overwhelming year for internship applications. We had well over 200 PhD students apply, and we were deeply impressed by the quality of suggested projects. Thanks to everyone for your submissions. Here are the four people who will be joining us over the summer – congratulations to you all. We’re looking forward to working with you!
Tressie McMillan Cottom is a Ph.D. candidate in the Sociology Department at Emory University in Atlanta, GA. Broadly, Tressie studies organizations, inequality, and education. Her doctoral research is a comparative study of the expansion of for-profit colleges (like the University of Phoenix) in the 1990s. She will be working with Kate, Mary and Nancy this summer on a project about hashtag activist groups on Twitter and their ties to institutional power.
Luke Stark is a PhD student in the Department of Media, Culture, and Communication at New York University under the supervision of Helen Nissenbaum. His dissertation research focuses on the history and philosophy of digital media technologies, and their use in tracking, monitoring and shaping the everyday emotional lives and experiences of users. This summer he will be working with Kate on epistemologies of big data, privacy, and computational culture.
Katrin Tiidenberg is a Ph.D. candidate at the Institute of International and Social Studies at Tallinn University in Estonia. Her dissertation is about online experience and identity in the context of NSFW blogs on Tumblr. She will be working this summer with Nancy on a project about selfies, power and shame.
Kathryn Zyskowski is a Ph.D. student in the Department of Anthropology at the University of Washington and an Editorial Intern at the Journal of the Society for Cultural Anthropology. Her doctoral work examines identity, representation, and Muslim/Hindu relations in South India. This summer, she will work with Mary studying how crowdworkers in India and the United States use online discussion forums to organize their work and structure their identities as workers in specific locations.