Last week I tried to get a group of random sophomores to care about algorithmic culture. I argued that software algorithms are transforming communication and knowledge. The jury is still out on my success at that, but in this post I’ll continue the theme by reviewing the interactive examples I used to make my point. I’m sharing them because they are fun to try. I’m also hoping the excellent readers of this blog can think of a few more.
I’ll call my three examples “puppy dog hate,” “top stories fail,” and “your DoubleClick cookie filling.” They should highlight the ways in which algorithms online are selecting content for your attention. And ideally they will be good fodder for discussion. Let’s begin:
Three Ways to Demonstrate Algorithmic Culture
(1.) Puppy Dog Hate (Google Instant)
You’ll want to read the instructions fully before trying this. Go to http://www.google.com/ and type “puppy”, then [space], then “dog”, then [space], but don’t hit [Enter]. That means you should have typed “puppy dog ” (with a trailing space). Results should appear without the need to press [Enter]. I got this:
Now repeat the above instructions but instead of “puppy” use the word “bitch” (so: “bitch dog ”). Right now you’ll get nothing. I got nothing. (The blank area below is intentionally blank.) No matter how many words you type, if one of the words is “bitch” you’ll get no instant results.
What’s happening? Google Instant is the Google service that displays results while you are still typing your query. In the algorithm for Google Instant, it appears that your query is checked against a list of forbidden words. If the query contains one of the forbidden words (like “bitch”) no “instant” results will be shown, but you can still search Google the old-fashioned way by pressing [Enter].
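The check described above can be sketched in a few lines. This is only an illustration of a word blocklist, not Google's code: the function name and the two-word list are hypothetical (the two words are the ones tested in this post), and the rule that a word only counts once it is followed by a space is an assumption based on the behavior I observed.

```python
# Hypothetical blocklist for illustration; the real list is not published by Google.
BLOCKED_WORDS = {"bitch", "hate"}


def instant_results_allowed(query: str) -> bool:
    """Return False if any completed word in the query is on the blocklist."""
    tokens = query.lower().split()
    # Assumption: the last word is still being typed unless a space follows it,
    # so it is not checked yet.
    if tokens and not query[-1].isspace():
        tokens = tokens[:-1]
    return not any(token in BLOCKED_WORDS for token in tokens)


print(instant_results_allowed("puppy dog "))       # True: instant results shown
print(instant_results_allowed("puppy dog hate "))  # False: instant results suppressed
```

Pressing [Enter] would bypass a check like this entirely, which is why the ordinary search still works.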
This is an interesting example because it is incredibly mild censorship, and that is typical of algorithmic sorting on the Internet. Things aren’t made to be impossible, some things are just a little harder than others. We can discuss whether or not this actually matters to anyone. After all, you could still search for anything you wanted to, but some searches are made slightly more time-consuming because you will have to press [Enter] and you do not receive real-time feedback as you construct your search query.
It’s also a good example that makes clear how problematic algorithmic censorship can be. The hackers over at 2600 reverse engineered Google Instant’s blacklist (NSFW) and it makes absolutely no sense. The blocked words I tried (like “bitch”) produce perfectly inoffensive search results (sometimes because of other censorship algorithms, like Google SafeSearch). It is not clear to me why they should be blocked. For instance, anatomical terms for some parts of the female anatomy are blocked while other parts of the female anatomy are not blocked.
Some of the blocking is just silly. For instance, “hate” is blocked. This means you can make the Google Instant results disappear by adding “hate” to the end of an otherwise acceptable query. For example, typing “puppy dog hate ” will make the search results I got earlier disappear as soon as I type the trailing space. (Remember not to press [Enter].)
This is such a simple implementation that it barely qualifies as an algorithm. It also differs from my other examples because it appears that an actual human compiled this list of blocked words. That might be useful to highlight because we typically think that companies like Google do everything with complicated math and not site-by-site or word-by-word rules. They have claimed as much, but this example shows that this crude sort of blacklist censorship still goes on.
Google does censor actual search results (what you get after pressing [Enter]) in a variety of ways but that is a topic for another time. This exercise with Google Instant at least gets us started thinking about algorithms, whose interests they are serving, and whether or not they are doing their job well.
(2.) Top Stories Fail (Facebook)
In this example, you’ll need a Facebook account. Go to http://www.facebook.com/ and look for the tiny little toggle that appears under the text “News Feed.” This allows you to switch between two different sorting algorithms: the Facebook proprietary EdgeRank algorithm (this is the default), and “most recent.” (On my interface this toggle is in the upper left, but Facebook has multiple user interfaces at any given time and for some people it appears in the center of the page at the top.)
Switch this toggle back and forth and look at how your feed changes.
What’s happening? Okay, we know that among 18-29 year-old Facebook users the median number of friends is now 300. Even given that most people are not over-sharers, with some simple arithmetic it is clear that some of the things posted to Facebook may never be seen by anyone. A status update is certainly unlikely to be seen by anywhere near your entire friend network. Facebook’s “Top Stories” (EdgeRank) algorithm is the solution to the oversupply of status updates and the undersupply of attention to them. It determines what appears on your news feed and how it is sorted.
We know that Facebook’s “Top Stories” sorting algorithm uses a heavy hand. It is quite likely that you have people in your friend network that post to Facebook A LOT but that Facebook has decided to filter out ALL of their posts. These might be called your “silenced Facebook friends.” Sometimes when people do this toggling-the-algorithm exercise they exclaim: “Oh, I forgot that so-and-so was even on Facebook.”
Since we don’t know the exact details of EdgeRank, it isn’t clear exactly how Facebook is deciding which of your friends you should hear from and which should be ignored. Even though the algorithm might be well-constructed, it’s interesting that when I’ve done this toggling exercise with a large group a significant number of people say that Facebook’s algorithm produces a much more interesting list of posts than “Most Recent,” while a significant number of people say the opposite — that Facebook’s algorithm makes their news feed worse. (Personally, I find “Most Recent” produces a far more interesting news feed than “Top Stories.”)
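To make the difference between the two toggle positions concrete, here is a toy sketch. It is not EdgeRank: the `affinity` field, the scoring formula, and the `threshold` cutoff are all invented for illustration, since the real details are not public. It only shows how a scored-and-filtered feed can silence a friend that a purely chronological feed would show.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    age_hours: float  # how long ago the post was made
    affinity: float   # hypothetical: how much you interact with this author (0 to 1)


def most_recent(posts):
    """Chronological feed: newest first, nothing hidden."""
    return sorted(posts, key=lambda p: p.age_hours)


def top_stories(posts, threshold=0.1):
    """Toy ranked feed: score each post, drop low scorers, sort by score."""
    # Hypothetical score: affinity discounted by the age of the post.
    scored = [(p.affinity / (1.0 + p.age_hours), p) for p in posts]
    kept = [(score, p) for score, p in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in kept]


posts = [
    Post("close_friend", age_hours=5.0, affinity=0.9),
    Post("acquaintance", age_hours=1.0, affinity=0.05),
    Post("coworker", age_hours=2.0, affinity=0.6),
]

print([p.author for p in most_recent(posts)])
print([p.author for p in top_stories(posts)])
```

In this sketch the acquaintance has the newest post but never appears in the ranked feed, which is the “silenced Facebook friends” effect in miniature.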
It is an interesting intellectual exercise to try and reverse-engineer Facebook’s EdgeRank on your own by doing this toggling. Why is so-and-so hidden from you? What is it they are doing that Facebook thinks you wouldn’t like? For example, I think that EdgeRank doesn’t work well for me because I select my friends carefully, then I don’t provide much feedback that counts toward EdgeRank after that. So my initial decision about who to friend works better as a sort without further filtering (“most recent”) than Facebook’s decision about what to hide. (In contrast, some people I spoke with will friend anyone, and they do a lot more “liking” than I do.)
What does it mean that your relationship to your friends is mediated by this secret algorithm? A minor note: If you switch to “most recent” some people have reported that after a while Facebook will switch you back to Facebook’s “Top Stories” algorithm without asking.
There are deeper things to say about Facebook, but this is enough to start with. Onward.
(3.) Your DoubleClick Cookie Filling (DoubleClick)
This example will only work if you browse the Web regularly from the same Web browser on the same computer and you have cookies turned on. (That describes most people.) Go to the Google Ads settings page — the URL is a mess so here’s a shortcut: http://bit.ly/uc256google
Look at the right column, headed “Google Ads Across The Web,” then scroll down and look for the section marked “Interests.” The other parts may be interesting too, such as Google’s estimate of your Gender, Age, and the language you speak — all of which may or may not be correct. Here’s a screen shot:
If you have “interests” listed, click on “Edit” to see a list of topics.
What’s happening? Google is the largest advertising clearinghouse on the Web. (It bought DoubleClick in 2007 for over $3 billion.) When you visit a Web site that runs Google Ads (this is likely quite common), your visit is noted and a pattern of all of your Web site visits is then compiled and aggregated with other personal information that Google may know about you.
What a big departure from some old media! In comparison, in most states it is illegal to gather a list of books you’ve read at the library because this would reveal too much information about you. Yet for Web sites this data collection is the norm.
This settings page won’t reveal Google’s ad placement algorithm, but it shows you part of the result: a list of the categories that the algorithm is currently using to choose advertising content to display to you. Your attention will be sold to advertisers in these categories and you will see ads that match these categories.
This list is quite volatile and this is linked to the way Google hopes to connect advertisers with people who are interested in a particular topic RIGHT NOW. Unlike demographics that are presumed to change slowly (age) or not to change at all (gender), Google appears to base a lot of its algorithm on your recent browsing history. That means if you browse the Web differently you can change this list fairly quickly (in a matter of days, at least).
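One way such volatility could arise is if recent visits count for much more than old ones. The sketch below is a guess at the general idea, not Google's method: the function name, the exponential decay, and the seven-day half-life are all assumptions made for illustration.

```python
from collections import defaultdict


def interest_profile(visits, half_life_days=7.0, top_n=2):
    """Rank categories by recency-weighted visit counts.

    visits: list of (category, days_ago) pairs.
    Each visit's weight halves every `half_life_days` (an assumed value).
    """
    scores = defaultdict(float)
    for category, days_ago in visits:
        scores[category] += 0.5 ** (days_ago / half_life_days)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [category for category, _ in ranked[:top_n]]


visits = [
    ("Roleplaying Games", 1),
    ("Roleplaying Games", 2),
    ("Currency Exchange", 3),
    ("Dishwashers", 60),
    ("Dishwashers", 61),
    ("Dishwashers", 62),
]

print(interest_profile(visits))
```

Here three visits to dishwasher sites two months ago are outweighed by a single visit to a currency exchange site three days ago, so a few days of different browsing would be enough to reshuffle the list.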
Many people find the list uncannily accurate, while some are surprised at how inaccurate it is. Usually it is a mixture. Note that some categories are very specific (“Currency Exchange”), while others are very broad (“Humor”). Right now it thinks I am interested in 27 things. Some of them are:
- Standardized & Admissions Tests (Yes.)
- Roleplaying Games (Yes.)
- Dishwashers (No.)
- Dresses (No.)
You can also type in your own interests to save Google the trouble of profiling you.
Again this is an interesting algorithm to speculate about. I’ve been checking this for a few years and I persistently get “Hygiene & Toiletries.” I am insulted by this. It’s not that I’m uninterested in hygiene but I think I am no more interested in hygiene than the average person. I don’t visit any Web sites about hygiene or toiletries. So I’d guess this means… what exactly? I must visit Web sites that are visited by other people who visit sites about hygiene and toiletries. Not a group I really want to be a part of, to be honest.
These were three examples of algorithm-ish activities that I’ve used. Any other ideas? I was thinking of trying something with an item-to-item recommender system but I could not come up with a great example. I tried anonymized vs. normal Web searching to highlight location-specific results but I could not think of a search term that did a great job showing a contrast. I also tried personalized twitter trends vs. location-based twitter trends but the differences were quite subtle. Maybe you can do better.
In my next post I’ll write about how the students reacted to all this.
(This was also cross-posted to multicast.)
A “pay it back tax” on data brokers: a modest (and also politically untenable and impossibly naïve) policy proposal
I’ve just returned from the “Social, Cultural, and Ethical Dimensions of Big Data” event, held by the Data & Society Initiative (led by danah boyd), and spurred by the efforts of the White House Office of Science and Technology Policy to develop a comprehensive report on issues of privacy, discrimination, and rights around big data. And my head is buzzing. (Oh boy. Here he goes.) There must be something about me and workshops aimed at policy issues. Even though this event was designed to be wide-ranging and academic, I always get this sense of urgency or pressure that we should be working towards concrete policy recommendations. It’s something I rarely do in my scholarly work (to its detriment, I’d say, wouldn’t you?) But I don’t tend to come up with reasonable, incremental, or politically viable policy recommendations anyway. I get frustrated that the range of possible interventions feels so narrow, so many players that must be untouched, so many underlying presumptions left unchallenged. I don’t want to suggest some progressive but narrow intervention, and in the process confirm and reify the way things are – though believe me, I admire the people who can do this. I long for there to be a robust vocabulary for saying what we want as a society and what we’re willing to change, reject, regulate, or transform to get it. (But at some point, if it’s too pie in the sky, it ceases being a policy recommendation, doesn’t it?) And this is especially true when it comes to daring to restrain commercial actors who are doing something that can be seen as publicly detrimental, but somehow have this presumed right to engage in this activity because they have the right to profit. I want to be able to say, in some instances, “sorry, no, this simply isn’t a thing you get to profit on.”
All that said, I’m going to propose a policy recommendation. (It’s going to be a politically unreasonable one, you watch.)
I find myself concerned about this hazy category of stakeholders that, at our event, were generally called “data brokers.” There are probably different kinds of data brokers that we might think about: companies that buy up and combine data about consumers; companies that scrape public data from wherever it is available and create troves of consumer profiles. I’m particularly troubled by the kind of companies that Kate Crawford discussed in her excellent editorial for Scientific American a few weeks ago, like Turnstyle, a company that has set up dummy wifi transponders in major cities to pick up all those little pings your smartphone gives off when it’s looking for networks. Turnstyle coordinates those pings into a profile of how you navigated the city (i.e. you and your phone walked down Broadway, spent twenty minutes in the bakery, then drove to the south side), then aggregates those navigation profiles into data about consumers and their movements through the city and sells them to marketers. (OK, that is particularly infuriating.) What defines this category for me is that data brokers do not gather data as part of a direct service they provide to those individuals. Instead they gather at a point once removed from the data subjects: such as purchasing the data gathered by others, scraping our public utterances or traces, or tracking the evidence of our activity we give off. I don’t know that I can be much more specific than that, or that I’ve captured all the flavors, in part because I’ve only begun to think about them (oh good, then this is certain to be a well-informed suggestion!) and because they are a shadowy part of the data industry, relatively far from consumers, with little need to advertise or maintain a particularly public profile.
I think these stakeholders are in a special category, in terms of policy, for a number of reasons. First, they are important to questions of privacy and discrimination in data, as they help to move data beyond the settings in which we authorized its collection and use. Second, they are outside of traditional regulations that are framed around specific industries and their data use (like HIPAA provisions that regulate hospitals and medical record keepers, but not data brokers who might nevertheless traffic in health data). Third, they’re a newly emergent part of the data ecosystem, so they have not been thought about in the development of older legislation. But most importantly, they are a business that offers no social value to the individual or society whose data is being gathered. (Uh oh.) In all of the more traditional instances in which data is collected about individuals, there is some social benefit or service presumed to be offered in exchange. The government conducts a census, but we authorized that, because it is essential to the provision of government services: proportional representation of elected officials, fair imposition of taxation, etc. Verizon collects data on us, but they do so as a fundamental element of the provision of telephone service. Facebook collects all of our traces, and while that data is immensely valuable in its own right and to advertisers, it is also an important component in providing their social media platform. I am by no means saying that there are no possible harms in such data arrangements (I should hope not) but at the very least, the collection of data comes with the provision of service, and there is a relationship (citizen, customer) that provides a legally structured and sanctioned space for challenging the use and misuse of that data: class action lawsuit, regulatory oversight, protest, or just switching to another phone company. (Have you tried switching phone companies lately?)
Some services that collect data have even voluntarily sought to do additional, socially progressive things with that data: Google looking for signs of flu outbreaks, Facebook partnering with researchers looking to encourage voting behavior, even OK Cupid giving us curious insights about the aggregate dating habits of their customers. (You just love infographics, don’t you.) But the third party data broker who buys data from an e-commerce site I frequent, or scrapes my publicly available hospital discharge record, or grabs up the pings my phone emits as I walk through town, is building commercial value on my data, but offers no value to me, my community, or society in exchange.
So what I propose is a “pay it back tax” on data brokers. (Huh?! Does such a thing exist, anywhere?) If a company collects, aggregates, or scrapes data on people, and does so not as part of a service back to those people (but is that distinction even a tenable one? who would decide and patrol which companies are subject to this requirement?), then they must grant access to their data and 10% of their revenue to non-profit, socially progressive uses of that data. This could mean they could partner with a non-profit, provide them funds and access to data, to conduct research. Or, they could make the data and dollars available as a research fund that non-profits and researchers could apply for. Or, as a nuclear option, they could avoid the financial requirement by providing an open API to their data. (I thought your concern about these brokers is that they aggravate the privacy problems of big data, but you’re making them spread that collected data further?) I think there could be valuable partnerships: Turnstyle’s data might be particularly useful for community organizations concerned about neighborhood flow or access for the disabled; health data could be used by researchers or activists concerned with discrimination in health insurance. There would need to be parameters for how that data was used and protected by the non-profits who received it, and perhaps an open access requirement for any published research or reports.
This may seem extreme. (I should say so. Does this mean any commercial entity in any industry that doesn’t provide a service to customers should get a similar tax?) Or, from another vantage point, it could be seen as quite reasonable: companies that collect data on their own have to spend an overwhelming amount of their revenue providing whatever service they do that justifies this data collection; governments that collect data on us are in our service, and make no profit. This is merely 10% and sharing their valuable resource. (No, it still seems extreme.) And, if I were aiming more squarely at the concerns about privacy, I’d be tempted to say that data aggregation and scraping could simply be outlawed. (Somebody stop him!) In my mind, it at the very least brings back the idea that collecting data on individuals and using that as a primary resource upon which to make profit must, on balance, provide some service in return, be it customer service, social service, or public benefit.
This is cross-posted at Culture Digitally.
This was an incredible, overwhelming year for internship applications. We had well over 200 PhD students apply, and we were deeply impressed by the quality of suggested projects. Thanks to everyone for your submissions. Here are the four people who will be joining us over the summer – congratulations to you all. We’re looking forward to working with you!
Tressie McMillan Cottom is a Ph.D. candidate in the Sociology Department at Emory University in Atlanta, GA. Broadly, Tressie studies organizations, inequality, and education. Her doctoral research is a comparative study of the expansion of for-profit colleges (like the University of Phoenix) in the 1990s. She will be working with Kate, Mary and Nancy this summer on a project about hashtag activist groups on Twitter and their ties to institutional power.
Luke Stark is a PhD student in the Department of Media, Culture, and Communication at New York University under the supervision of Helen Nissenbaum. His dissertation research focuses on the history and philosophy of digital media technologies, and their use in tracking, monitoring and shaping the everyday emotional lives and experiences of users. This summer he will be working with Kate on epistemologies of big data, privacy, and computational culture.
Katrin Tiidenberg is a Ph.D. candidate at the Institute of International and Social Studies at Tallinn University in Estonia. Her dissertation is about online experience and identity in the context of NSFW blogs on Tumblr. She will be working this summer with Nancy on a project about selfies, power and shame.
Kathryn Zyskowski is a Ph.D. student in the Department of Anthropology at the University of Washington and an Editorial Intern at the Journal of the Society for Cultural Anthropology. Her doctoral work examines identity, representation, and Muslim/Hindu relations in South India. This summer, she will work with Mary studying how people doing crowdsourced work in India and the United States use online discussion forums to organize their work and structure their identities as workers in specific locations.
(Reblogged from jessalingel.tumblr.com)
It’s the last day of the iconference and I’m just leaving an awesome, much needed discussion of social justice issues related to library and information science. It’s always affirming to see people in my field who care about social justice exchanging ideas, frustrations, success stories, failure stories and giving advice. Here are some brief notes from the discussion. Many of these examples focus on teaching and academic life, but there are ways to reposition them towards other contexts.
+Discomfort is okay. Nicole Cooke pointed out that it’s actually productive and useful to generate moments of discomfort in class – I really appreciate this point as a reminder that as tempting as it is to shy away from moments of social awkwardness that come from identifying gaps in privilege, it can also be an important opportunity to reshape assumptions.
+When it comes to convincing administrators and senior faculty of the importance of this work, we need allies who are higher-ups, and money talks. The members of the panel were from GSLIS at the ischool at Illinois, and they noted the importance of having champions in their program. Also, having received a grant to work on diversity and inclusion lends a degree of legitimacy to politics of challenging heteronormativity.
+Even if we’re making our classes full of theories of power, students self-select for classes specifically geared towards issues of race, class, and gender, so how do we get issues of social justice into the curriculum as a whole? Some inventive ideas include course releases for faculty to partner with existing classes to integrate issues of critical theory and social justice into coursework. Also, a clearer articulation of how these efforts fit into the category of service. Another idea is building momentum with interdisciplinary efforts towards feminist ideology, like Laura Portwood-Stacer’s efforts to generate conversations of feminists working on social media at a range of communication and HCI conferences.
+It’s important to think about the examples that we use in class. It’s an easy thing to bring up with colleagues as a way of talking about diversity that can be fairly easily integrated into the classroom. (Shout out to Emily Knox for making this point.)
Organized as self-defense forces, some residents of the Mexican state of Michoacán have been attempting to regain control of their towns from powerful organized criminals. Although these Mexican militias have received a fair amount of media coverage, their fascinating social media presence has not been examined. Saiph Savage, a grad student at UNAM/UCSB, and I have started to collect some data, and wanted to share some initial observations of one of the militias’ online spaces: Valor por Michoacán, a Facebook page with more than 130,000 followers devoted to documenting the activities of the self-defense militia groups in their fight against the Knights Templar Cartel. We contrast this page with a similar one from a different state: Valor por Tamaulipas, which has enabled residents of that state to cope with the Drug War related violence.
I’m thrilled to announce that our anthology, Media Technologies: Essays on Communication, Materiality, and Society, edited by myself with Pablo Boczkowski and Kirsten Foot, is now officially available from MIT Press. Contributors include Geoffrey Bowker, Finn Brunton, Gabriella Coleman, Gregory Downey, Steven Jackson, Christopher Kelty, Leah Lievrouw, Sonia Livingstone, Ignacio Siles, Jonathan Sterne, Lucy Suchman, and Fred Turner. We’ve secured permission to share the introduction with you. A blurb:
In recent years, scholarship around media technologies has finally shed the presumption that technologies are separate from and powerfully determining of social life, seeing them instead as produced by and embedded in distinct social, cultural, and political practices – and as socially significant because of that. This has been helped along by a productive intersection between work in science and technology studies (STS) interested in information technologies as complex sociomaterial phenomena, and work in communication and media studies attuned to the symbolic and public dimensions of these tools.
In this volume, scholars from both fields come together to provide some conceptual paths forward for future scholarship. Two sets of essays and commentaries comprise this collection: the first addresses the relationship between materiality and mediation, considering such topics as the lived realities of network infrastructure. The second highlights media technologies as fragile and malleable, held together through the minute, unobserved work of many, including efforts to keep these technologies alive.
Please feel free to circulate this introduction to others, and write back to us with your thoughts, criticisms, and ideas. We hope this volume helps anchor the exciting conversations we see happening in the field, and serves as a launchpad for future scholarship.