Corrupt Personalization

(“And also Bud Light.”)

In my last two posts I’ve been writing about my attempt to convince a group of sophomores with no background in my field that there has been a shift to the algorithmic allocation of attention — and that this is important. In this post I’ll respond to a student question. My favorite: “Sandvig says that algorithms are dangerous, but what are the most serious repercussions that he envisions?” What is the coming social media apocalypse we should be worried about?


This is an important question because people who study this stuff are NOT as interested in this student question as they should be. Frankly, we are specialists who study media and computers and things — therefore we care about how algorithms allocate attention among cultural products almost for its own sake. Because this is the central thing that we study, we don’t spend a lot of time justifying it.

And our field’s most common response to the query “what are the dangers?” often lacks the required sense of danger. The most frequent response is: “extensive personalization is bad for democracy.” (a.k.a. Pariser’s “filter bubble,” Sunstein’s “egocentric” Internet, and so on). This framing lacks a certain house-on-fire urgency, doesn’t it?

(sarcastic tone:) “Oh, no! I’m getting to watch, hear, and read exactly what I want. Help me! Somebody do something!”

Sometimes (as Hindman points out) the contention is the opposite, that Internet-based concentration is bad for democracy.  But remember that I’m not speaking to political science majors here. The average person may not be as moved by an abstract, long-term peril to democracy as the average political science professor. As David Weinberger once said after I warned about the increasing reliance on recommendation algorithms, “So what?” Personalization sounds like a good thing.

As a side note, the second most frequent response I see is that algorithms are now everywhere, and that they work differently than what came before. This also lacks a required sense of danger! Yes, they’re everywhere, but if they are a good thing, then their ubiquity is no cause for alarm.

So I really like this question “what are the most serious repercussions?” because I think there are some elements of the shift to attention-sorting algorithms that are genuinely “dangerous.” I can think of at least two, probably more, and they don’t get enough attention. In the rest of this post I’ll spell out the first one, which I’ll call “corrupt personalization.”

Here we go.

Common-sense reasoning about algorithms and culture tells us that the purveyors of personalized content have the same interests we do. That is, if Netflix started recommending only movies we hate or Google started returning only useless search results we would stop using them. However: Common sense is wrong in this case. Our interests are often not the same as the providers of these selection algorithms.  As in my last post, let’s work through a few concrete examples to make the case.

In this post I’ll use Facebook examples, but the general problem of corrupt personalization is present on all widely used media platforms that employ the algorithmic selection of content.

(1) Facebook “Like” Recycling


(Image from ReadWriteWeb.)

On Facebook, in addition to advertisements along the side of the interface, perhaps you’ve noticed “featured,” “sponsored,” or “suggested” stories that appear inside your news feed, intermingled with status updates from your friends. It could be argued that this is not in your interest as a user (did you ever say, “gee, I’d like ads to look just like messages from my friends”?), but I have bigger fish to fry.

Many ads on Facebook resemble status updates in that there can be messages endorsing the ads with “likes.” For instance, here is an older screenshot from ReadWriteWeb:

[Screenshot: “Pages you may like” suggestions on Facebook]

Another example: a “suggested” post was mixed into my news feed just this morning recommending World Cup coverage on Facebook itself. It’s a Facebook ad for Facebook, in other words.  It had this intriguing addendum:

[Screenshot: censored friends’ names “like Facebook,” appended to the suggested post]

So, wait… I have hundreds of friends and eleven of them “like” Facebook?  Did they go to http://www.facebook.com and click on a button like this:

[Image: a magnified Facebook “Like” button]

But facebook.com doesn’t even have a “Like” button!  Did they go to Facebook’s own Facebook page (yes, there is one) and click “Like”? I know these people and that seems unlikely. And does Nicolala really like Walmart? Hmmm…

What does this “like” statement mean? Welcome to the strange world of “like” recycling. Facebook has defined “like” in ways that depart from English usage.  For instance, in the past Facebook has determined that:

  1. Anyone who clicks on a “like” button is considered to have “liked” all future content from that source. So if you clicked a “like” button because someone shared a “Fashion Don’t” from Vice magazine, you may be surprised when your dad logs into Facebook three years later and is shown a current sponsored story from Vice.com like “Happy Masturbation Month!” or “How to Make it in Porn” with the endorsement that you like it. (Vice.com example is from Craig Condon [NSFW].)
  2. Anyone who “likes” a comment on a shared link is considered to “like” wherever that link points to, a.k.a. “liking a share.” So if you see a (real) FB status update from a (real) friend and it says: “Yuck! The McLobster is a disgusting product idea!” and your (real) friend includes a (real) link like this one — that means if you clicked “like” your friends may see McDonald’s ads in the future that include the phrase “(Your Name) likes McDonald’s.” (This example is from ReadWriteWeb.)
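To make those two rules concrete, here is a minimal sketch in Python. It is my own toy reconstruction, not Facebook’s code, and every class, name, and data structure in it is invented for illustration: the point is just that the endorsement attaches to the source, not to the thing you actually clicked on.

```python
from dataclasses import dataclass

@dataclass
class LikeRecord:
    user: str
    source: str       # the page or domain the "like" gets credited to
    clicked_on: str   # what the user actually clicked "like" on

class LikeRecycler:
    """Toy model of 'like' recycling (illustration only, not Facebook code)."""

    def __init__(self) -> None:
        self.likes: list[LikeRecord] = []

    def like_post(self, user: str, source: str, item: str) -> None:
        # Rule 1: liking one item from a source counts as liking
        # all of that source's future content.
        self.likes.append(LikeRecord(user, source, item))

    def like_share(self, user: str, linked_domain: str, comment: str) -> None:
        # Rule 2: liking a friend's comment on a shared link counts as
        # liking whatever the link points to, even if the comment was "Yuck!"
        self.likes.append(LikeRecord(user, linked_domain, comment))

    def endorsers(self, advertiser: str, friends_of_viewer: set[str]) -> list[str]:
        # Any friend with a recycled like for this advertiser can be shown
        # endorsing a future sponsored story they never actually saw.
        return [rec.user for rec in self.likes
                if rec.source == advertiser and rec.user in friends_of_viewer]

recycler = LikeRecycler()
recycler.like_share("Alice", "mcdonalds.com", "Yuck! The McLobster is a disgusting product idea!")
print(recycler.endorsers("mcdonalds.com", {"Alice", "Bob"}))  # ['Alice']
```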

[Screenshot: a recycled “likes McDonald’s” endorsement, via ReadWriteWeb]

This has led to some interesting results, like dead people “liking” current news stories on Facebook.

There is already controversy about advertiser “like” inflation, “like” spam, and fake “likes” — and these things may be a problem too, but that’s not what we are talking about here. In the examples above the system is working as Facebook designed it to. A further caveat: note that the definition of “like” in Facebook’s software changes periodically and when they are sued. Facebook now has an opt-out setting for the above two “features.”

But these incendiary examples are exceptional fiascoes — on the whole the system probably works well. You likely didn’t know that your “like” clicks are merrily producing ads on your friends’ pages and in your name, because you cannot see them. These “stories” do not appear in your news feed and cannot be individually deleted.

Unlike the examples from my last post you can’t quickly reproduce these results with certainty on your own account. Still, if you want to try, make a new Facebook account under a fake name (warning! dangerous!) and friend your real account. Then use the new account to watch your status updates.

Why would Facebook do this? Obviously it is a controversial practice that is not going to be popular with users. Yet Facebook’s business model is to produce attention for advertisers, not to help you — silly rabbit. So they must have felt that using your reputation to produce more ad traffic from your friends was worth the risk of irritating you. Or perhaps they thought that the practice could be successfully hidden from users — that strategy has mostly worked!

In sum, this is a personalization scheme that does not serve your goals; it serves Facebook’s goals at your expense.

(2) “Organic” Content

This second group of examples concerns content that we consider to be “not advertising,” a.k.a. “organic” content. Funnily enough, algorithmic culture has produced this new use of the word “organic” — but has also made the boundary between “advertising” and “not advertising” very blurry.

[Image: a funny “organic” food ad]

 

The general problem is that there are many ways in which algorithms act as mixing valves between things that can be easily valued with money (like ads) and things that can’t. And this kind of mixing is a normative problem (what should we do) and not a technical problem (how do we do it).

For instance, for years Facebook has encouraged nonprofits, community-based organizations, student clubs, other groups, and really anyone to host content on facebook.com.  If an organization creates a Facebook page for itself, the managers can update the page as though it were a profile.

Most page managers expect that people who “like” that page get to see the updates… which was true until January of this year. At that time Facebook modified its algorithm so that text updates from organizations were not widely shared. This is interesting for our purposes because Facebook clearly states that it wants page operators to run Facebook ad campaigns, and not to count on getting traffic from “organic” status updates, as it will no longer distribute as many of them.

This change likely has a very differential effect on, say, Nike’s Facebook page, a small local business’s Facebook page, Greenpeace International’s Facebook page, and a small local church congregation’s Facebook page. If you start a Facebook page for a school club, you might be surprised that you are spending your labor writing status updates that are never shown to anyone. Maybe you should buy an ad. Here’s an analytic for a page I manage:

[Screenshot: this week’s page “likes” analytics for a Facebook page I manage]
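As a purely illustrative sketch (this is not Facebook’s ranking; the “organic factor” and the price of an impression are numbers I made up), the January change amounts to something like this: an unpaid page update reaches only a sliver of the page’s followers, and an ad budget buys the rest back.

```python
def expected_reach(page_followers: int, ad_spend: float, organic_factor: float = 0.02) -> int:
    """Toy model of the January change described above: an unpaid ('organic')
    page update reaches only a small fraction of the people who liked the page,
    and paid promotion buys the rest back. Every number here is invented."""
    organic_reach = int(page_followers * organic_factor)
    paid_reach = int(ad_spend * 100)   # pretend one dollar buys ~100 impressions
    return min(page_followers, organic_reach + paid_reach)

# A school club page with 500 followers reaches ~10 people for free,
# or all 500 if it buys a $5 boost.
print(expected_reach(500, 0.0))  # 10
print(expected_reach(500, 5.0))  # 500
```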

 

The impact isn’t just about size — at some level businesses might expect to have to insert themselves into conversations via persuasive advertising that they pay for, but it is not as clear that people expect Facebook to work this way for their local church or other domains of their lives. It’s as if on Facebook, people were using the yellow pages but they thought they were using the white pages.  And also there are no white pages.

(Oh, wait. No one knows what yellow pages and white pages are anymore. Scratch that reference, then.)

No need to stop here: in the future perhaps Facebook can monetize my family relationships. It could suggest that if I really want anyone to know about the birth of my child, or if I really want my “insightful” status updates to reach anyone, I should turn to Facebook advertising.

Let me also emphasize that this mixing problem extends to the content of our personal social media conversations as well. A few months back, I posted a Facebook status update that I thought was humorous. I shared a link highlighting the hilarious product reviews for the Bic “Cristal For Her” ballpoint pen on Amazon. It’s a pen designed just for women.

[Image: the Bic “Cristal For Her” ballpoint pen product page on Amazon]

The funny thing is that I happened to look at a friend’s Facebook feed over their shoulder, and my status update didn’t go away. It remained, pegged at the top of my friend’s news feed, for as long as 14 days in one instance. What great exposure for my humor, right? But it did seem a little odd… I queried my other friends on Facebook and some confirmed that the post was also pegged at the top of their news feeds.

I was unknowingly participating in another Facebook program that converts organic status updates into ads. It does this by changing their order in the news feed and adding the text “Sponsored” in light gray, which is very hard to see. The update itself is otherwise unchanged. I suspect Facebook’s algorithm thought I was advertising Amazon (since that’s where the link pointed), but I am not sure.

This is similar to Twitter’s “Promoted Tweets” but there is one big difference. In the Facebook case the advertiser promotes content — my content — that they did not write. In effect Facebook is re-ordering your conversations with your friends and family on the basis of whether or not someone mentioned Coke, Levi’s, or Anheuser Busch (confirmed advertisers in the program).
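As a minimal sketch of what this conversion amounts to (my own guess at the logic, not Facebook’s ranking code; the advertiser list repeats the confirmed names above plus Amazon, which is only my suspicion), the mechanics are: scan organic posts for advertiser mentions, relabel the matches “Sponsored,” and float them to the top of friends’ feeds.

```python
from typing import Optional

# Confirmed participants named above; "amazon" is only my guess about my own post.
ADVERTISERS = {"coke", "levi's", "anheuser busch", "amazon"}

def mentions_advertiser(text: str) -> Optional[str]:
    """Return the first paying advertiser named or linked in a post, if any."""
    lowered = text.lower()
    for brand in ADVERTISERS:
        if brand in lowered:
            return brand
    return None

def rank_feed(posts: list[dict]) -> list[dict]:
    """Toy news-feed ranking: an organic post that mentions an advertiser is
    relabeled 'Sponsored' and pinned above everything else."""
    processed = []
    for post in posts:
        if mentions_advertiser(post["text"]):
            post = {**post, "label": "Sponsored", "pinned": True}
        processed.append(post)
    # Pinned (sponsored) posts first, everything else reverse-chronological.
    return sorted(processed, key=lambda p: (not p.get("pinned", False), -p["timestamp"]))

feed = [
    {"author": "Friend", "text": "Lovely sunset tonight.", "timestamp": 3},
    {"author": "Me", "text": "These Bic Cristal For Her reviews on Amazon are hilarious!", "timestamp": 2},
]
for post in rank_feed(feed):
    print(post.get("label", "organic"), "|", post["author"], "|", post["text"])
```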

Sounds like a great personal social media strategy there–if you really want people to know about your forthcoming wedding, maybe just drop a few names? Luckily the algorithms aren’t too clever about this yet so you can mix up the word order for humorous effect.

(Facebook status update:) “I am so delighted to be engaged to this wonderful woman that I am sitting here in my Michelob drinking a Docker’s Khaki Collection. And also Coke.”

Be sure to use links. What I find interesting about this mixing of the commercial and the non-commercial is that it sounds like some corny, unrealistic science fiction scenario, and yet on the current Facebook platform I believe the example above would work. We are living in the future.

So to recap, if Nike makes a Facebook page and posts status updates to it, that’s “organic” content because they did not pay Facebook to distribute it. Although any rational human being would see it as an ad. If my school group does the same thing, that’s also organic content, but they are encouraged to buy distribution — which would make it inorganic. If I post a status update or click “like” in reaction to something that happens in my life and that happens to involve a commercial product, my action starts out as organic, but then it becomes inorganic (paid for) because a company can buy my words and likes and show them to other people without telling me. Got it? This paragraph feels like we are rethinking CHEM 402.

The upshot is that control of the content selection algorithm is used by Facebook to get people to pay for things they wouldn’t expect to pay for, and to show people personalized things that they don’t think are paid for. But these things were in fact paid for. In sum, this is again a scheme that does not serve your goals; it serves Facebook’s goals at your expense.

The Danger: Corrupt Personalization

With these concrete examples behind us, I can now more clearly answer this student question. What are the most serious repercussions of the algorithmic allocation of attention?

I’ll call this first repercussion “corrupt personalization” after C. Edwin Baker. (Baker, a distinguished legal philosopher, coined the phrase “corrupt segmentation” in 1998 as an extension of the theories of philosopher Jürgen Habermas.)

Here’s how it works: You have legitimate interests that we’ll call “authentic.” These interests arise from your values, your community, your work, your family, how you spend your time, and so on. A good example might be that as a person who is enrolled in college you might identify with the category “student,” among your many other affiliations. As a student, you might be authentically interested in an upcoming tuition increase or, more broadly, in the contention that “there are powerful forces at work in our society that are actively hostile to the college ideal.”

However, you might also be authentically interested in the fact that your cousin is getting married. Or in pictures of kittens.

[Image: Grumpy Cat meme]

Corrupt personalization is the process by which your attention is drawn to interests that are not your own. This is a little tricky because it is impossible to clearly define an “authentic” interest. However, let’s put that off for the moment.

In the prior examples we saw some (I hope) obvious places where my interests diverged from those of algorithmic social media systems. Highlights for me were:

  • When I express my opinion about something to my friends and family, I do not want that opinion re-sold without my knowledge or consent.
  • When I explicitly endorse something, I don’t want that endorsement applied to other things that I did not endorse.
  • If I want to read a list of personalized status updates about my friends and family, I do not want my friends and family sorted by how often they mention advertisers.
  • If a list of things is chosen for me, I want the results organized by some measure of goodness for me, not by how much money someone has paid.
  • I want paid content to be clearly identified.
  • I do not want my information technology to sort my life into commercial and non-commercial content and systematically de-emphasize the noncommercial things that I do, or turn these things toward commercial purposes.

More generally, I think the danger of corrupt personalization is manifest in three ways.

  1. Things that are not necessarily commercial become commercial because of the organization of the system. (Merton called this “pseudo-gemeinschaft,” Habermas called it “colonization of the lifeworld.”)
  2. Money is used as a proxy for “best” and it does not work. That is, those with the most money to spend can prevail over those with the most useful information. The creation of a salable audience takes priority over your authentic interests. (Smythe called this the “audience commodity,” it is Baker’s “market filter.”)
  3. Over time, if people are offered things that are not aligned with their interests often enough, they can be taught what to want. That is, they may come to wrongly believe that these are their authentic interests, and it may be difficult to see the world any other way. (Similar to Chomsky and Herman’s [not Lippmann’s] arguments about “manufacturing consent.”)

There is nothing inherent in the technologies of algorithmic allocation that is doing this to us; instead, the economic organization of the system is producing these pressures. In fact, we could design a system to support our authentic interests, but we would then need to fund it. (Thanks, late capitalism!)

To conclude, let’s get some historical perspective. What are the other options, anyway? If cultural selection is governed by computer algorithms now, you might answer, “who cares?” It’s always going to be governed somehow. If I said in a talk about “algorithmic culture” that I don’t like the Netflix recommender algorithm, what is supposed to replace it?

This all sounds pretty bad, so you might think I am asking for a return to “pre-algorithmic” culture: Let’s reanimate the corpse of Louis B. Mayer and he can decide what I watch. That doesn’t seem good either and I’m not recommending it. We’ve always had selection systems and we could even call some of the earlier ones “algorithms” if we want to.  However, we are constructing something new and largely unprecedented here and it isn’t ideal. It isn’t that I think algorithms are inherently dangerous, or bad — quite the contrary. To me this seems like a case of squandered potential.

With algorithmic culture, computers and algorithms are allowing a new level of real-time personalization and content selection on an individual basis that just wasn’t possible before. But rather than use these tools to serve our authentic interests, we have built a system that serves commercial interests that are often at odds with our own — that’s corrupt personalization.

If I use the dominant forms of communication online today (Facebook, Google, Twitter, YouTube, etc.) I can expect content customized for others to use my name and my words without my consent, in ways I wouldn’t approve of. Content “personalized” for me includes material I don’t want, and obscures material that I do want. And it does so in a way that I may not be aware of.

This isn’t an abstract problem like a long-term threat to democracy, it’s more like a mugging — or at least a confidence game or a fraud. It’s violence being done to you right now, under your nose. Just click “like.”

In answer to your question, dear student, that’s my first danger.

* * *

ADDENDUM:

This blog post is already too long, but here is a TL;DR addendum for people who already know about all this stuff.

I’m calling this corrupt personalization because I can’t just apply Baker’s excellent ideas about corrupt segments — the world has changed since he wrote them. Although this post’s reasoning is an extension of Baker, it is not a straightforward extension.

Algorithmic attention is a big deal because we used to think about media and identity using categories, but the algorithms in wide use are not natively organized that way. Baker’s ideas were premised on the difference between authentic and inauthentic categories (“segments”), yet segments are just not that important anymore. Bermejo calls this the era of post-demographics.

Advertisers used to group demographics together to make audiences comprehensible, but it may no longer be necessary to buy and sell demographics or categories, as they are a crude proxy for purchasing behavior. If I want to sell a Subaru, why buy access to “Brite Lights, Li’l City” (my PRIZM marketing demographic from the 1990s) when I can directly detect “intent to purchase a station wagon” or “shopping for a Subaru right now”? This complicates Baker’s idea of authentic segments quite a bit. See also Gillespie’s concept of calculated publics.

Also Baker was writing in an era where content was inextricably linked to advertising because it was not feasible to decouple them. But today algorithmic attention sorting has often completely decoupled advertising from content. Online we see ads from networks that are based on user behavior over time, rather than what content the user is looking at right now. The relationship between advertising support and content is therefore more subtle than in the previous era, and this bears more investigation.

Okay, okay I’ll stop now.

* * *

(This is a cross-post from Multicast.)

Keeping Teens ‘Private’ on Facebook Won’t Protect Them

(Originally written for TIME Magazine)

We’re afraid of and afraid for teenagers. And nothing brings out this dualism more than discussions of how and when teens should be allowed to participate in public life.

Last week, Facebook made changes to teens’ content-sharing options. They introduced the opportunity for those ages 13 to 17 to share their updates and images with everyone and not just with their friends. Until this change, teens could not post their content publicly even though adults could. When minors select to make their content public, they are given a notice and a reminder in order to make it very clear to them that this material will be shared publicly. “Public” is never the default for teens; they must choose to make their content public, and they must affirm that this is what they intended at the point at which they choose to publish.

Representatives of parenting organizations have responded to this change negatively, arguing that this puts children more at risk. And even though the Pew Internet & American Life Project has found that teens are quite attentive to their privacy, and many other popular sites allow teens to post publicly (e.g. Twitter, YouTube, Tumblr), privacy advocates are arguing that Facebook’s decision to give teens choices suggests that the company is undermining teens’ privacy.

But why should youth not be allowed to participate in public life? Do paternalistic, age-specific technology barriers really protect or benefit teens?

One of the most crucial aspects of coming of age is learning how to navigate public life. The teenage years are precisely when people transition from being a child to being an adult. There is no magic serum that teens can drink on their 18th birthday to immediately mature and understand the world around them. Instead, adolescents must be exposed to — and allowed to participate in — public life while surrounded by adults who can help them navigate complex situations with grace. They must learn to be a part of society, and to do so, they must be allowed to participate.

Most teens no longer see Facebook as a private place. They befriend anyone they’ve ever met, from summer-camp pals to coaches at universities they wish to attend. Yet because Facebook doesn’t allow youth to contribute to public discourse through the site, there’s an assumption that the site is more private than it is. Facebook’s decision to allow teens to participate in public isn’t about suddenly exposing youth; it’s about giving them an option to treat the site as being as public as it often is in practice.

Rather than trying to protect teens from all fears and risks that we can imagine, let’s instead imagine ways of integrating them constructively into public life. The key to doing so is not to create technologies that reinforce limitations but to provide teens and parents with the mechanisms and information needed to make healthy decisions. Some young people may be ready to start navigating broad audiences at 13; others are not ready until they are much older. But it should not be up to technology companies to determine when teens are old enough to have their voices heard publicly. Parents should be allowed to work with their children to help them navigate public spaces as they see fit. And all of us should be working hard to inform our younger citizens about the responsibilities and challenges of being a part of public life. I commend Facebook for giving teens the option and working hard to inform them of the significance of their choices.


eyes on the street or creepy surveillance?

This summer, with NSA scandal after NSA scandal, the public has (thankfully) started to wake up to issues of privacy, surveillance, and monitoring. We are living in a data world and there are serious questions to ask and contend with. But part of what makes this data world messy is that it’s not so easy as to say that all monitoring is always bad. Over the last week, I’ve been asked by a bunch of folks to comment on the report that a California school district hired an online monitoring firm to watch its students. This is a great example of a situation that is complicated.

The media coverage focuses on how the posts that they are monitoring are public, suggesting that this excuses their actions because “no privacy is violated.” We should all know by now that this is a terrible justification. Just because teens’ content is publicly accessible does not mean that it is intended for universal audiences nor does it mean that the onlooker understands what they see. (Alice Marwick and I discuss youth privacy dynamics in detail in “Social Privacy in Networked Publics”.) But I want to caution against jumping to the opposite conclusion because these cases aren’t as simple as they might seem.

Consider Tess’ story. In 2007, she and her friend killed her mother. The media reported it as “girl with MySpace kills mother,” so I decided to investigate the case. For 1.5 years, she had documented on a public MySpace page her struggles with her mother’s alcoholism and abuse, her attempts to run away, and her efforts to seek help. When I reached out to her friends after she was arrested, I learned that they had reported their concerns to the school but no one did anything. Later, I learned that the school didn’t investigate because MySpace was blocked on campus so they couldn’t see what she had posted. And although the school had notified social services out of concern, they didn’t have enough evidence to move forward. What became clear in this incident – and many others that I tracked – is that there are plenty of youth crying out for help online on a daily basis. Youth who could really benefit from the fact that their material is visible and someone is paying attention.

Many youth cry out for help through social media. Publicly, often very publicly. Sometimes for an intended audience. Sometimes as a call to the wind for anyone who might be paying attention. I’ve read far too many suicide notes and abuse stories to believe that privacy is the only viable frame here. One of the most heartbreaking was from a girl who was commercially sexually exploited by her middle class father. She had gone to her school, which had helped her go to the police; the police refused to help. She published every detail on Twitter about exactly what he had done to her and all of the people who had failed to help her. The next day she died by suicide. In my research, I’ve run across too many troubled youth to count. I’ve spent many a long night trying to help teens I encounter connect with services that can help them.

So here’s the question that underlies any discussion of monitoring: how do we leverage the visibility of online content to see and hear youth in a healthy way? How do we use the technologies that we have to protect them rather than focusing on punishing them?  We shouldn’t ignore youth who are using social media to voice their pain in the hopes that someone who cares might stumble across their pleas.

Urban theorist Jane Jacobs used to argue that the safest societies are those where there are “eyes on the street.” What she meant by this was that healthy communities looked out for each other, were attentive to when others were hurting, and were generally present when things went haywire. How do we create eyes on the digital street? How do we do so in a way that’s not creepy?  When is proactive monitoring valuable for making a difference in teens’ lives?  How do we make sure that these same tools aren’t abused for more malicious purposes?

What matters is who is doing the looking and for what purposes. When the looking is done by police, the frame is punitive. But when the looking is done by caring, concerned, compassionate people – even authority figures like social workers – the outcome can be quite different. However well-intended, law enforcement’s role is to uphold the law and people perceive their presence as oppressive even when they’re trying to help. And, sadly, when law enforcement is involved, it’s all too likely that someone will find something wrong. And then we end up with the kinds of surveillance that punishes.

If there’s infrastructure put into place for people to look out for youth who are in deep trouble, I’m all for it. But the intention behind the looking matters the most. When you’re looking for kids who are in trouble in order to help them, you look for cries for help that are public. If you’re looking to punish, you’ll misinterpret content, take what’s intended to be private and punish it publicly, and otherwise abuse youth in a new way.

Unfortunately, what worries me is that systems that are put into place to help often get used to punish. There is often a slippery slope where the designers and implementers never intended for it to be used that way. But once it’s there….

So here’s my question to you. How can we leverage technology to provide an additional safety net for youth who are struggling without causing undue harm? We need to create a society where people are willing to check in on each other without abusing the power of visibility. We need more eyes on the street in the Jacobs-ian sense, not in the surveillance-state sense. Finding this balance won’t be easy but I think that it behooves us to not jump to extremes. So what’s the path forward?

(I discuss this issue in more detail in my upcoming book “It’s Complicated: The Social Lives of Networked Teens.”  You can pre-order the book now!)

Legal Portraits of Web Users

This summer I became very interested in what I think I will be calling “legal portraits of digital subjects” or something similar. I came to this through doing a study on MOOCs with SMC this summer. The title of the project is “Students as End Users in the MOOC Ecology” (the talk is available online). In the project I am looking at what the Big 3 MOOC companies are saying publicly about the “student” and “learner” role and comparing it to how the same subject is legally constituted, to try to understand the cultural implications of turning students into “end users”.

As I was working through this project, and thinking of implications outside of MOOCs and Higher Ed, I realized these legal portraits are constantly being painted in digital environments. As users of the web/internet/digital tools we are constantly in the process of accepting various clickwrap and browse-wrap agreements without thinking twice about it, because it has become a standard cultural practice.

In writing this post I’ve already entered numerous binding legal agreements. Here are some of the institutions that have terms I am to follow:

  1. Internet Service Provider

  2. Web Browser

  3. Document Hosting Service (I wrote this in the cloud somewhere else first)

  4. Blog Hosting Company

  5. Blog Platform

  6. Various Companies I’ve Accepted Cookies From

  7. Social Media Sites

I’ve gone through and read some of the Terms (some of them I cannot find). I’ve allowed for the licensing and reproduction of this work in multiple places without even thinking twice about it. We talk a lot about privacy concerns. We know that by producing things like blog posts or status updates we are agreeing to being surveilled to various degrees. But I’d love to start a broader conversation about the effects, beyond privacy, of agreeing to a multitude of Terms simply by logging on and opening a browser.

Thoughts on the engagement of 6 million Facebook users

On June 21, 2013, Facebook reported that a bug had potentially exposed 6 million Facebook users’ contact details. While this security breach is huge at any scale and raises concerns regarding online privacy, what I want to bring forward is that it also illuminates how our data is currently used by social media sites. In fact, it is quite interesting that instead of a technical description of what happened, Facebook wants to tell us why and how it happened:

When people upload their contact lists or address books to Facebook, we try to match that data with the contact information of other people on Facebook in order to generate friend recommendations. For example, we don’t want to recommend that people invite contacts to join Facebook if those contacts are already on Facebook; instead, we want to recommend that they invite those contacts to be their friends on Facebook.

Because of the bug, some of the information used to make friend recommendations and reduce the number of invitations we send was inadvertently stored in association with people’s contact information as part of their account on Facebook. As a result, if a person went to download an archive of their Facebook account through our Download Your Information (DYI) tool, they may have been provided with additional email addresses or telephone numbers for their contacts or people with whom they have some connection. This contact information was provided by other people on Facebook and was not necessarily accurate, but was inadvertently included with the contacts of the person using the DYI tool.

The point I want to focus on here is that in response to the security breach Facebook gives us a rather rare view of how it uses user information to establish and maintain user engagement. What is important in this regard is the notion that users’ ‘contact lists’ and ‘address books’ are not only stored on the server but also actively used by Facebook to build new connections and establish new attachments. In this case, your contact details are used to make friend recommendations.
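For readers who want the mechanism spelled out, here is a toy sketch of the contact matching Facebook describes above. It is my own reconstruction with invented names and a bare-bones data model, not Facebook’s implementation.

```python
def match_uploaded_contacts(uploaded_contacts, registered_users):
    """Toy version of the matching described above: compare an uploaded
    address book against existing accounts, recommend friends for matches,
    and suggest invitations for everyone else. Invented data throughout."""
    by_email = {user["email"]: user for user in registered_users}
    recommend_as_friends, invite_to_join = [], []
    for contact in uploaded_contacts:
        match = by_email.get(contact["email"])
        if match:
            recommend_as_friends.append(match["name"])   # already on the site
        else:
            invite_to_join.append(contact["email"])      # not yet on the site
    return recommend_as_friends, invite_to_join

registered = [{"name": "Ada", "email": "ada@example.com"}]
uploaded = [{"name": "A.", "email": "ada@example.com"},
            {"name": "New acquaintance", "email": "new@example.com"}]
print(match_uploaded_contacts(uploaded, registered))
# (['Ada'], ['new@example.com'])
```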

According to Mark Coté and Jennifer Pybus (2007, 101) social networks have an inbuilt “architecture of participation.” This architecture invites users to use the site and then exploits the data users submit in order to intensify personalized user experiences. The friend recommendation system is without a doubt part of these architectures. It is based on the idea that you do not connect with random people but with the people you know. You do not need to search for these people; Facebook suggests them for you with its algorithmic procedures (Bucher 2012). Your real-life acquaintances become your Friends on Facebook and you do not have to leave the site to maintain these relationships.

To paraphrase José van Dijck (2013, 12 n9), social media sites engineer our sociality: in other words, social media sites are “trying to exert influence on or directing user behavior.” The engineering of sociality need not refer to political propaganda or ideological brainwashing; it can just as well be interpreted as a technology for keeping users engaged with social media sites. Facebook of course needs user engagement in order to remain productive and to be credible to its shareholders. To be clear, user engagement here is not only an emotional or psychological relation to a social media site but a relation that is extensively coded and programmed into the technical and social uses of the platform itself. As such it needs to be researched from perspectives that take into account both human and non-human agencies.

In short, being engaged with social media is a relation of connecting and sharing, discovering and learning, and expressing oneself. These architectures of participation work in a circular logic. The more information you provide to social media sites, either explicitly or implicitly (see Schäfer 2009), the more engaged you become. Not only because these sites are able to better place you into a demographic slot based on big data but also because they use the small data, your private data, to personalize the experience. Eventually, you are so engaged that things like the compromised privacy of 6 million users do not stop you from using these sites.

References

Bucher, Taina 2012. “The Friendship Assemblage: Investigating Programmed Sociality on Facebook.” Television & New Media. Published online August 24.

Coté, Mark & Pybus, Jennifer 2007. “Learning to Immaterial Labour 2.0: MySpace and Social Networks.” Ephemera, Vol 7(1): 88-106.

Schäfer, Mirko Tobias 2011. Bastard Culture! How User Participation Transforms Cultural Production. Amsterdam: Amsterdam University Press.

Van Dijck, José 2013. The Culture of Connectivity: A Critical History of Social Media. Oxford & New York: Oxford University Press.

Big Data Thoughts

[Image: MIT Firehose, via wallg on Flickr]

401 Access Denied, 403 Forbidden, 404 Not Found, 500 Internal Server Error & the Firehose

There is this thing called the firehose. I’ve witnessed mathematicians, game theorists, computer scientists and engineers (apparently there is a distinction), economists, business scholars, and social scientists salivate over it (myself included). The Firehose, though technically a name for the full Twitter API stream, has become all-encompassing shorthand in social science for the streams of data from social networking sites that are so large they cannot be processed as they come in. The data are so large, in fact, that coding requires multiple levels of computer-aided refinement, as though when we take data from these sources we are drinking from a firehose. While I cannot find the etymology of the term, it seems to have come either from Twitter terminology bleed or from a water fountain at MIT.

I am blessed with an advisor who has become the little voice that I always have at the back of my head when I am thinking about something. Every meeting he asks the same question, one that should be easy to answer but almost never is, especially when we are invested in a topic: “Why does this matter?” To date, outside of business uses or artistic exploration, we’ve not made a good case for why big data matters. I think we all want it because we think some hidden truth might be within it. We fetishize big data, and the Firehose that exists behind locked doors, as though it will be the answer to some bigger question. The problem is that there is no question. We, from our own unique, biased, and disciplinary homes, have to come up with the bigger questions. We also have to accept that while data might provide us with some answers, perhaps we should be asking questions that go deeper than that, in a research practice that requires more reflexivity than we are seeing right now. I would love to see more nuanced readings that acknowledge the biases, gaps, and holes at all levels of big data curation.

Predictive Power of Patterns

One of my favorite anecdotes showing the power of big data is the Target incident from February 2012. Target predicted that a teenage girl was pregnant, and acted on it, before she told her family: they sent baby-centric coupons to her. Her father called Target very angry, then called back later to apologize because there were some things his daughter hadn’t told him. The media storm following the event painted a world both in awe of and creeped out by Target’s predictive power. How could a seemingly random bit of shopping history point to a pattern that showed that a customer was pregnant? How come I hadn’t noticed that they were doing this to me too? Since the incident went public, and Target shared how they learned to hide the targeted ads and coupons to minimize the creepy factor, I’ve enjoyed receiving the Target coupon books that always come in pairs to my home, one for me and one for my husband, that look the same on the surface but have slight variations on the inside. Apparently Target has learned that if the coupons for me go to him, they will be used. This is because every time I get my coupon books I complain to him about my crappy coupon for something I need. He laughs at me and shows me his coupon, usually worth twice as much as mine if I just spend a little bit more. It almost always works.

In 2004 Lou Agosta wrote a piece titled “The Future of Data Mining – Predictive Analytics.” With the proliferation of social media, API data access, and the beloved yet mysterious firehose, I think we can say the future is now. Our belief in, and cyclical relationship with, progress as a universal inevitability turns big data into a universal good. While I am not denying the usefulness of finding predictive patterns (clearly Target knew the girl was pregnant and was able to capitalize on that knowledge), for the social scientist, pattern identification for outcome prediction followed by verification should not be enough. Part of our fetishization of big data seems to be the idea that somehow it will allow us not just to anticipate, but to know, the future. Researchers across fields and industries are working on ways to extract meaningful, predictive data from these nearly indigestible datastreams. We have to remember that even in big data there are gaps, holes, and disturbances. Rather than looking at what big data can tell us, we should be looking toward it as an exploratory method that can help us define different problem sets and related questions.

Big Data as Method?

Recently I went to a talk by a pair of computer scientists who had access to the entire database of Wikipedia. Because they could, they decided to visualize Wikipedia. After going through slide after slide of pretty colors, they said “who knew there were rainbows in Wikipedia!?” and then announced that they had moved on from that research. Rainbows can only get me so far. I was stuck asking why this pattern kept repeating itself and wanting to know how the people creating the data that turned into a rainbow imagined what they were producing. The visualizations didn’t answer anything. If anything, they allowed me to ask clearer, more directed questions. This isn’t to say the work that they did wasn’t beautiful. It is and was. But there is so much more work to do. I hope that as big data continues to become something of a social norm, more people begin to speak across the lines so that we learn how to use this data in meaningful ways everywhere.

Right now I think that visualization is still central, but that is one of my biases. The reason I think this is the case is that visualization allows for simple identification of patterns. It also allows us to take in petabytes of data at once, to compare different datasets (if similar visualization methods are used), and to experiment in a way that other forms of data representation do not. When people share visualizations they either show their understandable failures or the final polished product meant for mass consumption. I’ve not heard a lot of conversation about using big data, its curation, and visualization generation as/and method, but maybe I’m not in the right circles? Still, I think until we are willing to share the various steps along the way to turning big data into meaningful bits, or we create an easy-to-use toolkit for the next generation of big data visualizations, we will continue to all be hacking at the same problem, ending and stopping at different points, without coming to a meaningful point other than “isn’t big data beautiful?”

Challenges for Health in a Networked Society

In February, I had the great fortune to visit the Robert Wood Johnson Foundation as part of their “What’s Next Health” series. I gave a talk raising a series of critical questions for those working on health issues. The folks at RWJF have posted my talk, along with an infographic of some of the challenges I see coming down the pipeline.

They also asked me to write a brief blog post introducing some of my ideas, based on one of the questions that I asked in the lecture. I’ve reposted it here, but if this interests you, you should really go check out the talk over at RWJF’s page.

….

RWJF’s What’s Next Health: Who Do We Trust?

We live in a society that is more networked than our grandparents could ever have imagined. More people have information at their fingertips than ever before. It’s easy to see all of this potential and celebrate the awe-some power of the internet. But as we think about the intersection of technology and society, there are so many open questions and challenging conundrums without clear answers. One of the most pressing issues has to do with trust, particularly as people turn to the internet and social media as a source of health information. We are watching shifts in how people acquire information. But who do they trust? And is trust shifting?

Consider the recent American presidential election, which is snarkily referred to as “post-factual.” The presidential candidates spoke past one another, refusing to be pinned down. News agencies went into overdrive to fact-check each statement made by each candidate, but the process became so absurd that folks mostly just gave up trying to get clarity. Instead, they focused on more fleeting issues like whether or not they trusted the candidates.

In a world where information is flowing fast and furious, many experience aspects of this dynamic all the time. People turn to their friends for information because they do not trust what’s available online. I’ve interviewed teenagers who, thanks to conversations with their peers and abstinence-only education, genuinely believe that if they didn’t get pregnant the last time they had sex, they won’t get pregnant this time. There’s so much reproductive health information available online, but youth turn to their friends for advice because they trust those “facts” more.

The internet introduces the challenge of credibility, but it also highlights the consequences of living in a world of information overload, where the issue isn’t whether or not the fact is out there and available, but how much effort a person must go through to make sense of so much information. Why should someone trust a source on the internet if they don’t have the tools to assess the content’s credibility? It’s often easier to turn to friends or ask acquaintances on Facebook for suggestions. People use the “lazy web” because friends respond quickly and make sense, which beats trying to sort out what’s available through Google.

As we look to the future, organizations that focus on the big issues — like the Robert Wood Johnson Foundation — need to think about what it means to create informed people in a digital era. How do we spread accurate information through networks? How do we get people to trust abstract entities that have no personal role in their lives?

Questions around internet and trust are important: What people know and believe will drive what they do and this will shape their health.

The beauty of this moment, with so many open questions and challenges, is that we are in a position to help shape the future by delicately navigating these complex issues. Thus, we must be asking ourselves: How can we collectively account for different stakeholders and empower people to make the world a better place?

thoughts on Pew’s latest report: notable findings on race and privacy

Yesterday, the Pew Internet and American Life Project (in collaboration with Berkman) unveiled a brilliant report about “Teens, Social Media, and Privacy.” As a researcher who’s been in the trenches on these topics for a long time now, none of their findings surprised me, but it still gives me absolute delight when our data is so beautifully in sync. I want to quickly discuss two important issues that this report raises.

Race is a factor in explaining differences in teen social media use.

Pew provides important measures on shifts in social media, including the continued saturation of Facebook, the decline of MySpace, and the rise of other social media sites (e.g., Twitter, Instagram). When they drill down on race, they find notable differences in adoption. For example, they highlight data that is the source of “black Twitter” narratives: 39% of African-American teens use Twitter compared to 23% of white teens.

Most of the report is dedicated to the increase in teen sharing, but once again, we start to see some race differences. For example, 95% of white social media-using teens share their “real name” on at least one service while 77% of African-American teens do. And while 39% of African-American teens on social media say that they post fake information, only 21% of white teens say they do this.

Teens’ practices on social media also differ by race. For example, on Facebook, 48% of African-American teens befriend celebrities, athletes, or musicians while only 25% of white teen users do.

While media and policy discussions of teens tend to narrate them as a homogeneous group, there are serious and significant differences in practices and attitudes among teens. Race is not the only factor, but it is a factor. And Pew’s data on the differences across race highlight this.

Of course, race isn’t actually what’s driving what we see as race differences. The world in which teens live is segregated and shaped by race. Teens are more likely to interact with people of the same race and their norms, practices, and values are shaped by the people around them. So what we’re actually seeing is a manifestation of network effects. And the differences in the Pew report point to black youth’s increased interest in being a part of public life, their heightened distrust of those who hold power over them, and their notable appreciation for pop culture. These differences are by no means new, but what we’re seeing is that social media is reflecting back at us cultural differences shaped by race that are pervasive across America.

Teens are sharing a lot of content, but they’re also quite savvy.

Pew’s report shows an increase in teens’ willingness to share all sorts of demographic, contact, and location data. This is precisely the data that makes privacy advocates anxious. At the same time, their data show that teens are well-aware of privacy settings and have changed the defaults even if they don’t choose to manage the accessibility of each content piece they share. They’re also deleting friends (74%), deleting previous posts (59%), blocking people (58%), deleting comments (53%), detagging themselves (45%), and providing fake info (26%).

My favorite finding of Pew’s is that 58% of teens cloak their messages either through inside jokes or other obscure references, with more older teens (62%) engaging in this practice than younger teens (46%). This is the practice that I’ve seen significantly rise since I first started doing work on teens’ engagement with social media. It’s the source of what Alice Marwick and I describe as “social steganography” in our paper on teen privacy practices.

While adults are often anxious about shared data that might be used by government agencies, advertisers, or evil older men, teens are much more attentive to those who hold immediate power over them – parents, teachers, college admissions officers, army recruiters, etc. To adults, services like Facebook may seem “private” because you can use privacy tools, but they don’t feel that way to youth, who feel like their privacy is invaded on a daily basis. (This, btw, is part of why teens feel like Twitter is more intimate than Facebook. And why you see data like Pew’s showing that teens on Facebook have, on average, 300 friends while on Twitter they have 79 friends.) Most teens aren’t worried about strangers; they’re worried about getting in trouble.

Over the last few years, I’ve watched as teens have given up on controlling access to content. It’s too hard, too frustrating, and technology simply can’t fix the power issues. Instead, what they’ve been doing is focusing on controlling access to meaning. A comment might look like it means one thing, when in fact it means something quite different. By cloaking their accessible content, teens reclaim power over those they know are surveilling them. This practice is still only really emerging en masse, so I was delighted that Pew could put numbers to it. I should note that, as Instagram grows, I’m seeing more and more of this. A picture of a donut may not be about a donut. While adults worry about how teens’ demographic data might be used, teens are becoming much more savvy at finding ways to encode their content and achieve privacy in public.

Anyhow, I have much more to say about Pew’s awesome report, but I wanted to provide a few thoughts and invite y’all to read it. If there is data that you’re curious about or would love me to analyze more explicitly, leave a comment or drop me a note. I’m happy to dive in more deeply on their findings.

Participatory Culture: What questions do you have?

Henry Jenkins, Mimi Ito, and I have embarked on an interesting project for Polity. Through a series of dialogues, we’re hoping to produce a book that interrogates our different thoughts regarding participatory culture. The goal is to unpack our differences and agreements and identify some of the challenges that we see going forward. We began our dialogue this week and had a serious brain jam where we interrogated our own assumptions, values, and stakes in doing the research that we each do and thinking about the project of participatory culture more generally. For the next three weeks, we’re going to individually reflect before coming back to begin another wave of deep dialoguing in the hopes that the output might be something that others (?you?) might be interested in reading.

And here’s where we’re hoping that some of our fans and critics might be willing to provoke us to think more deeply.

  • What questions do you have regarding participatory culture that you would hope that we would address?
  • What criticisms of our work would you like to offer for us to reflect on?
  • What do you think that we fail to address in our work that you wish we would consider?

For those who are less familiar with this concept, Henry and his colleagues describe a “participatory culture” as one:

  1. With relatively low barriers to artistic expression and civic engagement
  2. With strong support for creating and sharing one’s creations with others
  3. With some type of informal mentorship whereby what is known by the most experienced is passed along to novices
  4. Where members believe that their contributions matter
  5. Where members feel some degree of social connection with one another (at the least they care what other people think about what they have created).

This often gets understood through the lens of “Web2.0” or “user-generated content,” but this is broadly about the ways in which a networked society rich with media enables new forms of interaction and engagement. Some of the topics that we are considering covering include “new media literacies,” “participation gap” and the digital divide, the privatization of culture, and networked political engagement. And, needless to say, a lot of our discussion will center on young people’s activities and the kinds of learning and social practices that take place. So what do *you* want us to talk about?

The Problem with Crowdsourcing Crime Reporting

There has been some excitement about the idea of using technology to address the problems of the Mexican Drug War. As someone involved in technology, I find it inspiring that other techies are trying to do something to end the conflict. However, I also worry when I read ideas based on flawed assumptions. For example, the assumption that “good guys” just need a safe way to report the “bad guys” to the cops reduces the Mexican reality to a kid’s story, where lines are easily and neatly drawn.

So, here are a few reasons why building tools to enable citizens to report crime in Mexico is problematic and even dangerous.

  1. Anonymity does not depend only on encryption. Criminals do not need to rely on advanced crypto-techniques when information itself is enough to figure out who leaked it. Similar ideas are being discussed by researchers trying to figure out how to identify future Wikileaks-like collaborators, something they call Fog Computing. The point is, the social dynamics around the Drug War in Mexico mean that people are exposed when they post something local. In an era of big data, it’s easy to piece things together, even if the source is encrypted (see the toy sketch after this list). And, sadly, when terror is your business, getting it wrong doesn’t matter as much.
  2. Criminal organizations, law enforcement, and even citizens are not independent entities. Organized crime has co-opted individuals, from the highest levels of government down to average citizens working with them on the side — often referred to as “halcones.”
  3. Apprehensions do not lead to convictions. According to some data, “78% of crimes go unreported in Mexico, and less than 1% actually result in convictions.” Mexico is among those countries with the highest indices of impunity, even with high-profile cases such as the murder of journalists.  All this is partly because of high levels of corruption.
  4. Criminal organizations have already discovered how to manipulate law enforcement against their opponents — there is even a term for it, “calentar la plaza” (literally, “to heat up the plaza,” i.e., a rival’s turf): a sudden increase of extreme violence in locations controlled by the opposing group, with the sole purpose of catching the attention of the military, which eventually takes over and weakens the enemy.
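To illustrate point 1 above, here is a toy sketch of quasi-identifier intersection, with entirely invented records: no encryption is broken, yet the few details an “anonymous” report reveals can be enough to single out its source.

```python
def candidate_sources(population, revealed):
    """Toy quasi-identifier intersection: each detail a tip reveals (place,
    shift, what was witnessed) shrinks the set of people who could have sent
    it. All records here are invented for illustration."""
    candidates = population
    for attribute, value in revealed.items():
        candidates = [person for person in candidates if person.get(attribute) == value]
    return candidates

population = [
    {"name": "A", "neighborhood": "Centro", "works_nights": True,  "saw_convoy": True},
    {"name": "B", "neighborhood": "Centro", "works_nights": False, "saw_convoy": True},
    {"name": "C", "neighborhood": "Norte",  "works_nights": True,  "saw_convoy": False},
]
# An "anonymous" tip that mentions the neighborhood, the late hour, and the
# convoy already narrows the field to one person.
tip_reveals = {"neighborhood": "Centro", "works_nights": True, "saw_convoy": True}
print([person["name"] for person in candidate_sources(population, tip_reveals)])  # ['A']
```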

The failure of crowdsourcing became evident only a few weeks ago with a presidential election apparently plagued with irregularities. Citizens actively crowdsourced reports of electoral fraud and uploaded the evidence to YouTube, Twitter, and Facebook. Regardless of whether those incidents would have affected the final result of the election, the institutions in charge seem to have largely ignored the reports. One can only imagine what would happen with reports of highly profitable crimes like drug trafficking.

Crowdsourcing is not entirely flawed in the Mexican context, though. We have seen people in various Mexican cities organize organically to alert one another of violent events, in real time. These urban crisis-management networks do not need institutions to function. Crime reporting, however, depends on law enforcement, and law enforcement does need institutions, unless one is willing to accept lynching and other types of crowd-based law enforcement.

In sum, as Damien Cave mentioned, what Mexico needs is institutions, and people willing to change the culture of impunity. Technologies that support this kind of change would be more effective than those imagined with a “first world” mindset.

Thanks to danah boyd for helping me think through some of these ideas.