The Curious Case of EBook Sharing Sites

The popularity of ebooks has skyrocketed in the last few years. The Association of American Publishers reports that eBook sales by US publishers were up 300% in 2011:

Total eBook net sales revenue for 2011 was $21.5 million, a gain of 332.6% over 2010; this represents 3.4 million eBook units sold in 2011, up 303.3 %. As comparison, print formats (Hardcover, Paperback and Mass Market Paperback) increased 2.3% to $335.9 million in 2011.

(Source) With this increase has come the usual hand-wringing over the end of print, the effects on book stores, access to books for people who can’t afford e-readers, the problems caused by DRM and the demise of the First-sale Doctrine (which says you can sell second-hand books, DVDs, videos, etc.), and so forth.

These are all worth investigation, but I’ve become interested in two specific effects of this shift.

First, the enormous rise in erotica sales and the ability of unknown authors without agents or publishers to publish ebooks cheaply and easily.

Second, the ebook sharing underground: a loose network of sites that let people swap ebooks without DRM. Because the files are so small, they’re much easier to disseminate than movies or television shows. They can be easily emailed, DropBoxed, or placed on a DDL (direct download) file-sharing server like 4Shared or Rapidshare. (There are also ebooks on BitTorrent, but it seems that most ebook sharers bypass the torrent infrastructure entirely, probably due to usability concerns or lack of comfort with the protocol.) The popular freeware program Calibre allows ebook users to convert any format (pdf, epub, mobi) to any other format; there’s a popular Calibre plugin that cracks DRM. Most ebook sharing sites contain a tutorial or two on using Calibre.

While all sorts of books are shared online, many of the ebook sharing sites I’ve come across are largely comprised of romance novels. Romance novels are an enormous industry, comprising 13% of the US market and generating more revenue than any other category:

Romance fiction: $1.358 billion in estimated revenue for 2010
Religion/inspirational: $759 million
Mystery: $682 million
Science fiction/fantasy: $559 million
Classic literary fiction: $455 million
[Source: Romance Writers of America]

From my highly unscientific perusing of ebook sharing websites, the majority of participants are women, and most of them are voracious consumers of particular subgenres, such as paranormal or Western. They’re aware of release dates (romances are published on a strict schedule), and so there’s a constant stream of new content being made available. Romances have become so popular on ebook sharing sites that one disgruntled participant wrote:

“The books board seems flooded by self published chic lit. More and more the forums are flooded by garbage that IMHO nobody would ever want to read. All about women having sex with dead people (vampires) or animals (werewolf). Is there some way we could divide the releases into, written by self publicist women, and normal literature. Seems about a 20:1 ratio in favour of the rubbish at the moment.”

(More on this in a second)

Another genre that’s been intensely impacted by file-sharing and technology is academic books. As most of us know, academic books can be ridiculously expensive, often priced for library acquisitions rather than personal purchasing. And most academic authors can expect limited sales and even more limited royalties. The ebook sites that aren’t flooded with romances are full of textbooks and academic books; specialized archives have sprung up for literary criticism, computer science textbooks, and cultural studies, often maintained and organized by graduate students or, I suspect, faculty members. The files shared therein are less likely to be cracked .mobi or .epub files and more likely to be scanned PDFs without OCR (optical character recognition, which allows you to search or cut and paste in PDFs). Given that many professors disseminate class readings as PDFs, it’s unsurprising that these are turning up online. The academic archives are full of students from countries without robust libraries, independent scholars cut off from academic library access, and broke graduate students who can’t afford to spend $50 on a 200-page monograph.

What sites like these display are needs that are not being met by the market. Digital books can be disseminated anywhere, easily, for free. Imagine a library where you never had to wait for a hold because you could just check out an infinitely-replicable digital version. But the way most ebooks are published now is as damaged goods. Due to DRM and publishing restrictions, you can’t easily trade ebooks (OK, you can trade a Kindle ebook once), buy them from yard sales, take them out from the library (OK, you can, but usually not new titles and usually in very limited numbers), borrow them from friends, or read most of them for free. By circumventing DRM and circulating ebooks through underground, non-commercial sites, users are taking advantage of the possibilities of digital publishing that the publishers are trying to snuff out.

Beyond the general and obvious disruptive potential of ebooks, I’m fascinated by the wide-reaching, and often unexpected, effects of these changes.

This brings me to the other development: the rise in self-publishing, erotica, and self-published erotica. Obviously, 50 Shades of Grey is the exemplar here. For those of you who don’t keep up with zeitgeisty bestsellers, “Fifty Shades” is a three-volume series of BDSM romance, which started life as Twilight fan fiction. It’s sold 10 million copies, primarily to women, and primarily through ebook sales. It’s been on the New York Times bestseller list for months. And it’s often dismissively referred to as “mommy porn” for “bored Long Island housewives.” The Atlantic called it “terrible” and bemoaned “Can’t America ever like something quality? Are we just heading toward the dumbing down of everything?”

Far be it from me to defend the quality of the writing, but there’s something interesting here. Porn for men isn’t called “daddy porn,” it’s just called “porn.” A friend who got laid off from her job writing SEO articles has turned to writing Kindle Singles; among her erotica writers group, one woman is making $10,00 a month selling self-published erotica. Cutting out the middleman of publishers, even prolific publishers like Harlequin, has opened heretofore ignored markets. And one enormous market is clearly erotica written by and for women. Who are not only buying ebooks, but cracking the DRM on them and sharing them with friends.

Clearly, the impact of ebooks goes beyond the publishing industry. I’m fascinated to see where else in the culture we’ll see changes in reading practices. Beyond “Reading the Romance,” what can these ebook sharing communities teach us about audience and reception? In some ways, these sites, and sites like GoodReads, constitute interpretive communities, where uploaders recommend books and previously-ignored titles can spread like wildfire based on positive reviews. (I haven’t even touched on how social media is changing the relationship between readers and authors. Let’s just say I’ve vowed to be kinder in my GoodReads reviews.) Studies of filesharing are often focused on economics or legal aspects; it’s interesting to imagine the perspectives we might gain by leveraging audience studies and media and cultural studies in our analyses instead.

News and Updates – June 2012

Welcome to a new semi-regular feature where I update what various SMC members have been up to lately (think of this like the class notes in your alumni magazine, without the weddings and with fewer babies).

Kate Crawford joined SMC as a Principal Researcher in February. She has a number of new papers out:

  • danah boyd and Kate Crawford (2012) “Critical Questions for Big Data”, Information, Communication & Society, 15:5, pp 662 – 679.
    Revised version of the Big Data article Kate & danah wrote at MSR last year and delivered at the OII conference. This is a must-read for those of you working with big datasets.
    [Free access]
  • Kate Crawford (2012) “Four ways of Listening to an iPhone: From Sound and Network Listening to Biometric Data and Geolocative Tracking”, in Studying Mobile Media: Cultural Technologies, Mobile Communication and the iPhone, edited by Larissa Hjorth, Ingrid Richardson and Jean Burgess. London and New York: Routledge, pp 213 – 239.
    While we don’t have a download available, ABC national radio in Australia recently broadcast a special feature story & interview on Kate’s essay. You can hear it here.
  • Kath Albury and Kate Crawford (2012) “Sexting, Consent and Young People’s Ethics: Beyond ‘Megan’s Story’”, Continuum, 26: 3, pp 463 – 473.
    Kate says: “Article on sexting and consent that we first wrote in 2010 – a giant relief to see it finally appear in print!”
    [paywall access]

Woot!

Continuing the big data theme, Helen Nissenbaum has a new article as well as a new pamphlet out co-written with Kazys Varnelis that sounds awesome and is FREE!

  • Helen Nissenbaum and Kazys Varnelis, Modulated Cities: Networked Spaces, Reconstituted Subjects, Situated Technologies Book Series. [Free PDF download]

    In Situated Technologies Pamphlets 9, Helen Nissenbaum and Kazys Varnelis initiate a redefinition of privacy in the age of big data and networked, geo-spatial environments. Digital technologies permeate our lives and make the walls of the built environment increasingly porous, no longer the hard boundary they once were when it comes to decisions about privacy. Data profiling, aggregation, analysis, and sharing are broad and hidden, making it harder than ever to constrain the flow of data about us. Cautioning that suffocating surveillance could lead to paralyzed dullness, Nissenbaum and Varnelis do not ask us to retreat from digital media but advance interventions like protest, policy changes, and re-design as possible counter-strategies.

  • H. Nissenbaum, “From Preemption to Circumvention: If Technology Regulates Why Do We Need Regulation (and Vice Versa)?” Berkeley Technology Law Journal, 26:3
    “My attention will mostly be drawn to the role of law and regulation in circumstances where regulation by technology seems already to be in place, or, put another way, where regulation is already encoded in architecture.”
    [Free PDF]

danah boyd has a ton of talks coming up, published the Big Data paper with Kate, and has this upcoming paper:

  • Ybarra, Michele, danah boyd, Josephine Korchmaros, and Jay Koby Oppenheim. (In press; Available online) “Defining and Measuring Cyberbullying Within the Larger Context of Bullying Victimization,” Journal of Adolescent Health.
    [paywall access]

    Measures of bullying among English-speaking individuals in the United States should include the word “bully” when possible. The definition may be a useful tool for researchers, but results suggest that it does not necessarily yield a more rigorous measure of bullying victimization. Directly measuring aspects of bullying (i.e., differential power, repetition, over time) reduces misclassification. To prevent double counting across domains, we suggest the following distinctions: mode (e.g., online, in-person), type (e.g., verbal, relational), and environment (e.g., school, home). We conceptualize cyberbullying as bullying communicated through the online mode.

Our friend and frequent visitor Tarleton Gillespie of Cornell has a full-length piece about the Twitter Trends argument he developed here on SMC and at Culture Digitally:

  • Gillespie, Tarleton. “Can an Algorithm Be Wrong?” Limn (v2, 2012). [free access]

You should really check out the new issue of Limn, an academic-ish art-ish journal. This issue is on Clouds and Crowds and features great work from a variety of social media scholars including Biella Coleman (McGill) on Anonymous and Lilly Irani (UC Irvine) on Mechanical Turk.

My fellow outgoing postdoc Mike Ananny published a piece with Dan Kreiss (UNC) called “Journalism For and By the Public: Creating a Free Press” for the National Communication Association’s “Communication Currents” site. Mike also appeared at the Berkman Center, where he gave a lunchtime talk called A Public Right to Hear and Press Freedom in an Age of Networked Journalism. Full video is up on the Berkman site.

Nancy Baym is joining us soon (we’re very excited). She also has a new article out (we’re so productive!):

  • Jeff Hall & Nancy Baym (2012) Calling and Texting (too much): Mobile maintenance expectations, (over)dependence, entrapment, and friendship satisfaction. New Media & Society, 14(2), 316-331.
    [paywall link]
    This article uses dialectical theory to examine how mobile phone use in close friendships affects relational expectations, the experiences of dependence, overdependence, and entrapment, and how those experiences affect relational satisfaction. Results suggest that increased mobile phone use for the purpose of relational maintenance has contradictory consequences for close friendships. Using mobile phones in close relationships increased expectations of relationship maintenance through mobile phones. Increased mobile maintenance expectations positively predicted dependence, which increased satisfaction, and positively predicted overdependence, which decreased satisfaction. Additionally, entrapment, the guilt and pressure to respond to mobile phone contact, uniquely predicted dissatisfaction. The results are interpreted in relation to the interdependent dialectical tensions of friendship, media entrapment, and the logic of perpetual contact.

Nancy also did a Berkman Center podcast in April, where she interviewed three musicians – Kristin Hersh, Zoë Keating, and Erin McKeown – about using community supported agriculture as a metaphor for rethinking music. [free access]

Finally, I (Alice Marwick) have a piece in Surveillance and Society about to drop (tell all your friends!) and wrote an essay for the Daily Beast with danah about teen social media use (spoiler: it’s not that weird).

Check the Events page for upcoming talks, and the new Video page for multimedia from past events.

Teens Text More than Adults, But They’re Still Just Teens

danah and I have a new piece in the Daily Beast. Summary: the more things change, the more they stay the same.

In the last decade, we’ve studied how technology affects how teens socialize, how they present themselves, and how they think about issues like gender and privacy. While it’s true that teens incorporate social media into many facets of their lives, and that they face new pressures their parents didn’t—from cyber-bullying to fearmongering over “online predators”—the core elements of high-school life are fundamentally the same today as they were two decades ago: friends, relationships, grades, family, and the future.

Read the full piece here.

A lot of the research that we do involving teenagers seems obvious to teenagers themselves. “Duh.” “Why would anyone study that?” “Who cares?”

Unfortunately, teenagers aren’t the ones writing news stories about how Facebook is making us lonely, Facebook is full of creepers, or teens are pressured to reveal intimate details on Facebook (note: those last two studies sponsored by a company that creates parental blocking and monitoring software). They aren’t the ones passing anti-bullying legislation, appearing on television to tell parents that teens study less and are more narcissistic than a generation ago, or implementing 3-strikes laws in public schools.

Our public-facing work aims to explain teenage practice in clear language that isn’t sensationalistic or fear-mongering. Obviously, not all scholarship lends itself to this type of writing. But given that social media is often discussed in utopian or dystopian terms in the press, research can provide a rational, sensible perspective that’s badly needed. Like, duh.

Is blocking pro-ED content the right way to solve eating disorders?

Warning: This post deals with eating disorder and self-harm content and is potentially triggering.

Following up on Tarleton’s terrific post on moderating Facebook comes Tumblr’s announcement that it will no longer allow pro-eating disorder (pro-ED) or pro-self-harm blogs on the site.

Active Promotion of Self-Harm. Don’t post content that actively promotes or glorifies self-injury or self-harm. This includes content that urges or encourages readers to cut or mutilate themselves; embrace anorexia, bulimia, or other eating disorders; or commit suicide rather than, e.g., seek counseling or treatment for depression or other disorders. Online dialogue about these acts and conditions is incredibly important; this prohibition is intended to reach only those blogs that cross the line into active promotion or glorification. For example, joking that you need to starve yourself after Thanksgiving or that you wanted to kill yourself after a humiliating date is fine, but recommending techniques for self-starvation or self-mutilation is not.

(The remainder of this post focuses on eating disorder content, because it’s what I know the most about. I’d love to hear more from people familiar with self-harm communities.)

Pro-ED content has existed on the internet for many years, and it has been studied by many researchers. It is primarily created and consumed by girls and young women, ages 13-25. There is evidence that the viewing of pro-ED websites (pro-ana, anorexia, and pro-mia, bulimia) produces negative effects in college-age women — lower self-esteem and perception of oneself as “heavier” (Bardone-Cone & Cass, 2007). But pro-ED websites have been sensationalized in the media as cults that encourage young women to kill themselves, even ending up as the case-of-the-week on Boston Legal.

At the same time, the cultural pressure on young women to conform to normative body types is intense. In Am I Thin Enough Yet? The Cult of Thinness and the Commercialization of Identity, feminist sociologist Sharlene Hesse-Biber looks at the complex interactions between media, schools, peers, family, and the health and fitness industry that systemically undermine young women’s self-confidence, send the message that appearance is more important than intelligence or personality, and emphasize the importance of thinness overall. Often, the messages found on pro-ana or pro-mia sites – such as “nothing tastes as good as thin feels”, attributed to Kate Moss but actually a Weight Watchers slogan that has been around for decades – are extraordinarily similar to those found in magazines like Self and Women’s Health, or on websites like My Fitness Pal or Sparkpeople that promote weight loss in a “healthy” way. These media emphasize different weight loss techniques, but the message is the same: it is very important to be thin and conform to an attractive, normative body ideal.

Pro-ED websites are a female subculture, with their own vocabulary, customs, and norms. Moreover, the women who frequent these sites are well aware that their practices are stigmatized. In general, women with eating disorders go to great lengths to hide them from friends and families. This is primarily for two reasons: one, they want to keep losing weight and are worried that they may be forced into treatment, and two, they are afraid of being ridiculed or called out by others. The anonymous or pseudonymous nature of pro-ED sites allows these participants an outlet for their social isolation, and (to a certain extent) emotional support from others going through the same experiences that they are.

Jeannine Gailey, a sociologist of deviance, wrote a paper on pro-ED websites using ethnographic methods. She concludes:

They need a place where they can share their stories and fears with others who are similarly minded and have had comparable experiences. They, as Dias put it [another ethnographic researcher of pro-ana sites, paper here], are seeking a sanctuary. The internet provides the women with both a sanctuary and a medium in which to express the sensations and intense emotions they experience as they struggle to maintain control over their bodies and lives…. The women’s narratives I explored indicate that they participate in the central features of edgework, namely pushing oneself to the edge, testing the limits of both their bodies and minds, exercising particular skills that require ‘innate talent’ and mental toughness, and feelings of self-actualization or omnipotence.

Gailey frames EDs as “edgework,” a concept from criminology/deviance that describes practices of voluntary risk-taking, like skydiving, rock climbing, ‘extreme sports’, stock-trading, unprotected sex, and illegal graffiti. The skills Gailey describes as part of edgework are similar to those emphasized by other body-related extreme communities, such as those devoted to bodybuilding, crossfit, veganism, and paleo dieting. In such communities, members swap tips, ask for support, show progress, share information and share vocabularies and normative practices.

Obviously, Tumblr isn’t focusing on any of these communities. I’m not arguing that eating disorders aren’t dangerous, or even that they’re potentially empowering. They are not. But the focus on young women’s online practice as deviant, pathological, and quasi-illegal is in line with a long history. Young women and their bodies are often the locus of control of social panics, from teen pregnancies to virginity to obesity to dressing “slutty”.

More importantly, Tumblr banning this content won’t do anything to make it go away. It does take Tumblr off the hook, but even the quickest search for self-harm or thinspo (serious trigger warning) finds thousands of posts, many heartbreaking in their raw honesty. One Tumblr user writes:

if tumblr blocks all our blogs then things will be worse. off than they were before, we’ll feel alone again, outcasts! Who can we share our problems with if our blogs have been taken off us? We share our deepest and most darkest secrets on here and if our blogs are taken where are we supposed to put our feelings? They will build up inside of us and things will get worse and worse. Well done tumblr you bunch of arseholes, you’re going to make things worse.

Pragmatically, much of the thinspo content has simply migrated to Pinterest. Other users have password-protected their blogs and spread the password to people in the community.

Eating disorder prevention needs to be structural as well as medical. Realistically, eating disorders aren’t going anywhere as long as we have a complex set of mediated images and discursive tropes that pin the importance of young women on their bodies. These issues exist on a continuum that includes everything from Shape magazine and The Biggest Loser to well-meaning anti-childhood obesity initiatives. Young women participating in pro-ED communities are acting upon messages they get from many other places in their lives. While there is no agreed-upon way of dealing with pro-ED communities– and it’s great that Tumblr is going to implement PSA-type ads that appear on searches for these terms– there are more productive interventions that can be made. We must understand the reasons these young women are in such pain and, more importantly, be willing to engage with these communities, rather than painting them as horrific or abhorrent.

What the GPS Device on Antoine Jones’ Jeep Cherokee Means for Internet Privacy

Yesterday the Supreme Court ruled on United States v. Jones [PDF of court opinion], a case in which the FBI/DC police placed a GPS tracking device on the Jeep Cherokee of Antoine Jones, a club owner in DC who was suspected of dealing cocaine. The cops tracked Mr. Jones for 28 days, and, based on that evidence (as well as a CCTV camera pointing at the club door, a pen register (*) and a wiretap on Jones’s cellphone), charged him with conspiracy and possession with intent. Jones appealed, saying that the GPS data should be inadmissible since it was collected without a warrant.

The Supreme Court upheld the ruling of the DC Court of Appeals in a unanimous 9-0 decision, saying that a) this was a search and b) a car is a person’s property, or “effects”, and thus affixing a GPS to the undercarriage of the car violates the Fourth Amendment. From the ruling:

It is important to be clear about what occurred in this case: The Government physically occupied private property for the purpose of obtaining information. We have no doubt that such a physical intrusion would have been considered a “search” within the meaning of the Fourth Amendment when it was adopted.

What’s interesting here is that there was a 5-4 split on why the Justices ruled as they did. Justice Sotomayor, in a concurring opinion, wrote, “When the Government physically invades personal property to gather information, a search occurs. The reaffirmation of that principle suffices to decide this case.” Since the government had invaded property, the Justices did not need to evaluate any of the other principles that this case brings up.

And there are many principles that this case brings up. Sotomayor talks about many of them: what about electronic surveillance if no property was trespassed upon? What about the chilling effects of potential long-term electronic surveillance? What about the fact that GPS monitoring gives far more specific information, and is far easier and cheaper, than traditional visual surveillance? What about the fact that this data can be stored and mined later? She writes:

I would take these attributes of GPS monitoring into account when considering the existence of a reasonable societal expectation of privacy in the sum of one’s public movements. I would ask whether people reasonably expect that their movements will be recorded and aggregated in a manner that enables the Government to ascertain, more or less at will, their political and religious beliefs, sexual habits, and so on. I do not regard as dispositive the fact that the Government might obtain the fruits of GPS monitoring through lawful conventional surveillance techniques… I would also consider the appropriateness of entrusting to the Executive, in the absence of any oversight from a coordinate branch, a tool so amenable to misuse, especially in light of the Fourth Amendment’s goal to curb arbitrary exercises of police power to and prevent “a too permeating police surveillance.”

But most awesomely, Sotomayor then goes on to critique the third party doctrine. This says that if you disclose information to a third party (whether that’s your sister, Google, or Ma Bell), you have no reasonable expectation of privacy governing that information, and the government has a right to access it. As Sotomayor writes, “This approach is ill suited to the digital age, in which people reveal a great deal of information about themselves to third parties in the course of carrying out mundane tasks” like checking email, signing up for Facebook, or buying a pair of shoes online.

In a concurring opinion, four other Justices agreed with the majority ruling, but not the use of the property doctrine to decide it. Instead, Alito, Ginsburg, Breyer and Kagan seem suspicious of electronic surveillance overall. In Alito’s concurring judgment, he mentions GPS, road CCTV cameras, electronic toll collectors, and, most interestingly, cell phone location data as potential invasions of privacy. He laments that Congress and state governments have done little or nothing to regulate the use of this data by law enforcement.

I think the SCOTUS is itching for a fight on digital privacy. I’m looking forward to seeing what happens with similar cases in the future.

* Don’t get me started on pen registers. They track what numbers you call, and have the technical capability to track where your cellphone is and even your text messages. Yet the standard for ordering one is much lower than, say, wiretapping; the potential surveillee just has to be part of an ‘ongoing criminal investigation.’ Even more worryingly, Chris Soghoian has documented that law enforcement makes tens of thousands of requests to phone companies for cell phone location information. Requests to internet companies for location information are not even subject to the pen register standard; all they need is a subpoena.

What’s the difference between SOPA and PIPA?

I decided to put my slightly-dormant internet policy research skillz to work to figure this out. It was surprisingly difficult. Most stop PIPA/SOPA websites conflate them– but they are different. (Note: The best resource was an article I found at Area 51 Technologies.)

#1: SOPA’s a House bill, PIPA is a Senate bill.
SOPA = House of Representatives
PIPA = Senate

The Senate tends to be older and more conservative than the House, meaning that it’s more likely to be completely clueless about the internet. That’s not good.

#2: PIPA has a greater chance of passing.

SOPA has gotten so much guff that it’s temporarily off the table. PIPA, on the other hand, has been relatively ignored and so is much farther along in the process.

#3: They are essentially the same “anti-piracy” bill, but with a few different provisions.

Both PIPA and SOPA focus on “foreign rogue websites” (e.g. the Pirate Bay, Wikileaks) that facilitate piracy. And they both establish systems for removing websites that the Department of Justice decides are “dedicated to infringing activities.”

PIPA does NOT have a provision that requires search engines to remove these “foreign infringing site[s]” from their indexes. SOPA does. And it’s been highly criticized.

PIPA does seem to require more court intervention to take down a site– that’s good, right?– but it DOESN’T have any provision that penalizes a copyright holder for making a false claim of infringement. In other words, a Big IP company can claim that a site is infringing, drag it through hella expensive litigation, be proven wrong, and the site can do nothing about its costs incurred in the process. SOPA DOES include a provision that penalizes copyright holders who do this “knowingly,” including making them liable for damages and legal costs.

#4: They both require DNS blocking.

Because this has been protested not only by civil rights groups and internet enthusiasts, but also by engineers and computer scientists who say that DNS blocking will damage internet infrastructure (like, say, the Domain Name System itself), the sponsors of SOPA and PIPA have agreed to strip this from both bills. They claim that this will eliminate much of the current opposition. (See related technical whitepaper. [PDF link])

The bills share many other odious traits, which are summarized by Katy Tasker from Public Knowledge in this handy chart:

Chart of the differences between PIPA and SOPA. The two bills are essentially the same.

Ultimately, PIPA and SOPA are not particularly different. They are slightly textually different versions of the same legislation– supported by the entertainment industry and, for the most part, heavily opposed by the technology industry (including us at SMC). If at this point you still haven’t called your senators or representatives, you can easily do so at americancensorship.org.

Using Off-the-shelf Software for basic Twitter Analysis

Mary Gray, Mike Ananny and I are writing a paper on queer youth and “Glee” for the American Anthropological Association’s annual meeting (yes, I have the greatest job in the world). This is a multi-methodological study by design, because traditional television viewing practices have become so complex. Besides traditional audience ethnography like interviews and participant observation, we are using textual analysis to analyze episode themes, and collected a large corpus of tweets with Glee-related hashtags. This summer, I worked with my high school intern, Jazmin Gonzales-Rivero, to go through this corpus of tweets and pull out useful information for the paper.

We’ve written and published a basic report on using off-the-shelf tools to see patterns and themes in a large Twitter data set quickly and easily.

Abstract:

With the increasing popularity of large social software applications like Facebook and Twitter, social scientists and computer scientists have begun developing innovative approaches to dealing with the vast amounts of data produced and collected in such environments. For qualitative researchers, the methods involved can be daunting and unfamiliar. In this report, we outline some basic procedures for working with a large-scale Twitter data set to answer qualitative inquiries. We use Python, MySQL, and the word-cloud generator Wordle to identify patterns in re-tweets, tweet authors, dates and times of tweets, frequency of hashtags, and frequency of word use. Such data can provide valuable augmentation to qualitative inquiry. This paper is aimed at social scientists and humanities scholars with limited experience with big data and a lack of computing resources to do extensive quantitative research.

Citation:
Marwick, A. and Gonzales-Rivero, J. (2011). Learning to Work with Large-Scale Twitter Data Sets: Using Off-The-Shelf Tools to Quickly and Easily See Tweet Patterns. Microsoft Research Social Media Collective Report, MSR-SMC-11-01, Cambridge, MA. [Download as PDF]

If you’re a seasoned computer scientist or a Big Data aficionado, the information in this paper will seem quite simplistic. But for those of us without programming backgrounds who study Twitter or other forms of social media, the idea of tackling a set of 450,000 tweets can seem quite daunting. In this paper, Jazmin and I walk step-by-step through the methods she used to parse a set of Tweets, using free and easily accessible tools like MySQL, Python, and Wordle. We hope this will be helpful for other legal, humanities, and social science scholars who might want to dip their foot into Big Data to augment more qualitative research findings.
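To give a flavor of the kind of counting the report describes, here is a minimal Python sketch that tallies hashtags and retweeted authors. It is not the code from the report: the function name, the regular expressions, and the three sample tweets are all made up for illustration, and it assumes retweets follow the old "RT @user" convention.

```python
import re
from collections import Counter

# Assumed patterns: a hashtag is "#" plus word characters, and a
# retweet starts with "RT @username" (the manual retweet convention).
HASHTAG_RE = re.compile(r"#(\w+)")
RETWEET_RE = re.compile(r"^RT @(\w+)")


def summarize_tweets(tweets):
    """Count hashtags and retweeted authors in a list of tweet texts."""
    hashtags = Counter()
    retweeted = Counter()
    for text in tweets:
        # Lowercase so that #Glee and #glee are counted together.
        hashtags.update(tag.lower() for tag in HASHTAG_RE.findall(text))
        match = RETWEET_RE.match(text)
        if match:
            retweeted[match.group(1).lower()] += 1
    return hashtags, retweeted


# Hypothetical sample data, not taken from the actual corpus.
sample_tweets = [
    "RT @gleefan: Best episode yet #glee #klaine",
    "Can't wait for tonight #Glee",
    "RT @gleefan: Season premiere countdown #glee",
]

hashtags, retweeted = summarize_tweets(sample_tweets)
print(hashtags.most_common(2))
print(retweeted.most_common(1))
```

With a real corpus you would read the tweet text out of a database or a file instead of a hard-coded list, but the counting step would look much the same.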

You’re the Manager but I’m the Mayor: Understanding Foursquare Check-ins in Claimed Venues

This talk is by Germaine Halegoua, one of our fantastic interns this summer and a brand-new Assistant Professor at the University of Kansas. She outlines her findings from her summer research project, about the location-based mobile service Foursquare.

The presentation includes preliminary findings and analysis from an ethnographic study of Foursquare users in the Boston area, focusing on their relationships with “friends” as well as “claimed venues” on Foursquare. This project aims to investigate how and why managers of Foursquare’s claimed venues and their patrons use location-based services; what relationships are forged between vendors and customers via Foursquare; how participants understand their own participation and the audiences for their actions; as well as attitudes about locational privacy and the meaning of location announcement over these networks. Some of these findings reflect information flows, practices of listening and responding, and relations of power that are relevant across other social network sites as well.

If you’re interested in LBS, this is a great introduction to some academic thinking on the topic.

Watch the full talk here.

What We’re Reading

We’re a diverse bunch here at the SMC, but what we have in common is that we are nerds who read a lot. I went on office patrol to find out what my compatriots have selected as their August summer reading.

danah boyd

Books danah boyd is reading
Parenting out of Control: Anxious Parents in Uncertain Times, by Margaret Nelson
“It’s a phenomenal examination of two different common parenting strategies and their class dimensions.”

The Cookbook Collector, by Allegra Goodman
“It’s so relevant even to today. Through fiction, [Goodman] shows the different players from different angles seeing the dot com boom and crash play out. I think in the tech industry, we tend to forget about these different angles.”

Danah is also reading The Windup Girl by Paolo Bacigalupi, a critically acclaimed steampunk/dystopia. She and I share a love for YA, sci-fi, and YA sci-fi.

Kate Crawford


Behavior in Public Places: Notes on the Social Organization of Gatherings, by Erving Goffman
“It’s a classic, but nobody’s read it. Eytan [Adar, visiting researcher] and I are using it to think about the London riots.”

The Immortal Life of Henrietta Lacks, by Rebecca Skloot
“Danah recommended this, because the theme of cell propagation is bizarrely relevant to [the paper they’re working on about] Big Data.”

Eytan Adar

Books Eytan is reading
The Taming of the American Crowd: From Stamp Riots to Shopping Sprees, by Al Sandine
“I bought this but haven’t read it yet. It’s for Kate and I’s project on the London riots.”

The Magician King, by Lev Grossman
“If you want junky books, I just finished The Magician King. I’m actually looking forward to the sequel.”
I also just finished this, so I responded that a) it was written by the book critic for Time magazine and therefore is de facto not junky, and b) it was a lot better than its predecessor, “The Magicians.”

Heather Castells


Paying for Sex: The Gentlemen’s Guide To Web Porn, Strip Clubs, Prostitutes & Escorts – Without Humiliation, Job Loss, Bankruptcy, Infection, Bloodshed Or Incarceration, by Kerr Fuffle and Roscoe Spanks
“Danah and I found this on Amazon. It’s a self-published guide for people who want to buy sex with escorts and prostitutes and how not to get caught. We wanted a better view of what the demand side looks like in human trafficking.”
Me: Is it written from a male perspective?
Heather: They say in the beginning of the book that it’s aimed solely at men.

Laura Norén

Books Laura Noren is reading
Look at Me, by Jennifer Egan
“I allow myself to read fiction in the summer. I loved the Goon Squad.”

Personal Connections in the Digital Age, by Nancy Baym
“This relates to research I’m doing. I’m excited for the chapter on communities and networks, because I think it’ll be relevant to my project on food blogging. I’m also excited for the chapter on authenticity, which apparently everyone has trouble pinning down.”

The Naked City: The Death and Life of Authentic Urban Places, by Sharon Zukin.
“I did not like this. It’s primarily about authenticity in the East Village and Red Hook, and I live in Red Hook.” Note: Laura also goes to NYU, which is pretty much in the East Village. “I got my hopes up, meaning it was easy to disappoint me. It didn’t do quite what I thought it was going to do. On the other hand, it did help me understand how personal an experience of shared space is.”

Alice Marwick


Note that these are fake quotes from me, since I don’t talk out loud to myself, at least not usually.

Ready Player One, by Ernest Cline
“I stayed up til 2 am last night finishing this. It’s about video games and 80s nostalgia in a dystopian future. I couldn’t put it down.”

Gender Circuits: Bodies and Identities in a Technological Age, by Eve Shapiro
“I’ve been working on a book chapter about gender and social media and really struggling. The snippets of this book I found on Google Books convinced me to buy it. I’m hoping it comes from Amazon tomorrow.”

What Technology Wants, by Kevin Kelly
“I haven’t started this either, but I’m interviewing Kevin Kelly next week and I’m pretty intimidated. I hope reading his latest will help me put together some non-idiotic questions.”

What are YOU reading?