Mary Gray, Mike Ananny and I are writing a paper on queer youth and “Glee” for the American Anthropological Association’s annual meeting (yes, I have the greatest job in the world). This is a multi-methodological study by design, because traditional television viewing practices have become so complex. Besides traditional audience ethnography like interviews and participant observation, we are using textual analysis to analyze episode themes, and collected a large corpus of tweets with Glee-related hashtags. This summer, I worked with my high school intern, Jazmin Gonzales-Rivero, to go through this corpus of tweets and pull out useful information for the paper.
We’ve written and published a basic report on using off-the-shelf tools to see patterns and themes in large Twitter data set quickly and easily.
With the increasing popularity of large social software applications like Facebook and Twitter, social scientists and computer scientists have begun developing innovative approaches to dealing with the vast amounts of data produced and collected in such environments. For qualitative researchers, the methods involved can be daunting and unfamiliar. In this report, we outline some basic procedures for working with a large-scale Twitter data set to answer qualitative inquiries. We use Python, MySQL, and the word-cloud generator Wordle to identify patterns in re-tweets, tweet authors, dates and times of tweets, frequency of hashtags, and frequency of word use. Such data can provide valuable augmentation to qualitative inquiry. This paper is aimed at social scientists and humanities scholars with limited experience with big data and a lack of computing resources to do extensive quantitative research.
Marwick, A. and Gonzales-Rivero, J. (2011). Learning to Work with Large-Scale Twitter Data Sets: Using Off-The-Shelf Tools to Quickly and Easily See Tweet Patterns. Microsoft Research Social Media Collective Report, MSR-SMC-11-01, Cambridge, MA. [Download as PDF]
If you’re a seasoned computer scientist or a Big Data aficionado, the information in this paper will seem quite simplistic. But for those of us without programming backgrounds who study Twitter or other forms of social media, the idea of tackling a set of 450,000 tweets can seem quite daunting. In this paper, Jazmin and I walk step-by-step through the methods she used to parse a set of Tweets, using free and easily accessible tools like MySQL, Python, and Wordle. We hope this will be helpful for other legal, humanities, and social science scholars who might want to dip their foot into Big Data to augment more qualitative research findings.