
I crowdsourced the design of my house

August 17, 2016

(or, “The Social-Media-ification of Everything”)

The architecture crowdsourcing Web site Arcbazar has been called “The Worst Thing to Happen To Architecture Since the Internet Started.” The site also got some press recently by running a parallel, unauthorized architecture competition for the “people’s choice” for the design of the Obama Presidential Library.

Image: the arcbazar welcome page.

I’ve decided to use arcbazar.com to run two architectural competitions for my house. My competitions started yesterday (links below), in case you want to see this play out in real time.

Most of the attention given to arcbazar has been about labor, safety, and value. Discussion has centered around possible changes to the profession of architecture. Does it lower standards? Will it put architecture jobs and credentials in jeopardy?

Yet as a social media researcher, what has my attention about arcbazar is what I would call the “social media-ification of everything.”

Anyone with a free arcbazar account can submit a design or act as a juror for submitted designs, and as the Web site has evolved it has added features that evoke popular social media platforms. Non-architects are asked to vote on designs, and the competitions use familiar social media features and metaphors like a competition “wall.”

Here are my competitions. You need a free account to look at them.

This means YOU could design my house, so please choose wisely. (One friend said: “You realize your house is going to be renamed Housey McHouseFace.”) Keep your fingers crossed for me that this works out well. Some of the submitted designs for past competitions are a little… odd…

Image: a building shaped like Obama’s name, submitted to a past competition.

Who wouldn’t want a house in the shape of their own name?

Three flawed assumptions the Daily Beast made about dating apps

August 16, 2016

Image from @Cernovich

Last week, the Daily Beast published an article by one of its editors who set out to report on how dating apps were facilitating sexual encounters in Rio’s Olympic Village. Instead, his story focused mainly on athletes using Grindr, an app for men seeking men, and included enough personal information about individuals to identify and out them. After the article was criticized as dangerous and unethical across media outlets and social media, the Daily Beast replaced it with an apology. However, decisions to publish articles like this rest on assumptions about who uses dating apps and how people share information on them. These assumptions are visible not only in how journalists act but also in the approaches that researchers and app companies take when it comes to users’ personal data. Ethical breaches like the one made by the Daily Beast will continue unless we address the following three (erroneous) assumptions:

Assumption 1. Data on dating apps is shareable like a tweet or a Facebook post

 Since dating apps are a hybrid between dating websites of the past and today’s social media, there is an assumption that the information users generate on dating apps should be shared. Zizi Papacharissi and Paige Gibson[1] have written about ‘shareability’ as the built-in way that social network sites encourage sharing and discourage withholding information. This is evident within platforms like Facebook and Twitter, through ‘share’ and ‘retweet’ buttons, as well as across the web as social media posts are formatted to be easily embedded in news articles and blog posts.

Dating apps provide many spaces for generating content, such as user profiles, and app architectures increasingly include features geared toward shareability. Tinder, for example, provides users with the option of creating a ‘web profile’ with a distinct URL that anyone can view without even logging into the app. While users determine whether or not to share their web profiles, Tinder also recently experimented with a “share” button allowing users to send a link to another person’s profile by text message or email. This creates a platform-supported means of sharing profiles with individuals who may never have encountered them otherwise.

The problem with dating apps adopting social media’s tendency toward sharing is that dating environments construct particular spaces for the exchange of intimate information. Dating websites have always required a login and password to access their services. Dating apps are no different in this sense – regardless of whether users login through Facebook authentication or create a new account, dating apps require users to be members. This creates a shared understanding of the boundaries of the app and the information shared within it. Everyone is implicated in the same situation: on a dating app, potentially looking for sexual or romantic encounters. A similar boundary exists for me when I go to the gay bar; everyone I encounter is also in the same space, so the information about my whereabouts is equally implicating for them. However, a user hitting ‘share’ on someone’s Tinder profile and sending it to a colleague, family member, or acquaintance removes that information from the boundaries within which it was consensually provided. A journalist joining a dating app to siphon users’ information for a racy article flat out ignores these boundaries.

Assumption 2. Personal information on dating apps is readily available and therefore can be publicized

 When the Daily Beast’s editor logged into Grindr and saw a grid full of Olympic athletes’ profiles, he likely assumed that if this information was available with a few taps of his screen then it could also be publicized without a problem. Many arguments about data ethics get stuck debating whether information shared on social media and apps is public or private. In actuality, users place their information in a particular context with a specific audience in mind. The violation of privacy occurs when another party re-contextualizes this information by placing it in front of a different audience.

Although scholars have pointed out that re-contextualization of personal information is a violation of privacy, this remains a common occurrence even across academia. We were reminded of this last May when 70,000 OkCupid users’ data was released without permission by researchers in Denmark. Annette Markham’s post on the SMC blog pointed out that “the expectation of privacy about one’s profile information comes into play when certain information is registered and becomes meaningful for others.” This builds on Helen Nissenbaum’s[2] notion of “privacy in context” meaning that people assume the information they share online will be seen by others in a specific context. Despite the growing body of research confirming that this is exactly how users view and manage their personal information, I have come across many instances where researchers have re-published screenshots of user profiles from dating apps without permission. These screenshots are featured in presentations, blog posts, and theses with identifying details that violate individuals’ privacy by re-contextualizing their personal information for an audience outside the app. As an academic community, we need to identify this as an unethical practice that is potentially damaging to research subjects.

Dating app companies also perpetuate the assumption that user information can be shared across contexts through their design choices. Recently, Tinder launched a new feature in the US called Tinder Social, which allows users to join with friends and swipe on others to arrange group hangouts. Since users team up with their Facebook friends, activating this feature lets you see everyone else on your Facebook account who is also on Tinder with this feature turned on. While Tinder Social requires users to ‘unlock’ its functionality from their Settings screen, its test version in Australia automatically opted users in. When Australian users updated their app, this collapsed a boundary between the two platforms that previously kept the range of family, friends, and acquaintances accumulated on Facebook far, far away from users’ dating lives. While Tinder seems to have learned from the public outcry about this privacy violation, the company’s choice to overlap Facebook and Tinder audiences disregards how important solid boundaries between social contexts can be for certain users.

 Assumption 3. Sexuality is no big deal these days

At the crux of the Daily Beast article was the assumption that it was okay to share potentially identifying details about people’s sexuality. As others have pointed out, even though lesbian, bisexual, gay, trans, and queer (LGBTQ) people have won same-sex marriage and other rights in some countries, many cultures, religions, and political and social groups remain extremely homophobic. Re-contextualization of intimate and sexual details shared within the boundaries of a dating app not only constitutes a violation of privacy but could also expose people to discrimination, abuse, and violence.

In my research with LGBTQ young people, I’ve learned that a lot of them are very skilled at placing information about their sexuality where they want it to be seen and keeping it absent from spaces where it may cause them harm. For my master’s thesis, I interviewed university students about their choices of whether or not to come out on Facebook. Many of them were out to a certain degree, posting about pro-LGBTQ political views and displaying their relationships in ways that resonated with friendly audiences but eluded potentially homophobic audiences like coworkers or older adults.

In my PhD, I’ve focused on how same-sex attracted women manage their self-representations across social media. Their practices are not clear-cut since different social media spaces mean different things to users. One interviewee talked about posting selfies with her partner to Facebook for friends and family but not to Instagram where she’s trying to build a network of work and church-related acquaintances. Another woman spoke about cross-posting Vines to friendly LGBTQ audiences on Tumblr but keeping them off of Instagram and Facebook where her acquaintances were likely to pick fights over political issues. Many women talked about frequently receiving negative, discriminatory, and even threatening homophobic messages despite these strategies, highlighting just how important it was for them to be able to curate their self-representations. This once again defies the tendency to designate some sites or pieces of information as ‘public’ and others as ‘private.’ We need to follow users’ lead by respecting the context in which they’ve placed personal information based on their informed judgments about audiences.

Journalists, researchers, and app companies frequently make decisions based on assumptions about dating apps. They assume that because the apps structurally resemble other social media, it’s permissible to carry out similar practices that tend toward sharing user-generated information. This goes hand-in-hand with the assumption that if user data is readily available, it can be re-contextualized for other purposes. On dating apps, this assumes (at best) that user data about sexuality will be received neutrally across contexts; at worst, the data is used without regard for the harm it may cause. There is ample evidence that none of these assumptions hold true when we look at how people create bounded spaces for exchanging intimate information, how users manage their personal information in particular contexts, and how LGBTQ people deal with enduring homophobia and discrimination. While the Daily Beast should not have re-contextualized dating app users’ identifying information in its article, this instance provides an opportunity to dispel these assumptions and change how we design, research, and report about dating apps in order to treat users’ information more ethically.

 

 [1] Papacharissi, Z., & Gibson, P. L. (2011). Fifteen minutes of privacy: Privacy, sociality and publicity on social network sites. In S. Trepte & L. Reinecke (Eds.), Privacy Online (pp. 75–89). Berlin: Springer.

[2] Nissenbaum, H. (2009). Privacy in context: Technology, policy, and the integrity of social life. Stanford, CA: Stanford University Press.

How machine learning can amplify or remove gender stereotypes

August 6, 2016

TLDR: It’s easier to remove gender biases from machine learning algorithms than from people.

In a recent paper, Saligrama, Bolukbasi, Chang, Zou, and I stumbled across some good and bad news about Word Embeddings. Word Embeddings are a wildly popular tool of the trade among AI researchers. They can be used to solve analogy puzzles. For instance, for man:king :: woman:x, AI researchers celebrate when the computer outputs x = queen (normal people are surprised that such a seemingly trivial puzzle could challenge a computer). Inspired by our social scientist colleagues (esp. Nancy Baym, Tarleton Gillespie and Mary Gray), we dug a little deeper and wrote a short program that found the “best” he:x :: she:y analogies, where best is determined according to the embedding of common words and phrases in the most popular publicly available Word Embedding (trained using word2vec on 100 billion words from Google News articles).
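
For readers who want to try this themselves, here is a rough sketch of how one could reproduce the classic analogy query with the off-the-shelf gensim library and the pretrained Google News vectors. This is not the code from our paper, and the local file name is a placeholder; our actual program scored he:x :: she:y candidates differently, as described below.

    # Minimal sketch (not our actual code): querying word2vec analogies with gensim.
    from gensim.models import KeyedVectors

    # Assumes the pretrained Google News vectors have been downloaded locally.
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True
    )

    # man : king :: woman : x  ->  expect x = queen
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))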

The program output a mixture of x-y pairs ranging from definitional, like brother-sister (i.e. he is to brother as she is to sister), to stereotypical, like blue-pink or guitarist-vocalist, to blatantly sexist, like surgeon-nurse, computer programmer-homemaker, and brilliant-lovely. There were also some humorous ones like he is to kidney stone as she is to pregnancy, sausages-buns, and WTF-OMG. For more analogies and an explanation of the geometry behind them, read more below or see our paper, Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.

Bad news: the straightforward application of Word Embeddings can inadvertently *amplify* biases. These Word Embeddings are being used in increasingly many applications. Among the countless papers that discuss Word Embeddings for use in searching the web, processing resumes, chatbots, etc., etc., hundreds of articles mention the king-queen analogy while none of them notice the blatant sexism present.

Say someone searches for computer programmer. A nice paper has shown how to improve search results using the knowledge in Word Embeddings that the term computer programmer is related to terms like javascript. Using this, search results containing these related terms can bubble up and it was shown that the average results of such a system are statistically more relevant to the query.  However, it also happens that the name John has a stronger association with programmer than the name Mary. This means that, between two identical results that differed only in the names John/Mary, John’s would be ranked first. This would *amplify* the statistical bias that most programmers are male by moving the few female programmers even lower in the search results.
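
Continuing the sketch above, the kind of association at play here can be seen by comparing cosine similarities directly. The specific words are just illustrative; this is not the ranking system from that paper.

    # Sketch: comparing name-occupation associations in the pretrained vectors.
    sim_john = vectors.similarity("programmer", "John")
    sim_mary = vectors.similarity("programmer", "Mary")
    print("programmer~John:", round(float(sim_john), 3),
          "programmer~Mary:", round(float(sim_mary), 3))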

Now you might think that we could solve this problem by simply removing names from embeddings – but there are still subtle indirect biases: the term computer programmer is also closer to baseball than to gymnastics, and as you can imagine, removing names wouldn’t entirely solve the problem.

Good news: biases can easily be reduced/removed from word embeddings. With a touch of a button, we can remove all gender associations between professions, names, and sports in a word embedding. In fact, the word embedding itself captures these concepts, so you only have to give a few examples of the kinds of associations you want to keep and the kinds you want to remove, and the machine learning algorithms do the rest. Think about how much easier this is for a computer than for a human. Both men and women have been shown to hold implicit gender associations. And the Word Embeddings also surface shocking gender associations implicit in the text on which they were trained.
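
To make that idea concrete, here is a simplified sketch of the “neutralize” step: estimate a gender direction from a few defining word pairs and project it out of a word’s vector. Our paper’s full method identifies the direction with PCA over many defining pairs and adds an “equalize” step; the pair list and function names below are only for illustration, continuing with the vectors loaded above.

    import numpy as np

    def gender_direction(vectors, pairs=(("he", "she"), ("man", "woman"), ("him", "her"))):
        # Average the normalized difference vectors of a few defining pairs.
        diffs = []
        for a, b in pairs:
            va = vectors[a] / np.linalg.norm(vectors[a])
            vb = vectors[b] / np.linalg.norm(vectors[b])
            diffs.append(va - vb)
        g = np.mean(diffs, axis=0)
        return g / np.linalg.norm(g)

    def neutralize(word_vec, g):
        # Remove the component of a (gender-neutral) word's vector along g.
        v = word_vec / np.linalg.norm(word_vec)
        return v - np.dot(v, g) * g

    # Example: strip the gender component from an occupation word.
    g = gender_direction(vectors)
    programmer_debiased = neutralize(vectors["programmer"], g)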

People can try to ignore these associations when doing things like evaluating candidates for hiring, but it is a constant uphill battle. A computer, on the other hand, can be programmed once to remove associations between different sets of words, and it will carry on with its work with ease. Of course, we machine learning researchers still need to be careful — depending on the application, biases can creep in other ways. I should also mention that we are providing tools that others can use to define, remove, or negate biases, but also possibly even to amplify them, as they choose for their applications.

As machine learning and AI become ever more ubiquitous, there have been growing public discussions about the social benefits and possible dangers of AI. Our research gives insight into a concrete example where a popular, unsupervised machine learning algorithm, when trained over a large corpus of text, reflects and crystallizes the stereotypes in the data and in our society. Widespread adoption of such algorithms can greatly amplify such stereotypes, with damaging consequences. Our work highlights the importance of quantifying and understanding such biases in machine learning, and also shows how machine learning algorithms may be used to reduce bias.

Future work: This work focused on gender biases, specifically male-female biases, but we are now working on techniques for identifying and removing all sorts of biases such as racial biases from Word Embeddings.

Why I Am Suing the Government — Update

August 3, 2016

Last month I joined other social media researchers and the ACLU to file a lawsuit against the US Government to protect the legal right to conduct online research. This is newly relevant today because a community of devs interested in public policy started a petition in support of our court case. It is very nice of them to make this petition. Please consider signing it and sharing this link.

PETITION: Curiosity is (not) a crime
http://slashpolicy.com/petition/curiosity-is-not-a-crime/


For more context, see last month’s post: Why I Am Suing the Government.

 

Why I Am Suing the Government

July 1, 2016

(or: I write scripts, bots, and scrapers that collect online data)

I never thought that I would sue the government. The papers went in on Wednesday, but the whole situation still seems unreal. I’m a professor at the University of Michigan and a social scientist who studies the Internet, and I ran afoul of what some have called the most hated law on the Internet.

Others call it the law that killed Aaron Swartz. It’s more formally known as the Computer Fraud and Abuse Act (CFAA), the dangerously vague federal anti-hacking law. The CFAA is so broad, you might have broken it. The CFAA has been used to indict a MySpace user for adding false information to her profile, to convict a non-programmer of “hacking,” to convict an IT administrator of deleting files he was authorized to access, and to send a dozen FBI agents to the house of a computer security researcher with their guns drawn.

Most famously, prosecutors used the CFAA to threaten Reddit co-founder and Internet activist Aaron Swartz with 50 years in jail for an act of civil disobedience — his bulk download of copyrighted scholarly articles. Facing trial, Swartz hanged himself at age 26.

The CFAA is alarming. Like many researchers in computing and social science, I write scripts, bots, and scrapers that collect online data as a normal part of my work. I routinely teach my students how to do it in my classes. Now that all sorts of activities have moved online — from maps to news to grocery shopping — studying people now means studying people online, and thus gathering online data. It’s essential.
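
To give a sense of how mundane this kind of code is, here is a minimal sketch of the sort of script I mean: fetch a public page and pull out its links. The URL is only a placeholder, and a real research scraper would also handle rate limiting and robots.txt.

    # Minimal sketch of a data-collection script: fetch a public page, extract links.
    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/"  # placeholder page for illustration
    response = requests.get(url, headers={"User-Agent": "academic-research-bot"})
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    links = [a.get("href") for a in soup.find_all("a") if a.get("href")]
    print(links[:10])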

Image: Les raboteurs de parquet by Gustave Caillebotte (cropped)
SOURCE: Wikipedia

Yet federal charges were brought against someone who was downloading publicly available Web pages.

People might think of the CFAA as a law about hacking with side effects that are a problem for computer security researchers. But the law affects anyone who does social research, or who needs access to public information. 

I work at a public institution. My research is funded by taxes and is meant for the greater good. My results are released publicly. Lately, my research designs have been investigating illegal fraud and discrimination online, evils that I am trying to stop. But the CFAA made my research designs too risky. A chief problem is that any clause in a Web site’s terms of service can become enforceable under the CFAA.

I found that crazy. Have you ever read a terms of service agreement? Verizon’s terms of service prohibited anyone using a Verizon service from saying bad things about Verizon. As it says in the legal complaint, some terms of service prohibit you from writing things down (as in, with a pen) if you saw them on a particular — completely public — Web page.

These terms of service aren’t laws, they’re statements written by Web site owners describing what they’d like to happen if they ran the universe. But the current interpretation of the CFAA says that we must judge what is authorized on the Web by reading a site’s terms of service to see what has been prohibited. If you violate the terms of service, the current CFAA mindset is: you’re hacking.

That means anything a Web site owner writes in the terms of service effectively becomes the law, and these terms can change at any time.

Did you know that terms of service can expressly prohibit the use of a Web site by researchers? Sites effectively prohibit research by simply outlawing any saving or republication of their contents, even if they are public Web pages. Dice.com forbids “research or information gathering,” while LinkedIn says you can’t “copy profiles and information of others through any means” including “manual” means. You also can’t “[c]ollect, use, copy, or transfer any information obtained from LinkedIn,” or “use the information, content or data of others.” (This raises the question: How would the intended audience possibly use LinkedIn and follow these rules? Memorization?)

As a researcher, I was appalled by the implications, once they sank in. The complaint I filed this week has to do with my research on anti-discrimination laws, but it is not too broad to say this: The CFAA, as things stand, potentially blocks all online research. Any researcher who uses information from Web sites could be at risk under the provision we are challenging in our lawsuit. That’s why others have called this case “key to the future of social science.”

If you are a researcher and you think other researchers would be interested in this information, please share it. We need to get the word out that the present situation is untenable.

NEW: There is now an online petition started by a cool group of policy-minded devs on our behalf. Please consider signing and sharing it.

The ACLU is providing my legal representation, and in spirit I feel that they have taken this case on behalf of all researchers and journalists. If you care about this issue and you’d like to help, I urge you to contribute.

 

Want more? Here is an Op-Ed that I co-authored with my co-plaintiff Prof. Karrie Karahalios:

Most of what you do online is illegal. Let’s end the absurdity.
https://www.theguardian.com/commentisfree/2016/jun/30/cfaa-online-law-illegal-discrimination

Here is the legal complaint:

Sandvig v. Lynch
https://www.aclu.org/legal-document/sandvig-v-lynch-complaint

Here is a press release about the lawsuit:

ACLU Challenges Law Preventing Studies on “Big Data” Discrimination
https://www.aclu.org/news/aclu-challenges-law-preventing-studies-big-data-discrimination

Here is some of the news coverage:

Researchers Sue the Government Over Computer Hacking Law
https://www.wired.com/2016/06/researchers-sue-government-computer-hacking-law/

New ACLU lawsuit takes on the internet’s most hated hacking law
http://www.theverge.com/2016/6/29/12058346/aclu-cfaa-lawsuit-algorithm-research-first-amendment

Do Housing and Jobs Sites Have Racist Algorithms? Academics Sue to Find Out
http://arstechnica.com/tech-policy/2016/06/do-housing-jobs-sites-have-racist-algorithms-academics-sue-to-find-out/

When Should Hacking Be Legal?
http://www.theatlantic.com/technology/archive/2016/07/when-should-hacking-be-legal/489785/

Please note that I have filed suit as a private citizen and not as an employee of the University.

[Updated on 7/2 with additional links.]

[Updated on 8/3 with the online petition.]

 

Awakenings of the Filtered

June 20, 2016

I was delighted to give the Robert M. Pockrass Memorial Lecture at Penn State University this year, titled “Awakenings of the Filtered: Algorithmic Personalization in Social Media and Beyond.” I used the opportunity to give a broad overview of recent work about social media filtering algorithms and personalization. Here it is:

I tried to argue that media of all kinds have been transformed to include automatic selection and ranking as a basic part of their operation, that this transformation matters, and that it carries significant dangers that are currently not well understood.

Some highlights: I worry that algorithmic filtering as it is currently implemented suppresses the dissemination of important news, distorts our interactions with friends and family, disproportionately deprives some people of opportunity, and that Internet platforms intentionally obscure the motives and processes by which algorithms effect these consequences.

I say that users and platforms co-produce relevance in social media. I note that the ascendant way to reason about communication and information is actuarial, which I call “actuarial media.”  I discuss “corrupt personalization,” previously a topic on this blog. I propose that we are seeing a new kind of “algorithmic determinism” where cause and effect are abandoned in reasoning about the automated curation of content.

I also mention the anti-News Feed (or anti-filtering) backlash, discuss whether or not Penn State dorms have bathrooms, and talk about how computers recognize cat faces.

Penn State was a great audience, and the excellent question and answer session is not captured here.  Thanks so much to PSU for having me, and for allowing me to post this recording. A particularly big thank you to Prof. Matthew McAllister and the Pockrass committee, and to Jenna Grzeslo for the very kind introduction.

I welcome your thoughts!

 

The OKCupid data release fiasco: It’s time to rethink ethics education

May 18, 2016

In mid-2016, we confront another ethical crisis related to personal data, social media, the public internet, and social research. This time, it’s the release of some 70,000 OKCupid users’ data, including some very intimate details about individuals. Responses from several communities of practice highlight the complications of using outdated modes of thinking about ethics and human subjects when considering new opportunities for research through publicly accessible or otherwise easily obtained data sets (e.g., Michael Zimmer produced a thoughtful response in Wired and Kate Crawford pointed us to her recent work with Jacob Metcalf on this topic). There are so many things to talk about in this case, but here, I’d like to weigh in on conversations about how we might respond to this issue as university educators.

The OKCupid case is just the most recent of a long list of moments that reveal how doing something because it is legal is no guarantee that it is ethical. To invoke Kate Crawford’s apt Tweet from March 3, 2016:

This is a key point of confusion, apparently. Michael Zimmer, reviewing multiple cases of ethical problems emerging when large datasets are released by researchers, emphasizes the flaw in this response, noting:

This logic of “but the data is already public” is an all-too-familiar refrain used to gloss over thorny ethical concerns (in Wired).

In the most recent case, the researcher in question, Emil Kirkegaard, uses this defense in response to questions asking if he anonymized the data: “No. Data is already public.” I’d like to therefore add a line to Crawford’s simple advice:

Data comes from people. Displaying it for the world to see can cause harm.

A few days after this data was released, it was removed from the Open Science Framework, after a DMCA claim by OKCupid. Further legal action could follow. All of this is a good step toward protecting the personal data of users, but in the meantime, many already downloaded and are now sharing the dataset in other forms. As Scott Weingart, digital humanities specialist at Carnegie Mellon, warns:

As a long-term university educator, a faculty member at the same university where Kirkegaard is pursuing his master’s degree, and a researcher of digital ethics, I find this OKCupid affair frustrating: How is it possible that we continue to reproduce this logic, despite the multiple times “it’s publicly accessible therefore I can do whatever I want with it” has proved harmful? We must attribute some responsibility to existing education systems. Of course, the problem doesn’t start there, and “education system” can be a formal institution or simply the way we learn as everyday knowledge is passed around in various forms. So there are plenty of arenas where we learn (or fail to learn) to make good choices in situations fraught with ethical complexity. Let me offer a few trajectories of thought:

What data means to regulators

The myth of “data is already public, therefore ethically fine to use for whatever” persists because traditional as well as contemporary legal and regulatory statements still make a strong distinction between public and private. This is no longer a viable distinction, if it ever was. When we define actions or information as being either in the private or the public realm, this sets up a false binary that is not true in practice or perception. Information is not a stable object that emerges in and remains located in a particular realm or sphere. Data becomes informative or is noticed only when it becomes salient for some reason. On OKCupid or elsewhere, people publish their picture, religious affiliation, or sexual preference in a dating profile as part of a performance of their identity for someone else to see. This placement of information is intended to be part of an expected pattern of interaction — someone is supposed to see and respond to this information, which might then spark conversation or a relationship. This information is not chopped up into discrete units in either a public or private realm. Rather, it is performative and relational. When we attend only to regulatory language, the more nuanced subtleties of context are rendered invisible.

What data means to people who produce it

Whether information or data is experienced or felt as something public or private is quite different from the information itself. Violation of privacy can be an outcome at any point. This is not related to the data, but the ways in which the data is used. From this standpoint, data can only logically exist as part of continual flows of timespace contexts; therefore, to extract data as points from one or the other static sphere is illogical. Put more simply, the expectation of privacy about one’s profile information comes into play when certain information is registered and becomes meaningful for others. Otherwise, the information would never enter into a context where ‘public’, ‘private’, ‘intimate’, ‘secret’, or any other adjective operates as a relevant descriptor.

This may not be the easiest idea for us to understand, since we generally conceptualize data as static and discrete informational units that can be observed, collected, and analyzed. In experience, this is simply not true. The treatment of personal data is important. It requires sensitivity to the context as well as an understanding of the tools that can be used to grapple with this complexity.

What good researchers know about data and ethics

Reflexive researchers know that regulations may be necessary, but they are insufficient guides for ethics. While many lessons from previous ethical breaches in scientific research find their way into regulatory guidelines or law, unique ethical dilemmas arise as a natural part of any research of any phenomenon. According to the ancient Greeks, doing the right thing is a matter of phronesis or practical wisdom whereby one can discern what would constitute the most ethical choice in any situation, an ability that grows stronger with time, experience, and reflection.

This involves much more than simply following the rules or obeying the letter of the law. Phronesis is a very difficult thing to teach, since it is a skill that emerges from a deep understanding of the possible intimacy others have with what we outsiders might label ‘data.’ This reflection requires that we ask different questions than what regulatory prescriptions might require. In addition to asking the default questions such as “Is the data public or private?” or “Does this research involve a ‘human subject’?” we should be asking “What is the relationship between a person and her data?” Or “How does the person feel about his relationship with his data?” These latter questions don’t generally appear in regulatory discussions about data or ethics. These questions represent contemporary issues that have emerged as a result of digitization plus the internet, an equation that illustrates that information can be duplicated without limit and is swiftly and easily separated from its human origins once it disseminates or moves through the network. In a broader sense, this line of inquiry highlights the extent to which ‘data’ can be mischaracterized.

Where do we learn the ethic of accountability?

While many scholars concerned with data ethics discuss complex questions, the complexity doesn’t often end up in traditional classrooms or regulatory documents. We learn to ask the tough questions when complicated situations emerge, or when a problem or ethical dilemma arises. At this point, we may question and adjust our mindset. This is a process of continual reflexive interrogation of the choices we’re making as researchers. And we get better at it with time and practice.

We might be disappointed but we shouldn’t be surprised that many people end up relying on outdated logic that says ‘if data is publicly accessible, it is fair game for whatever we want to do with it’. This thinking is so much easier and quicker than the alternative, which involves not only judgment, responsibility, and accountability, but also speculation about the potential future impact of one’s research.

Learning contemporary ethics in a digitally-saturated and globally networked epoch involves considering the potential impact of one’s decisions and then making the best choice possible. Regulators are well aware of this, which is why they (mostly) include exceptions and specific case guidance in statements about how researchers should treat data and conduct research involving human subjects.

Teaching ethics as ‘levels of impact’

So, how might we change the ways we talk and teach about ethics to better prepare researchers to take the extra step of reflecting on how their research choices matter in the bigger picture? First, we can make this an easier topic to broach by addressing ethics as being about choices we make at critical junctures; choices that will invariably have impact.

We make choices, consciously or unconsciously, throughout the research process. Simply stated, these choices matter. If we do not grapple with natural and necessary change in research practices our research will not reflect the complexities we strive to understand. — Annette Markham, 2003.

Ethics can thus be considered a matter of methods. “Doing the right thing” is an everyday activity, as we make multiple choices about how we might act. Our decisions and actions transform into habits, norms, and rules over time and repetition. Our choices carry consequences. As researchers, we carry more responsibility than users of social media platforms. Why? Because we hold more cards when we present findings of studies and make knowledge statements intended to present some truth (big or little T) about the world to others.

To dismiss our everyday choices as being only guided by extant guidelines is a naïve approach to how ethics are actually produced. Beyond our reactions to this specific situation, as Michael Zimmer emphasizes in his recent Wired article, we must address the conceptual muddles present in big data research.

This is quite a challenge when the terms are as muddled as the concepts. Take the word ‘ethics.’ Although it’s a term that operates as an important foundation in our work as researchers, it is also abstract, vague, and daunting because it can feel like you ought to have philosophy training to talk about it. As educators, we can lower the barrier to entry into ethical concepts by taking a ‘what if’ impact approach, or discussing how we might assess the ‘creepy’ factor in our research design, data use, or technology development.

At the most basic level of an impact approach, we might ask how our methods of data collection impact humans, directly. If one is interviewing, or the data is visibly connected to a person, this is easy to see. But a distance principle might help us recognize that when the data is very distant from where it originated, it can seem disconnected from persons, or what some regulators call ‘human subjects.’ At another level, we can ask how our methods of organizing data, analytical interpretations, or findings as shared datasets are being used — or might be used — to build definitional categories or to profile particular groups in ways that could impact livelihoods or lives. Are we contributing positive or negative categorizations? At a third level of impact, we can consider the social, economic, or political changes caused by one’s research processes or products, in both the short and long term. These three levels raise different questions than those typically raised by ethics guidelines and regulations. This is because an impact approach is targeted toward the possible or probable impact, rather than the prevention of impact in the first place. It acknowledges that we change the world as we conduct even the smallest of scientific studies, and therefore, we must take some personal responsibility for our methods.

Teaching questions rather than answers

Over the six years I spent writing guidelines for the updated ‘Ethics and decision making in internet research’ document for the Association of Internet Researchers (AoIR), I realized we had shifted significantly from statements to questions in the document. This shift was driven in part by the fact that we came from many different traditions and countries and we couldn’t come to consensus about what researchers should do. Yet we quickly found that posing these questions provided the only stable anchor point as technologies, platforms, and uses of digital media were continually changing. As situations and contexts shifted, different ethical problems would arise. This seemingly endless variation required us to reconsider how we think about ethics and how we might guide researchers seeking advice. While some general ethical principles could be considered in advance, best practices emerged through rigorous self-questioning throughout the course of a study, from the outset to well after the research was completed. Questions were a form that also allowed us to emphasize the importance of active and conscious decision-making, rather than more passive adherence to legal, regulatory, or disciplinary norms.

A question-based approach emphasizes that ethical research is a continual and iterative process of both direct and tacit decision making that must be brought to the surface and consciously accounted for throughout a project. This process of questioning is most obvious when the situation or direction is unclear and decisions must be made directly. But when the questions as well as answers are embedded in and produced as part of our habits, these must be recognized for what they once were — choices at critical junctures. Then, rather than simply adopting tools as predefined options, or taking analytical paths dictated by norm or convention, we can choose anew.

This recent case of the OKCupid data release provides an opportunity for educators to revisit our pedagogical approaches and to confront this confusion head on. It’s a call to think about options that reach into the heart of the matter, which means adding something to our discussions with junior researchers to counteract the depersonalizing effects of generalized top down requirements, forms with checklists, and standardized (and therefore seemingly irrelevant) online training modules.

  • This involves questioning as well as presenting extant ethical guidelines, so that students understand more about the controversies and ongoing debates behind the scenes as laws and regulations are developed.
  • It demands that we stop treating IRB or ethics board requirements as bureaucratic hoops to jump through, so that students can appreciate that in most studies, ethics require revisiting.
  • It means examining the assumptions underlying ethical conventions and reviewing debates about concepts like informed consent, anonymizing data, or human subjects, so that students better appreciate these as negotiable and context-dependent, rather than settled and universal concepts.
  • It involves linking ethics to everyday logistic choices made throughout a study, including how questions are framed, how studies are designed, and how data is managed and organized. In this way students can build a practice of reflection on and engagement around their research decisions as meaningful choices rather than externally prescribed procedures.
  • It asks that we understand ethics as they are embedded in broader methodological processes — perhaps by discussing how analytical categories can construct cultural definitions, how findings can impact livelihoods, or how writing choices and styles can invoke particular versions of stories. In this way, students can understand that their decisions carry over into other spheres and can have unintended or unanticipated results.
  • It requires adding positive examples to the typically negative cases, which tend to describe what we should not do, or how we can get in trouble. In this way, students can consider the (good and important) ethics of conducting research that is designed to make actual and positive transformations in the broader world.

This list is intended to spark imagination and conversation more than to explain what’s currently happening (for that, I would point to Metcalf’s 2015 review of various pedagogical approaches to ethics in the U.S.). There are obviously many ways to address or respond to this most recent case, or any of the dozens of cases that pose ethical problems.

I, for one, will continue talking more in my classrooms about how, as researchers, our work can be perceived as creepy, stalking, or harassing; exploring how our research could cause harm in the short or long term; and considering what sort of futures we are facilitating as a result of our contributions in the here and now.

For more about data and ethics, I recommend the annual Digital Ethics Symposium at Loyola University-Chicago; the growing body of work emerging from the Council for Big Data, Ethics, & Society; and the Association of Internet Researchers (AoIR) ethics documents and the work of their longstanding ethics committee members. For current discussions around how we conceptualize data in social research, one might take a look at special issues devoted to the topic, like the 2013 issue on Making Data: Big data and beyond in First Monday, or the 2014 issue on Critiquing Big Data in the International Journal of Communication. These are just the first works off the top of my head that have inspired my own thinking and research on these topics.
