(or: I write scripts, bots, and scrapers that collect online data)
[This is an old post. SEE ALSO: The most recent blog post about this case.]
I never thought that I would sue the government. The papers went in on Wednesday, but the whole situation still seems unreal. I’m a professor at the University of Michigan and a social scientist who studies the Internet, and I ran afoul of what some have called the most hated law on the Internet.
Others call it the law that killed Aaron Swartz. It’s more formally known as the Computer Fraud and Abuse Act (CFAA), the dangerously vague federal anti-hacking law. The CFAA is so broad, you might have broken it. The CFAA has been used to indict a MySpace user for adding false information to her profile, to convict a non-programmer of “hacking,” to convict an IT administrator of deleting files he was authorized to access, and to send a dozen FBI agents to the house of a computer security researcher with their guns drawn.
Most famously, prosecutors used the CFAA to threaten Reddit co-founder and Internet activist Aaron Swartz with 50 years in jail for an act of civil disobedience: his bulk download of copyrighted scholarly articles. Facing trial, Swartz hanged himself at age 26.
The CFAA is alarming. Like many researchers in computing and social science, I write scripts, bots, and scrapers that collect online data as a normal part of my work. I routinely teach my students how to do it in my classes. Now that all sorts of activities have moved online, from maps to news to grocery shopping, studying people now means studying people online, and thus gathering online data. It’s essential.
Image: Les raboteurs de parquet by Gustave Caillebotte (cropped)
Yet federal charges were brought against someone who was downloading publicly available Web pages.
People might think of the CFAA as a law about hacking with side effects that are a problem for computer security researchers. But the law affects anyone who does social research, or who needs access to public information.
I work at a public institution. My research is funded by taxes and is meant for the greater good. My results are released publicly. Lately, my research designs have investigated illegal fraud and discrimination online, evils that I am trying to stop. But the CFAA made my research designs too risky. A chief problem is that any clause in a Web site’s terms of service can become enforceable under the CFAA.
I found that crazy. Have you ever read a terms of service agreement? Verizon’s terms of service prohibited anyone using a Verizon service from saying bad things about Verizon. As it says in the legal complaint, some terms of service prohibit you from writing things down (as in, with a pen) if you saw them on a particular — completely public — Web page.
These terms of service aren’t laws; they’re statements written by Web site owners describing what they’d like to happen if they ran the universe. But the current interpretation of the CFAA says that we must judge what is authorized on the Web by reading a site’s terms of service to see what has been prohibited. If you violate the terms of service, the current CFAA mindset is: you’re hacking.
That means anything a Web site owner writes in the terms of service effectively becomes the law, and these terms can change at any time.
Did you know that terms of service can expressly prohibit the use of a Web site by researchers? Sites effectively prohibit research by simply outlawing any saving or republication of their contents, even if they are public Web pages. Dice.com forbids “research or information gathering,” while LinkedIn says you can’t “copy profiles and information of others through any means” including “manual” means. You also can’t “[c]ollect, use, copy, or transfer any information obtained from LinkedIn,” or “use the information, content or data of others.” (This raises the question: How would the intended audience possibly use LinkedIn and follow these rules? Memorization?)
As a researcher, I was appalled by the implications once they sank in. The complaint I filed this week has to do with my research on anti-discrimination laws, but it is not too broad to say this: The CFAA, as things stand, potentially blocks all online research. Any researcher who uses information from Web sites could be at risk from the provision challenged in our lawsuit. That’s why others have called this case “key to the future of social science.”
— Phil Howard (@pnhoward) June 30, 2016
If you are a researcher and you think other researchers would be interested in this information, please share it. We need to get the word out that the present situation is untenable.
The ACLU is providing my legal representation, and in spirit I feel that they have taken this case on behalf of all researchers and journalists. If you care about this issue and you’d like to help, I urge you to contribute.
Want more? Here is an Op-Ed that I co-authored with my co-plaintiff Prof. Karrie Karahalios:
Most of what you do online is illegal. Let’s end the absurdity.
Here is the legal complaint:
Sandvig v. Lynch
Here is a press release about the lawsuit:
ACLU Challenges Law Preventing Studies on “Big Data” Discrimination
Here is some of the news coverage:
Researchers Sue the Government Over Computer Hacking Law
New ACLU lawsuit takes on the internet’s most hated hacking law
Do Housing and Jobs Sites Have Racist Algorithms? Academics Sue to Find Out
When Should Hacking Be Legal?
Please note that I have filed suit as a private citizen and not as an employee of the University.
[Updated on 7/2 with additional links.]
[Updated on 8/3 with the online petition.]