What she said

Jul 21 2011 Published by under information policy

I don't know how many people outside of libraries are following the Aaron Swartz case. He's a "founder" of Readit who has been indicted on several charges related to scraping journal article content from JSTOR. Read Nancy Sims' take. Note, as she does, that 1) he didn't do it from his own institution's network, 2) he spoofed his MAC address and did other things to dodge the technical protective measures, including hiding in a network closet, and 3) we don't know his intention with respect to what he downloaded. Also note that JSTOR has a program for text mining and other research uses of their data, so there's no excuse in that regard.

JSTOR supposedly got the content back from him.  I find that hard to believe. Now, according to RWW, someone else has shared a much smaller portion of JSTOR content to PirateBay. Yes, the materials themselves are out of copyright; however, they were most certainly not obtained legally if downloaded from JSTOR.

JSTOR is a great service, but if they don't somehow make it clear to publishers that they are doing everything they can to prevent this sort of thing, then their business might suffer. We'll see.

update: Based on this Ars piece, I guess we do know Swartz's intentions.

It's safe to say that Swartz would approve of Maxwell's actions. In a 2008 "Guerrilla Open Access Manifesto" first reported by the New York Times, Swartz wrote that "we need to download scientific journals and upload them to file-sharing networks."

update 2: Kristen Eschenfelder has a great post on the topic. She points out that this isn't effective as civil disobedience because it was done in secret, and anyway it's the wrong target if that's the goal. Go read it.

7 responses so far

  • Dave says:

    I don't condone what he did (he did cause quite a bit of havoc), but the question is whether or not what he did is *illegal*. From my understanding of the JSTOR license with MIT, it would have been perfectly legal for him to download every article in the database, one at a time. He merely automated it. Now, the way he went about it suggests that he knew he shouldn't have been doing it, but that still doesn't make it illegal. I think it's also important to note that neither MIT nor JSTOR was interested in pursuing action against him.

    The indictment contains allegations, such as his intent in the download, that may or may not be proven true at trial. While I'm certainly not comfortable with what Swartz did, I think I'm even less comfortable with the government prosecuting him *criminally*.

    • Christina Pikas says:

      No, even if he did it one at a time it wouldn't be OK. It's written into the license that you can't download all of the articles from a single journal. Also, your use has to be "personal use."

      The criminal prosecution isn't about copyright; it's about hacking the network, if you actually read the indictment.

      JSTOR has to walk a fine line: they're a nonprofit. They can't be seen as being nasty, but at the same time they have to make sure the publishers will continue to lease them content. That probably explains their response.

  • Andy says:

    Thanks for contributing to this discussion!

    I'm surprised at your position on this matter. You may not be aware of Aaron's previous publications, enabled by similar scraping of documents that are supposedly available to the public but rate-limited: in particular, PACER records, as covered by the NY Times (http://www.nytimes.com/2009/02/13/us/13records.html), and hundreds of thousands of law review articles, which resulted in a major journal article (http://www.stanfordlawreview.org/content/article/punitive-damages-remunerated-research-and-legal-profession). There is a new kind of scholarship enabled by modern data processing and text mining, and it would be a shame to stifle or discount that scholarship for trivial technical reasons.

    You also mentioned JSTOR's "bulk access" program; that program is quite new and didn't exist at the time that Aaron is accused of downloading the documents at MIT. It's also discretionary -- only research that JSTOR finds useful or appealing will be allowed to participate -- and apparently comes with onerous restrictions on use and redistribution.

    The website Aaron helped found (it's a matter of some dispute if he's "co-founder" or simply a very early employee compensated significantly in equity) is called Reddit.

    JSTOR supposedly got the content back from him. I find that hard to believe.

    Indeed, I found that statement amusing too; it doesn't make much sense to "get content back" in the context of digital data. This is part of the disconnect between legacy models of information (storage is hard; duplication is expensive; transfer and physical possession are important) and the modern digital reality (storage is easy; duplication is so cheap it's impossible to avoid; transfer is trivial and physical possession is largely irrelevant).

    Now, according to RWW, someone else has shared a much smaller portion of JSTOR content to PirateBay. Yes, the materials themselves are out of copyright; however, they were most certainly not obtained legally if downloaded from JSTOR.

    What do you mean by "obtained legally"? Without a copyright interest in the underlying data, it's very difficult to construct a legal theory whereby mere distribution of digital files infringes anyone's rights. (The "copying is so cheap that it's impossible to avoid" feature of modern digital systems, again.) The Supreme Court has emphatically dismissed the idea that merely collating facts gives rise to a copyright interest, and Congress hasn't passed a database copyright law AFAIK.

    I have no idea what happened such that Greg Maxwell ended up in possession of copies of a significant fraction of the legacy JSTOR data, but there are many, many scenarios in which he did not violate any terms of service, "access" a "protected system" (to use the CFAA terminology), or in any way inconvenience JSTOR. Given that these papers are the canonical example of why the public domain exists, and of why the Constitution empowers Congress "to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries," it seems to me that Greg's publication of these papers falls squarely within all our shared goals.

    I'm curious why you appear to be having such a viscerally negative reaction to Aaron and Greg's actions. Is JSTOR an especially well-regarded institution in the library space? I'd never heard of them before this kerfuffle (my scholarly work was in computer science and mathematics), so I'm honestly at a loss.

    I can certainly understand some uncertainty in the library profession, given the quite disruptive effect that digital technology, and the collaborative models it enables, are currently having on librarians' gatekeeper status with regard to public access to information. Libraries and librarians provide many critically important social functions, and the fact that digital distribution is removing the original purpose of libraries' physical storage function does not mean that their other valuable functions have gone away.

    Again, thanks so much for participating in this conversation!

    • Christina Pikas says:

      I'm well aware of the PACER thing. You might not know that I've actually met the people who run PACER. The US Courts are mandated by law to have a cost-neutral system. The calculated price-per-page method may not be the only way to do it, but it's the way they figured it out. Compare that to GPO and NTIS charging for government documents.

      I'm also well aware of text mining and other uses for large data sets. Most publishers are aware of it, too, and if you talk to them you might be able to work something out.

      There are lots of ways "mere distribution of digital files infringes anyone's rights." You can deprive someone of property by hacking, for one obvious example.

      Viscerally negative reaction? Hell yes. Read Eschenfelder's post. Learn what JSTOR is. This isn't the evil empire (not that it would be any better if it were).

      As for your assumption that, as a librarian, I'm not comfortable with the changes digital information has brought: dude, read my blog back to its beginning in 2004 before you say you know me. You don't know me, and you don't know information policy.

  • Drugmnky says:

    Dude. Mega banks "shouldn't" have all the cash while babies starve, either. Especially when they engage in all sorts of illegal shit. Doesn't make bank robbery justified.

  • "Yes, the materials themselves are out of copyright; however, they were most certainly not obtained legally if downloaded from JSTOR." -- Could you explain?

    • Christina Pikas says:

      Is your point that breaking the license/terms of service is a contractual thing and not a criminal thing? Or (and I have no idea about this) that circumventing technical protective measures (DMCA-like) only applies to DVDs?