Linking journal articles to archived data (and vice versa)

I'm always complaining that it's all biomed all the time with new announcements and coolness from publishers - but I just read a press release from Elsevier (I know, I know) about a geosciences project they've got going (heard about this from ResourceShelf). In February, they apparently added automatic linking from data sets archived in PANGAEA to journal articles on ScienceDirect. They're now adding "a map to every ScienceDirect article that has associated research data at PANGAEA; it displays all geographical locations for which such data is available."

I'm not familiar with PANGAEA, but it stands for Publishing Network for Geoscientific & Environmental Data. I guess it's hosted in Germany. All of the datasets have DOIs and a citation.
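The post doesn't show what one of these dataset citations looks like, but as a rough sketch, a DataCite-style citation pairs the creators, year, title, and repository with a resolvable DOI link. The names, title, and DOI suffix below are made up for illustration (10.1594 is PANGAEA's DOI prefix, but the record is hypothetical):

```python
# Sketch of assembling a DataCite-style citation for an archived dataset.
# All metadata values here are hypothetical, not a real PANGAEA record.

def format_data_citation(creators, year, title, repository, doi):
    """Render a dataset citation ending in a resolvable DOI link."""
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {repository}, https://doi.org/{doi}"

citation = format_data_citation(
    creators=["Example, A.", "Beispiel, B."],
    year=2008,
    title="Sea-surface temperature profiles, station X",
    repository="PANGAEA",
    doi="10.1594/PANGAEA.000000",
)
print(citation)
```

Because the DOI resolves to a landing page for the dataset itself, a journal article can cite the data the same way it cites another paper.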

In a comment on an earlier post on getting credit for infrastructure, Joe Hourclé pointed to an AGU town hall on data stewardship. This town hall meeting deserves a lot more attention since it covers many of the controversial topics - open data, access, curation, peer review, and citation. It also mentions PANGAEA in passing as an effort to support data citation. Seems like Europe is ahead of the US on this - and that's another post 🙂


  • Joe Hourclé says:

    There was mention that a report from the AGU town hall will be published in an upcoming issue of EOS. There was also discussion this week at the ESSI workshop about trying to get the journals to require authors either to make the data available or to provide a citation for where to find the data that was used.

    Part of the problem, as I see it, is exactly what it is that we're trying to cite -- saying that you used 171 Ångström images from SDO AIA over a weeklong period is much different than saying that you specifically downloaded one reduced-resolution (4x4 binned) Rice-compressed FITS file per hour, at the half-hour (UTC), from the caching mirror at NSO on a specific date for your analysis. Some of the citation standards point to the instrument or mission, others try to point to a given data collection (processed form of the data), but few deal with any filtering other than the extents of the time period analyzed.

    I don't know that the PANGAEA approach of having a general-use repository is going to work for all fields. Sometimes the data that gets used for research runs to tens of terabytes (and we expect volumes to keep growing); if it's coming from an "active archive", the edition that you downloaded on a given day might've since been replaced by a better calibration and is no longer maintained. It's unlikely that we're going to store each and every edition (even just those used for published research). There are some groups looking into trying to store all of the information necessary to re-generate the data as it existed on a given date, with some going as far as storing images of the virtual machines (OS & installed software) used to generate a given edition.