I was just stretching my legs and ran across the 24 August 2010 issue of EOS*, the member publication for AGU. On the front cover was this article:
Parsons, M.A., Duerr, R., and Minster, J.-B. (2010) Data Citation and Peer Review. EOS 91 (34), 297-298.
It's a decent article describing the issues also brought up in an AGU Town Hall and elsewhere (h/t Joe Hourclé). According to the authors, the AGU Council "asserts that the scientific community should recognize the value of data collection, preparation and description and that data 'publications' should 'be credited and cited like the products of any other scientific activity.'" The thing is that data centers ask for different types of citation: cite the data set itself (using the citation the data center provides), mention the data source somewhere in the text, or cite some journal article discussing the gathering of the data**. The Council also calls for peer review - but who knows what that means? It could mean checking the accuracy of the data, or just checking that the metadata correctly describe the data. The authors also suggest that proper citation standards must be figured out if we really want people to share their data. They point to the IPY (International Polar Year) model as an example. It has some strengths, like having editors and a DOI, but there are still open questions about when two data sets count as scientifically identical even if they differ in format or granularity. The authors continue with a section on peer review - but that seems a bit more dicey to me.
I think we're kind of getting to a turning point here - at least in a couple of fields. Gathering the data can be a full-time job, and the people who do it need to get tenure, promotions, and grants. Sharing data is incredibly important, but who wants to share when a) they don't get credit for gathering the data and b) they don't get credit for sharing the data? People who want to use existing data sets might have trouble finding them, and they also need a standard way to give credit for them. We also need better linking from the journals to the data and vice versa (see the several efforts to assign DOIs - very promising). If you're reading the journal, you need to be able to find the data (particularly if supplemental data go away).
All of these things are just getting more and more obvious and more important. The answers have to come from the various communities, although it would make sense if one community learned from another. It would also be great if the research databases linked to the data, too. I'm pretty sure ADS does this, but others might want to consider now how they would do it in addition to just having the citation.*** Oh, and yeah, citation managers should learn to deal with data sets, too!
* Institutions still can't get this online, and we get like 3 issues at a time, 3 weeks late, but anyway.
** I have a real issue with this - I used a software package for R and the author wants some article cited that has nothing to do with my work or the development of the package. Grrr.
*** Journal platforms do, but I mean indexing and abstracting services.