Credit for Infrastructure

It’s not a new thing, it’s an ongoing thing: you typically get credit in science for publishing scientific results (assertions of new knowledge) in peer-reviewed journal articles. With big science, though, you’ve got scientists who work more on the infrastructure (which also creates new knowledge, though sometimes encoded in artifacts) but who do not do as much science with the output of that infrastructure. With big data and data curation, you’ve also got data scientists.

Telescopes are a great example of this because you’ve got the organization building and running the “facility” and then scientists elsewhere using the data and getting grants to do things with the data.

So all of this was floating around in my mind when those of us on PAMnet were alerted (by CE) to a new article on arXiv and a similar effort by ESO. Librarians at “facilities” spend a lot of time collecting articles written using the data produced. This collection of articles is critical for securing and maintaining funding, but it also helps determine what other instruments are needed and serves as input for virtual observatories (see Uta & Jill’s presentation in pdf). But why is it so hard? To a certain extent the links back and forth still aren’t working. How many times have you seen something like “if this R package is helpful, cite this completely unrelated paper”?

I’m also catching up on Science magazine podcasts, and on the April 23, 2010 edition (pdf transcript) there was a discussion of the National Ecological Observatory Network, a $400M NSF effort. This level of effort may be more common in other areas of science, but it is somewhat rare in ecology. Apparently there’s some chafing because the same data will be taken at multiple sites and there is little local control over what data will be taken. We know from research by Zimmerman [*] and others that ecologists use measures like trust in the data taker to decide whether they will use the data. This way of taking data is apparently quite different, and it will require new ways of giving credit to the data takers [**] as well as new ways of searching for information and determining relevance for the consumers of the data.

Answer? No. Just ponderings – better be off to work on the proposal. This is also a test of my blogging software so fingers crossed.

[*] Zimmerman, A. S. (2008). New knowledge from old data - The role of standards in the sharing and reuse of ecological data. Science, Technology, & Human Values, 33(5), 631-652.

[**] Like how the people who compile floras do not get credit.

4 responses so far

  • Joe Hourclé says:

    Christina, the issue of credit for infrastructure and for producing and sharing well-documented data was part of the provenance discussion[1] at the Earth and Space Science Informatics [2] workshop yesterday. Unfortunately, it was more acknowledgement of the problem (which we've known for some time) than actually coming to any solutions.

    There was a suggestion that all data sets be given some sort of identifier, and that the journals require the identifiers for the data being used in the research, but that doesn't solve the infrastructure problem. As we build federated search systems and provide APIs directly to the visualization tools, researchers might not be aware of how many archives, services, etc. were used to get them the data that they used in their research.

    There was also discussion specifically on citation of data [3] at the 2009 Fall AGU meeting, and an important comment came up: "Currently a person who publishes really good data is given less credit than someone who publishes a bad paper", as the paper can count towards tenure. (and if it's *really* bad, lots of people will cite it to say how bad it is)


    • Christina Pikas says:

      It's definitely an interesting problem - I would think the ESSI folks would be some of the ones to figure it out. Hey, did you attend the session for which I am a co-author? 🙂 I haven't heard how that went yet.

    • Joe Hourclé says:

      I'm not sure if there's something that's keeping replies 3 deep from showing up, but I've tried replying to Christina's question twice, and it hasn't shown up.

      Yes, I saw the talk, because we don't have separate tracks (it's a small workshop, maybe about 35-40 people each day, with maybe 50 people total, as not everyone was there each day).

      If it was much larger, I don't think we would've been able to get away with as much discussion as we had; the talk that you were co-author on (Virtual Organizations and the Transformation of Data into Knowledge [1]) was the very first presentation (maybe the second, if you count the welcome/overview talk by Bob Weigel, who was the organizer). It might've spurred more discussion if people had been warmed up.

      (I came two presentations later for my talk reporting on my performance art/poster from AGU [2], but that was specifically intended to generate discussion, and I think I was up there for almost 40 min as people called me out for using terms like 'us' and 'them', with 'them' referring to the scientists.)

      Also, Michele didn't make it 'til Wednesday, so someone else gave the APL talk, which might've reduced people's comments. I know I mentioned that the Information Science field had been studying virtual orgs for years, so they should look into that research, but I didn't point out that 'GAIA' is a bad name for a project, as there's already so many projects with the name.


      • Christina Pikas says:

        it's supposed to allow comments 5 deep and with up to 4 links ... not sure what's going on. Sorry about that.
        Yep, Michelle told me Bob Schaefer gave the presentation. I was on the fence about attending but I decided not to. I think the critiques are good ones - gaining participation in this sort of thing isn't trivial. We've got another guy on our team who has done a ton of research in that, but it's still an uphill battle.