Archive for the 'open science' category

ASIST2017: Making A Case for Open Research: Implications for Reproducibility and Transparency

Came in super late (darn traffic - left home, 30 mi away, 2 hours before getting here)

Caught the end of Erik Mitchell and Edward M. Corrado - they did a survey of JASIST authors and the responses were bleak. Surprising: only like 25% or so had an IRB? Few shared data or had data management plans. Few shared code. Few really did that much about open access.

John M. Budd: A retraction walks into the bar. Bartender says: what will you have? Retraction says: Never mind. And doesn't leave the bar.

Retractions - lots. And lots of retracted things had been cited, and the citations were substantive. Marking of retractions is poor. Work to be done and presented next year.

Audience discussion:

Q: Is anxiety an issue? some researchers have been attacked for sharing data.

A: Well, in qualitative research, it isn't really appropriate to talk about reproducibility

A: We didn't see this anxiety in our work, but maybe a qualitative study would.

Q: Question about the result that said lack of consent was a reason not to share. Audience member was member of a project that had to go through specific consent forms to see if data could be used for new protocols.

(fwiw, I did actually reveal names of my participants in my dissertation research but I went back and re-asked consent giving examples of how it would be done)

Q: IRBs or research sites requiring destruction of data

Q: Works at a DOE national lab - and they have strict requirements for DMPs. Isn't that going to be more the norm now with funder requirements?

A: Not evenly held accountable. Different agencies coming online at different points.

A: Someone from DOT - we're just now having funding calls that have this requirement. There are new requirements for PII data and DMP. They are part of the compliance chain at the National Transportation Library. They haven't gotten any data back in for it yet. It will be that you will be ineligible for future funding if you do not provide identifiers (this might be part of one contract broken into blocks). Many if not most large funders of science - Gates, Wellcome, other funders - are requiring this.

Q: if I do a qualitative study of how people like riding buses, would the interview transcripts be deposited and available?

A: Yes - but probably some sort of de-identification, compilation, anonymization, etc. (I added this part).



RIP Jean-Claude Bradley

May 14 2014 Published under open science

Antony Williams posted this message to CHMINF-L this morning:

“Dear Members of the Drexel University Community,

It is with deep sadness that I inform you of the passing of Jean-Claude Bradley, PhD, associate professor in the Department of Chemistry.

Jean-Claude joined Drexel as an assistant professor in 1996 after receiving his PhD in organic chemistry and serving as a postdoctoral researcher at Duke University and College de France in Paris. In 2004, he was appointed E-Learning Coordinator for Drexel's College of Arts and Sciences, helping to spearhead the adoption of novel teaching modalities. In that role, he led the University's initiative to buy an "island" in the virtual world of Second Life, where students and faculty could explore new methods of teaching and learning.

Jean-Claude was most well known for his "Open Notebook Science"(ONS), a term he coined to describe his novel approach to making all primary research (including both successful and failed experiments) open to the public in real time. ONS, he believed—and demonstrated—could significantly impact the future of science by reducing financial and computational restraints and by granting public access to the raw data that shapes scientific conclusions.

"...In the past, trusting people might have been a necessary evil [of research]," Bradley said. "Today, it is a choice. Optimally, trust should have no place in science."

In June of 2013, Jean-Claude was invited to the White House for an "Open Science Poster Session," at which he discussed ONS' role in allowing he and his collaborators to confidently determine the melting points of over 27,000 substances, including many that were never before agreed upon. Currently, his research lab had been working to create anti-malarial compounds to aid in the synthesis of drugs to fight malaria. His lab's work on this project was made available to the public on a wiki called UsefulChem, which Jean-Claude started in 2005.

Jean-Claude's philosophy of free, accessible science translated to an open approach in the classroom as well. Content from his undergraduate chemistry courses was made freely available to the public, and real data from the laboratory was used in assignments to practice concepts learned in the classroom.

In an article in Chemistry World last April, Bradley said: "It is only a matter of time before the internet is saturated with free knowledge for all…People will remember those who were first."

Indeed, we will remember Jean-Claude as a pioneer in the open access movement, an innovative researcher and colleague, and a kind and dedicated educator. His death impacts all who knew him, and especially the students, faculty and collaborators who worked with him daily. For anyone who may need support in dealing with this loss, we encourage you to reach out to the counseling professionals at Drexel's Counseling Center at 215-895-1415 (or 215-416-3337 after regular business hours).

Our thoughts are with Jean-Claude's family and friends at this difficult time.


Donna M. Murasko, PhD
Dean of the College of Arts and Sciences”

I first met Jean-Claude at the very first North Carolina Science Blogging Conference - what became Science Online. He presented there on Open Notebook Science (my notes) and I was awed by his fearlessness in sharing so openly work that was still in progress. He had patents and lots of peer-reviewed articles, but he found an area in which openness could be most useful and did more than his part. He demonstrated his method far and wide and made Open Science a thing. He came and spoke at ASIST, too, which was very nice - a lot more skeptics in that audience (notes).

Antony Williams has more memories on his blog.

Jean-Claude was generous and very nice, and maybe a bit shy or introverted. He made tremendous contributions to science and he will be missed.


scio10: Science in the Cloud

Jan 16 2010 Published under Conferences, open science

John Hogenesch, Assistant Professor of Pharmacology - Penn School of Med

gene-at-a-time is giving way to genome wide - larger datasets, collaborative research

last year more was added to GenBank than all previous years combined (wow!) - exceeds Moore's law.

Academia responds by buying storage and clusters - but you need great IT staff - and it's really hard to get and keep them (they go to industry), heating & cooling, depreciation, usage/provisioning (under/over utilized). Larger inter-institutional grids - access is tightly regulated, they are very complex to program in/for

Cloud computing: software as a service, infrastructure as a service, platform as a service

They use SAAS for collaboration - Basecamp from 37signals. Collaborating with multiple labs, multiple people. Compare $50/month with no IT support costs to SharePoint: $1k server, $500 license, admin at 5% effort $2k.
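
To make that comparison concrete, here is a rough sketch of the arithmetic in Python. It assumes (my assumption, the talk didn't say) that the SharePoint numbers are one-year totals and that Basecamp is paid for all 12 months; the function names are just for illustration.

```python
# Rough first-year cost comparison using the figures quoted in the talk.
# Assumption: the SharePoint numbers are one-year totals; the talk did not say.

def basecamp_yearly(monthly_fee=50, months=12):
    """Hosted SaaS: a flat monthly fee and no local IT support cost."""
    return monthly_fee * months


def sharepoint_first_year(server=1000, license_fee=500, admin=2000):
    """Self-hosted: server + license + 5% of an admin's effort."""
    return server + license_fee + admin


saas = basecamp_yearly()
hosted = sharepoint_first_year()
print(f"Basecamp: ${saas}/yr, SharePoint: ${hosted} first year, "
      f"difference: ${hosted - saas}")
```

Under those assumptions the hosted option is $600 a year against $3,500 for the self-run server, before counting heating, cooling or depreciation.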

IAAS for proteomics - example - search complex samples over a 6-frame translated genome. They provisioned 20 AWS nodes, running Windows; the search ran over 7 days at a cost of $1400.

In genomics - lots of recent publications using CloudBurst, Crossbow (?), and Hadoop for BLAST/BLAT/R scripts....

BLAT on AWS - using CloudCrowd (NY Times alternative to Hadoop), provisioned 20 large memory instances of Ubuntu, 85% of sequences were mapped, ~72 hours/$424 (experiments cost $30k with machine and reagents and all - so over the course of the 30 you can do in a year, 600k savings)

q: how much programming to get it ready to go on AWS?

a: about 8 hours with a somewhat experienced programmer - a very experienced one could do it in 1-2 hours - programming is done in Ruby
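
For a sense of scale, the two AWS runs above (the proteomics search and the BLAT run) can be turned into a cost per instance-hour. This is a back-of-the-envelope sketch in Python; it assumes all 20 instances were billed for the whole duration of each run, which the talk didn't actually state, and the function name is mine.

```python
# Implied cost per instance-hour for the two AWS runs quoted in the talk.
# Assumption: every instance ran (and was billed) for the full duration.

def cost_per_instance_hour(total_cost, instances, hours):
    """Total bill divided by the number of instance-hours consumed."""
    return total_cost / (instances * hours)


# Proteomics search: 20 Windows nodes, 7 days, $1400
proteomics = cost_per_instance_hour(1400, instances=20, hours=7 * 24)

# BLAT run: 20 large-memory Ubuntu instances, ~72 hours, $424
blat = cost_per_instance_hour(424, instances=20, hours=72)

print(f"proteomics: ${proteomics:.2f} per instance-hour")  # $0.42
print(f"BLAT:       ${blat:.2f} per instance-hour")        # $0.29
```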

PAAS - aggregating clouds - genome wide screen for modifiers of the circadian clock, 300 found (Zhang et al., Cell, 2009). Gene-centric data integration - go to each data site and search for your gene and then compile. ID/synonym resolution is hard. BioGPS - federated search of these gene sources - URL based scheme, extensible. Puts results from different sources in boxes on BioGPS. Has a catalog search so you can see if you can buy from Invitrogen (sponsor, thank you!) and others.
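
To illustrate what a "URL based, extensible" federated search could look like, here is a minimal Python sketch. It is not BioGPS's actual code or plugin format: the registry, the `{gene}` placeholder and the example.org addresses are all hypothetical, and it sidesteps the hard part (ID/synonym resolution) entirely.

```python
from urllib.parse import quote

# Hypothetical registry: each source is just a URL template with a {gene} slot.
# The example.org addresses are placeholders, not real gene databases.
SOURCES = {
    "source_a": "https://source-a.example.org/search?q={gene}",
    "source_b": "https://source-b.example.org/gene/{gene}",
}


def register_source(name, url_template, registry=SOURCES):
    """Adding a new source only requires a URL template, hence 'extensible'."""
    if "{gene}" not in url_template:
        raise ValueError("template needs a {gene} placeholder")
    registry[name] = url_template


def federated_urls(gene, registry=SOURCES):
    """Build one lookup URL per registered source for the same gene."""
    safe = quote(gene, safe="")  # percent-encode anything unsafe in a URL
    return {name: tpl.replace("{gene}", safe) for name, tpl in registry.items()}


register_source("source_c", "https://source-c.example.org/?id={gene}")
for name, url in federated_urls("CLOCK").items():
    print(name, url)
```

Each resulting URL could then be loaded into its own box on the page, which is roughly the layout described in the talk.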

PAAS use case - publishing in the cloud - PLOS Currents Influenza. PMIDs used for references, Google Knol to write, moderators decide suitable/unsuitable - not review. PLOS will consider expanded versions in their pubs. ~52 publications so far. Example has been viewed 7k times.

q: biobase - only mammalian?

a: yes, but code is available (.NET) so you could customize

q: small vs. large institutions - does this help people who are under resourced for equipment

a: with this we can give you the algorithm and then you could run it on the same service - so this is different from just sharing algorithms

q: writing grants etc. how does that go with cloud services?

a: capital costs (buying servers) typically come out of a different bucket, so this might complicate things. Some in the room have had success, no problems. Some have met skepticism. In the UK they're very concerned about the PATRIOT Act provisions.

q: do you need an AWS specialist

a: they had someone with an MS in bioinformatics and a BS in bio - picked up how to do the first in a week, second done in 8 hours. Could probably replace that person fairly easily

q: concern with using a free service online - stability/preservation of data

a: test to see about getting data out after you set up an account, if super important then host on your own site

q: using these in teaching?

a: using Wave, using PBwiki, using Blackboard, using OpenWetWare wiki, (I use OneNote), also Google Docs (they tried wikis first, didn't fly, Google Docs works well for them)

q: proportion of work done in cloud vs. local computing resources

q: boundaries of the institution

a: now either academic or industrial - so this will probably allow independent investigators again, rent some lab time, rent some computing time and then prototype something. Can also use publicly available data - always lots more things to find/use it for than just what originators foresaw


Are chemists really grinches?

With well known and respected open science projects coming out of chemistry as well as cool tools like PubChem and eMolecules... it seems a bit unfair of me to ask if chemists are grinches. But there has been and there continues to be a lot of study of data/information/knowledge sharing in chemistry - or, really, the lack thereof. In general, pre-prints are not passed around or self-archived, there is very little data sharing (there are counter examples in crystallography), and details are withheld from conference presentations or the conference slides are not made available (Milo used to have a post on this, but I guess he's dropped from teh netz). These things might be quite striking to someone who is familiar with scholarly communication in high energy physics or bioinformatics.

Most recently, there is a commentary article out by Theresa Velden and Carl Lagoze [1] that summarizes things that are different about chemistry that probably impact the adoption and use of some of the newer and more open data/information/knowledge sharing - communication tools:

  • long tail science - elsewhere this is described as resource concentration, among other names. In other words, there are lots and lots and lots of small and medium sized labs funded from all sorts of different places.  If you're more familiar with physics, this is like comparing HEP (with a very few, very big experiments) with some of the bench-top optical research areas (like what Chad studies, I think). The idea is that if there are lots of little and highly competitive labs around there are disincentives to share. In the case of big science - some areas of astro, HEP - you *have* to share, it's built into the funding and access to equipment/data.
  • longevity of data - chemistry information doesn't go bad so quickly. It's sort of like math that way. With some areas of biomed, it's pretty easy to give away content for free when it's a year old. Immediacy is important there. Also, because there's a lot of data around, and it's all locked up in closed proprietary systems, there's a huge amount of inertia in trying to change anything.
  • diversity of research cultures in chemistry
  • proprietary information - big money closed databases. I have to say that I think ACS and CAS evolved the way they did because that's what the community thought it wanted. ACS is a member organization so... (oh, and AIAA and SAE have both said that it's the pub board made up of society members that are requiring their grinch-y behavior).
  • proximity to industry - or as Velden and Lagoze call it - the industry-academia balance. There are two pieces to this - the amount of work that's being done in industry and how sell-able the fruits of scientific labor are. Even in academic chemistry there's industrial funding and the push to patent and license stuff in chemistry. Industrial chemists might not publish as much and keep more information secret.
  • ACS and CAS's global dominance and iron fist control (but they say that's not true - see pp. 60-1 in [2], "points of dissent").


But this discussion really builds on work that has been done over a number of years. Jeremy Birnholtz' dissertation [3] and JASIST article [4] studied collaboration propensity. Resource concentration, agreement on quality, and the need for/availability of help were significant in predicting collaboration propensity. Agreement on quality was also discussed decades earlier by Zuckerman and Merton [5]. The acceptance rate of HEP journals is much, much higher than in fields like history because submitters know what is expected and the article gets reviewed within the lab before being submitted. The vast majority of the pre-prints on arXiv do end up published in peer-reviewed journals or conferences so it's pretty safe to use them. This might not be the case in other fields.

A whole bunch of articles on the adoption of e-mail, bulletin boards, listservs, etc., came out in the mid-1990s.  To sum these up, it's not just a matter of time [6], the practices and social features of the research area matter.

So, are chemists grinches?  It does appear that in many areas of chemistry, there is not a tradition of sharing.  In these areas, it's not a matter of the availability of appropriate technology, it's more related to how much individuals or labs need to share to produce scientific knowledge and how much they are concerned about being scooped. New researchers are brought up and trained to control their data - so it's not just a matter of time.

If chemists are grinches, but are happy that way, is there a need to change that? Is there a need to tackle the Sisyphean task of culture change? Well, chemistry information is not just for chemists. Physicists and biologists who have to deal with chemical information get pretty annoyed at the hassles. Should there be parallel systems of chemical information used by folks who like to share? (Maybe that's some of what is going on.)

So I hope I've made some people angry enough to comment :) 



[1] Velden, T. & Lagoze, C. (2009). Communicating Chemistry. Nature Chemistry, 1, 673-678. doi:10.1038/nchem.448

[2] Velden, T. & Lagoze, C. (2009). The Value of New Scientific Communication Models for Chemistry. Retrieved November 28, 2009 from

[3] Birnholtz, J. P. (2005). When Do Researchers Collaborate? Toward a Model of Collaboration Propensity in Science and Engineering Research. Unpublished Doctor of Philosophy (Information), The University of Michigan. 3186579.

[4] Birnholtz, J. P. (2007). When do researchers collaborate? Toward a model of collaboration propensity. Journal of the American Society for Information Science and Technology, 58(14), 2226-2239. doi:10.1002/asi.20684

[5] Zuckerman, H., & Merton, R. K. (1971). Patterns of Evaluation in Science: Institutionalization, Structure and Functions of the Referee System. Minerva, 9, 66-100.

[6] Kling, R., & McKim, G. (2000). Not just a matter of time: Field differences and the shaping of electronic media in supporting scientific communication. Journal of the American Society for Information Science, 51, 1306-1320. doi:10.1002/1097-4571(2000)9999:9999<::AID-ASI1047>3.0.CO;2-T


AGU experimenting with open peer review

Oct 13 2009 Published under open science, scholarly communication

This was in an earlier EOS (pdf, not available online for institutional subscribers so I found this by flipping through the print!) - number 32 of this year from 11 August. They're trying what Nature tried and dropped and what EGU has been fairly successful with in Atmospheric Chemistry and Physics - although neither gathered/s many comments.
They're trying it for just a year and only for a few journals:

  • G-cubed
  • Global Biogeochemical Cycles (?)
  • JGR-Earth Surface
  • JGR-Planets
  • Radio Science

It's completely voluntary. Registration is required to comment. The formal reviews will be posted (may be anonymous), but the whole thing goes away when the article is published.
That last bit is a shame. If some of the comments are good and really constructive, it's a shame to toss them. The few geoblogosphere comments I saw were not impressed. I looked all over JGR - Earth Surface and even signed into GEMS and couldn't find any articles.


Balancing open & collaboration with private & individual

Aug 02 2009 Published under collaboration, open science

A quick note on the tension between sharing everything as quickly as possible and keeping things for yourself.

The thrill of collaboration when like minds come together to brainstorm and solve big problems and the egoboo of having something you created "liked" or reused should not exclude or overshadow the value of figuring things out for yourself and having something you can point to as your own.

Recent posts from Sabine and Cameron got me thinking about this a little more. There are also some excellent comments on Sabine's post.

I think it's important to go offline for a bit and to work things out for yourself. Certainly, if you're reading something in math or science, you might try to work through the problem on your own prior to reading how the authors say to do it. I'm an extreme extrovert so I think by talking and writing (that's why you - like my husband - might be maddened by the apparent drift in my "convictions" or point of view). Others get the data, then go off somewhere and come back with an idea fully formed. What seems like ages ago now, I proposed that blogs were good to help people of these two groups work together, but I wonder about the pace of FriendFeed and/or polymath projects and the necessity to feed the beast. How does that work for the introvert types?

Likewise with open science, perhaps, for theoretical scientists or for folks who need to go offline and then present ideas fully formed.  Having someone jump in to their thoughts and tell them that they made a misstep in their proof or to tell them the answer instead of letting them figure it out for themselves, might throw them off their game. 

Seems like for some projects the ideal limit of openness might not be real-time, complete, but at turning points when various milestones are passed... showing the work, but only after it's done and cleaned up.  What do you think?
