## Making things besides journal articles available

Communication is central to science, and the vast majority of it happens outside of peer-reviewed journal articles. Some informal scholarly communication is intended to be ephemeral, but over the past couple of decades more of it has moved to online, text-based media where it could be captured, saved, searched, and re-used. Often, it isn't.

Libraries have always struggled with gray literature. Unpublished dissertations, technical reports, conference papers, government documents, maps, working documents... all are difficult to find. Some say, "well, if it's good information it will be in the journal literature" or "if it's worth saving, it will be in the journal literature." But we know better: details are left out of methods sections, data are not included, negative results are under-reported, etc. In some fields conference papers are as easy to find as journal articles, whereas in others they're nearly impossible (some of that reflects the level of review and the importance of conference communication to that field).

Practically speaking, if you get the idea for something from a blog post, you need to attribute the blog post. If the blog post goes missing, your readers are out of luck.

This is all a lead-up to a panegyric on the efforts of the John G. Wolbach Library of the Harvard-Smithsonian Center for Astrophysics with ADS, and particularly Megan Potterbusch, Chloe Besombes, and Chris Erdmann, who have been working on a number of initiatives to archive this information and make it available, searchable, and citable.

Here is a quick listing of their projects:

Open Online Astronomy Thesis Collection, https://zenodo.org/communities/about/astrothesis/

Information about it is here: http://www.astrobetter.com/blog/2016/04/11/an-open-online-astronomy-thesis-collection

Even if your dissertation is in an institutional repository and available from the university, this will make it easier to find. Also, you can link to your datasets and whatnot.

We have folks who have been very dissatisfied with the existing options for hosting conference proceedings. I know of one group that went from AIP, where they had been for decades, to the Astronomical Society of the Pacific, to IOP, and still wasn't happy. They wanted to make the information available without it being super expensive. This may be an option for long-term access and preservation.

Informal astronomy communications: https://github.com/arceli/charter

This one is more for things like blog posts.

Research software: https://astronomy-software-index.github.io/2015-workshop/

All of this is pulled together by ADS (see also ADS Labs), a freely available research database for astronomy and related subjects (we are more interested in planetary science and solar physics at MPOW). PubMed gets all the love, but this is pretty powerful stuff.


## ACS and Just Accepted Manuscripts

A colleague posted on Chminf-l asking about the American Chemical Society's Just Accepted Manuscripts program. Most of the immediate responses were to explain the program, which is not what she asked. Here's the site's description:

"Just Accepted" manuscripts are peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society is posting just accepted, unredacted manuscripts as a service to the research community in order to expedite the dissemination of scientific information as soon as possible after acceptance. "Just Accepted" manuscripts appear in full as PDF documents accompanied by an HTML abstract. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). The manuscripts posted on the "Just Accepted" Web site are not the final scientific version of record; the ASAP (As Soon As Publishable) article (which has been technically edited and formatted) represents the final scientific article of record. The "Just Accepted" manuscript is removed from the Web site upon publication of the ASAP article, and the ASAP article has the same DOI as the "Just Accepted" manuscript. The DOI remains constant to ensure that citations to "Just Accepted" manuscripts link to the final scientific article of record when it becomes available.

The FAQ explains that this is opt-in and these copies will be removed when the ASAP and final versions are live.

Chemistry is kind of a funny field when you talk about scholarly communication and sharing (see and read everything from Theresa Velden's dissertation research on this, in particular). Journals are dominated by ACS with RSC and the other scholarly publishers following. In some areas like synthetic chemistry, there's a real reluctance to even share at meetings, no desire to post pre-prints, and tight control over data access. In more computational and analytic areas, it's a little more relaxed.

Pre-print server efforts in chemistry have been mostly unsuccessful. For one thing, the journals will not take articles posted elsewhere first. Second, there's a big tension around priority (the move to first-to-file may change the patent side, but recognition issues remain).

With all that, there are still efforts to require self-archiving broadly across fields and to establish disciplinary pre-print servers. The big publishers, who are rolling in dough from subscriptions from all the ACS-accredited programs, do not want to see these archives and self-archiving succeed, even though it's been shown in physics that self-archiving doesn't harm subscriptions.

Anyway, as I said on the list, this is a pretty smart move by ACS. It solves the problem of getting the science out there sooner, still with peer review, and on the publisher's own platform. This version disappears, and the DOI points you to the official version when it's available, so they keep the traffic in house. I'm sure the embargoes run from official publication, too, so this gives the publisher more time to disseminate the content and get attention before government funders and institutional repositories can share it.

I think it will be accepted by chemists because it is from ACS and it is after peer review. We'll see, though, if there are any typos and whatnot that offend people.

Edit to add: Thurston Miller points to a few viewpoint papers in Journal of Physical Chemistry Letters on OA (the papers themselves are not OA).

## Misunderstanding arXiv

The e-print server for physics, astro, CS, math, and related fields, arXiv, is often misunderstood and misrepresented. Specifically, it's often described as a place where anyone can post any article in any state.

**Anyone:** users must be endorsed by another user. Endorsers are active submitters in the same area. This may be a fairly low bar, but it is there.

**Any article:** articles can be rejected or reclassified. Articles are expected to be of journal quality. There are moderators who make these calls.

**Any state:** articles are supposed to be done. Read this interesting discussion on AstroBetter. Even if the rules don't say it, the norms in a given subject area might.

## Another tilt at a holy grail: identifying emerging research areas by mapping the literature

Technology surprise, disruptive technologies, or being caught unaware when managing a research portfolio or research funding are some of the fears that keep research managers and research funders up at night. Individual scientists might see some interesting things at conferences and keep a mental note, but unless they can see the connection to their own work, they will likely not bring it back. Even if they do, they might not be able to get funding for their new idea if the folks with the bucks don’t see where it’s going. Consequently, there are lots of different ways to do technology forecasting and the like. One of the main ways has been to mine the literature. Of course, as with anything using the literature, you’re looking at some time delay. After all, a journal article may appear three years or more after the research was started.

I’ve been party to a bunch of conversations about this and I’ve also dabbled in the topic so I was intrigued when I saw this article in my feed from the journal. Plus, it uses a tool I’ve had a lot of success with recently in my work, Sci2.

Citation: Guo, H., Weingart, S., & Börner, K. (2011). Mixed-indicators model for identifying emerging research areas. Scientometrics. DOI: 10.1007/s11192-011-0433-7

The data set they are using is all the articles from PNAS and Scientometrics over a period of thirty years from 1980 to 2010. They’re using the information from Thomson Reuters Web of Science, not the full text of the articles.

Their indicators of emerging areas are:

• Addition of lots of new authors
• Increased interdisciplinarity of citations
• Bursts of new phrases/words

This differs from other work, like that of Chen and Zitt and others, which clusters on citation or co-citation networks and also looks at nodes with high betweenness centrality as turning points.

The addition of new authors per year is pretty straightforward, but the other two methods deserve some description. For interdisciplinarity, each cited article is given a score based on its journal’s location on the UCSD map of science. Then a Rao-Stirling diversity score is calculated for each article to quantify the interdisciplinarity of its citations. For each pair of citations in the reference list, the score uses the probability of the first citation being in its discipline, the probability of the second being in its discipline, and the great-circle distance between the two disciplines (the map is on a sphere, hence not Euclidean distance). The limitations are pretty numerous. First, the map covers only journals, only 16k of them, and only journals that were around in 2001-2005 (think of how many journals have come out in the last few years). Articles with more than 50% of their citations going to things not on the map were dropped. They mention areas with a lot of citations to monographs, but I would think the bigger problem would be conferences. Newer research areas might have problems finding a home in established journals, or might be too new for journals and appear only at conferences.
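To make the calculation concrete, here is a minimal sketch of a Rao-Stirling score for one article's reference list. This is my own illustration, not the authors' code: the function and variable names are mine, and the toy distance table stands in for the great-circle distances on the UCSD map.

```python
from collections import Counter
from itertools import combinations

def rao_stirling(citation_disciplines, distance):
    """Rao-Stirling diversity for one article's reference list.

    citation_disciplines: one discipline label per cited reference.
    distance: dict mapping (discipline_a, discipline_b) pairs to the
    distance between those disciplines on a map of science (symmetric).
    """
    counts = Counter(citation_disciplines)
    total = len(citation_disciplines)
    # p_i: proportion of the reference list falling in each discipline
    props = {disc: n / total for disc, n in counts.items()}
    score = 0.0
    for a, b in combinations(props, 2):
        d_ab = distance.get((a, b), distance.get((b, a), 0.0))
        score += props[a] * props[b] * d_ab  # sum of p_i * p_j * d_ij
    return score
```

An article whose references all sit in one discipline scores 0; an article splitting its citations between two distant disciplines scores higher, which is the behavior the indicator relies on.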

For word bursts, I know from using Sci2 that they’re using Kleinberg’s (2002; paywalled, but a free PDF is on CiteSeer) algorithm, though I don’t believe they state that in the article. Their description of their implementation of the algorithm is here. I’ve been curious about it but haven’t had the time to read the original article.

In general, this article is pretty cool because it’s completely open science. You can go use their tool, use their dataset, and recreate their entire analysis – they encourage you to do so. However, I’m not completely convinced that their indicators work for detecting emerging research areas, given the plots of one of their methods against another. Their datasets might be to blame. They look at two individual journals, so any given new research area might not catch on in a particular journal if there’s a conservative editor or something. PNAS is a general journal, so the articles probably first appeared in more specialized journals (a topic would have to be big time before hitting Science, Nature, or PNAS).

Also, the interesting thing about the h-index and the impact factor (the two emerging areas examined for Scientometrics) is not their coverage in the information science disciplines, but the h-index’s emergence from the physics literature and the coverage of both in biology, physics, and other areas. If you look across journals, your dataset quickly becomes huge, but you might get a better picture. The impact factor was first introduced decades before the start of their window, but it has become socially controversial because of the funding, promotion, and tenure decisions tied to it – a more recent phenomenon, I believe.

Coding journals by discipline has often been done using JCR’s subject categories (lots of disagreement there), but others have mapped science journals by looking at citation networks.
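For the curious, here is a toy two-state sketch of the flavor of Kleinberg's burst detection. The real algorithm (and whatever Sci2 implements) is more general, with an infinite hierarchy of burst states; all function names and parameter choices below are mine.

```python
import math

def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state sketch of Kleinberg-style burst detection.

    r[t] = occurrences of the target word in period t,
    d[t] = total documents in period t.
    Returns the minimum-cost state sequence (0 = baseline, 1 = burst).
    """
    n = len(r)
    p0 = sum(r) / sum(d)           # baseline rate of the word
    p1 = min(p0 * s, 0.9999)       # elevated "burst" rate
    trans = gamma * math.log(n)    # cost of entering the burst state

    def cost(t, p):
        # negative log binomial likelihood, dropping the constant term
        return -(r[t] * math.log(p) + (d[t] - r[t]) * math.log(1 - p))

    # Viterbi dynamic program over the two states
    INF = float("inf")
    best = [[INF, INF] for _ in range(n)]
    back = [[0, 0] for _ in range(n)]
    best[0] = [cost(0, p0), cost(0, p1) + trans]
    for t in range(1, n):
        for j, p in ((0, p0), (1, p1)):
            for i in (0, 1):
                c = best[t - 1][i] + (trans if (i, j) == (0, 1) else 0) + cost(t, p)
                if c < best[t][j]:
                    best[t][j], back[t][j] = c, i
    states = [0] * n
    states[-1] = 0 if best[-1][0] <= best[-1][1] else 1
    for t in range(n - 1, 0, -1):
        states[t - 1] = back[t][states[t]]
    return states
```

Feeding it a word that appears in 1% of documents most years but 8-9% in two consecutive periods flags just those periods as a burst; the transition cost `gamma * log(n)` is what keeps one-off blips from counting.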
It would be computationally extremely expensive, but very cool, to have a subdisciplinary category applied to each article – not taken from the journal in which it was published. Also, someone really needs to do more of this for areas with important conference papers, like engineering and CS.

## scio11: blogs, bloggers, and boundaries

Blogs, Bloggers and Boundaries? – Marie-Claire Shanahan, Alice Bell, Ed Yong, Viv Raper

AB: Arsenic life – the NASA spokesperson would not engage without peer review. But this is really a spam filter issue – too tight and you miss things you want to hear, too loose and you’re overwhelmed by the junk. The scientists need some sort of filter. Or think of science as a map (à la Gieryn). The world benefits from specialization – we can’t all be specialists in everything, so we rely on trust and on shortcuts to navigate different spaces. Boundaries keep people out but also create shared spaces inside. Jargon and in-jokes: jargon makes in-groups; in-jokes are an expression of friendship but also make people feel left out. How do we calibrate our spam filter? Be clever about boundaries – who’s on the other side, do we want to speak to them? Are the boundaries intentional?

EY: The strength of the authors’ conclusions depends on the community’s ability to explain the data differently. Example: foxes and magnetic north – ways to test and ways to interpret data. Blogged about half male/female chickens. Got contacted by a farmer who has such an animal – connected him to the scientist. They formed a research collaboration across two continents. The farmer said he contacted EY because he found detailed but understandable information through a web search. EY does a "who are you?" thread on his blog every year – this helps him determine what to write and how to write it.

M-CS: Boundaries > boundary layers, as in fluids – what degree of mixing, and places where different fluids come together.
Blogs give us many different kinds of boundaries: information and people. Using blogs as an information source is one-way communication; her elementary education students didn’t think of blogs as conversational. Newspaper comments – people speaking to each other, but no mixing.

VR: Some more practical tips – instead of defining terms repeatedly in the text, use a mouseover plain-language glossary. Pick terms that are sensitive to the audience you’re looking for. Had her mother read her blog – learned a lot.

EY: You can alienate someone because you’re not in their very specific area of science – your language will be different from theirs (gave an example from the oil spill with a geophysicist and an engineer – maybe on an AGU blog?). You may be reaching a much smaller audience than you think – blogs are intimidating to some people. Blogging is a subculture in and of itself; even if you write for a general audience, commenters might be very sophisticated, use jargon, poke holes in things, and intimidate other commenters. How do you keep and engage people who stumble across your blog when searching for answers to a specific question? There’s a different crowd following DSN on the Facebook page than at the blog itself – younger. EY had the same experience – comments on Facebook when there are none on the blog, but they’re often reacting to the title or at most the first paragraph, so you have to be more careful with titles and ledes. And my battery ran out.

## More on the impact of old folks in academe

Jan 05 2011, filed under STS

My last post (I am so far from productive it's not funny... maybe I should be put out to pasture, lol) was a research blogging review of an article evaluating the claims that the graying of academia will impact productivity. DrugMonkey just brought an NSF report on a related subject to my attention.

Hoffer, T.B., Sederstrom, S., & Harper, D.
(2010). The End of Mandatory Retirement for Doctoral Scientists and Engineers in Post Secondary Institutions: Retirement Patterns 10 Years Later. InfoBrief (NSF11-302). http://www.nsf.gov/statistics/infbrief/nsf11302/nsf11302.pdf

• The age of retirement is edging up, but hasn't jumped dramatically as some might have expected.
• There's also interest in the interaction with type of institution (by Carnegie classification). Research universities had lower retirement rates for each age group, but the difference wasn't statistically significant.
• Disciplinary area also didn't seem to make a difference, with the exception of bio, ag, and health sciences, which had a drop in retirement rates from 39.8% to 33.7% between 1993 and 2003.
• Men do retire earlier than women, and the spread is widening.

## Are the old folks holding us back?

Dec 27 2010, filed under scholarly communication, STS

We've been hearing a lot about how hard it is to get a tenure-track job – arguably harder even than it was during other economic recessions. We've also been hearing about how the age of NIH PIs is going up. I gather the age at first award is going up as well as the average. At the same time, there are a lot of stories about how the most disruptive ideas have come from young scientists, and that young scientists are more productive. So if academia is aging and older scientists are not leaving, does this mean there will be an accompanying loss in productivity? That’s what this article looked into. It’s a review of the research on many different aspects of this question: “whether there is empirical support for the belief that age is negatively related to scientific achievement.”

I never realized that there had been mandatory retirement at universities in the US. This of course is discrimination (it violated the amended Age Discrimination in Employment Act), so it was abolished, but only in 1994.
What with baby boomers and all not retiring, you can see how the average age would go up and how there would be less room to hire young faculty. Many colleges apparently try (tried?) to incentivize early retirement, but maybe only with limited success.

The first thing the author looks at is whether there is an association between age and scientific achievement. He proposes four factors:

• changes in cognitive abilities due to age
• changes in motivation
• availability of resources
• legal curtailing (compulsory retirement, where and when it exists/existed)

As for cognitive ability, it’s interesting that the author goes back to research published in the 1950s, which seems to set the stage. There were cohort issues, and a lot depends on the way these abilities are tested. Large longitudinal studies seem to find that abilities do decline, but fairly slowly until about age 80. There are also suggestions that, after being in a specific line of work for a while, you get in a rut and don’t think of new or creative ways to solve problems. But this is something you could fix by changing approaches or adopting a new research area.

As for motivation, there are some economic models based on anticipated life income that would predict a drop in motivation. On the other hand, if scientists are more motivated by prestige and the rewards and recognition that come from scientific achievement, they should be motivated to continue publishing as they go along. Likewise, if they have intrinsic motivation and enjoy their work, there’s no reason that should drop off.

As for availability of resources, the work of the Coles and of Price found that the top researchers account for the majority of the publications – it’s a standard long-tailed distribution. Price’s law is: if there are k researchers in a field, √k of them will be responsible for 50% of the contributions (so in a field of 100 researchers, about 10 would produce half the papers). Merton’s Matthew Effect is that the rich get richer.
So if you’re successful, you’ll get more rewards and resources, which facilitates more and better science. Success breeds success, but failure leads to more failure. Aging scientists who have not been successful will most likely drop in productivity. As for compulsory retirement – most of these studies date back to when that was the case. Also, in some countries, older scientists can’t start big new projects a few years before retirement, so they actually start winding down early.

The author then goes on to review the evidence. Indicators of achievement include prizes and awards, citation count (no mention of the h-index, but that would make sense), number of publications, number of grants, etc. There are always problems with the research designs in these studies. They don’t account for the age distribution of the population of scientists when looking at the age of award winners. There are cohort effects – different groups of scientists use different methods and have different publication patterns. There are also period effects – there might be some historical situation that increased or decreased publication during a period in an author’s career.

A bunch of the older studies find a sort of downward curve. The awards, publications, and citations all seem to peak around the 40s and then drop off. These studies do show a decline in productivity with age, but age seems to account for only a small part of the variation. Past performance is the best predictor, and the quality of the work doesn’t seem to decrease. For math, though, the story is quite different – it’s a flat line for productivity against age. In newer studies (there are only four here), the line is basically flat, with a slight rise at 11-15 years into the career and another at 26-30 years. The author didn’t find evidence to support the idea that the graying of academia will lead to a loss in productivity.
Additional longitudinal studies are needed, but they will be difficult to perform for the reasons already listed, as well as changes in publication norms: most universities are pushing very hard for their faculty to publish, so the number of publications will continue to rise. It seems crazy to force productive scientists to retire, but it’s complicated to get unproductive yet highly paid scientists to retire while keeping the productive ones. If no one retires, the older scientists command pretty high pay and there will be little room in the pipeline. It’s an interesting article and very readable. Recommended.

Reference: Stroebe, W. (2010). The graying of academia: Will it reduce scientific productivity? American Psychologist, 65(7), 660-673. DOI: 10.1037/a0021086

## NASA can’t have it both ways

Not to anthropomorphize a government agency or anything, but NASA is really confused in its social media actions. I’m the millionth person to point this out, but it seems worthwhile to do so if for no other reason than to be able to find the information later by searching my blog.

NASA has had the policy and practice (and mandate?) of sharing its science with “the public” – the public being US taxpayers, but also related scientists worldwide, children, and lots of other groups. They do this through websites and TV shows and, more recently, podcasts, blogs, and Twitter. They publish scientific findings in scholarly journals, present them at meetings, and share scientific data freely through many different archives. Organizations that receive funding from NASA are required to do the same.* NASA typically does a pretty good job of this – partly because their stuff is so very fascinating that it would be hard not to have a cool and interesting message about it, but mostly because they have lots of professional communicators, outreach professionals, and experienced scientists who work hard at it.

With that said, what on earth (or in space, ha!)
are they thinking in this reaction to Dr. Redfield’s evaluation of their recent microbiology/arsenic research? David Dobbs has a good blog post describing this. Dobbs quotes from “NASA’s arsenic microbe science slammed,” at CBC News:

> When NASA spokesman Dwayne Brown was asked about public criticisms of the paper in the blogosphere, he noted that the article was peer-reviewed and published in one of the most prestigious scientific journals. He added that Wolfe-Simon will not be responding to individual criticisms, as the agency doesn’t feel it is appropriate to debate the science using the media and bloggers. Instead, it believes that should be done in scientific publications.

My immediate concern isn’t whether the science is good or the criticisms are valid, but certainly if NASA intends to engage with the “public” and not just broadcast to us, they need to respond to these criticisms. Further, these responses should be made in an appropriate manner – a blog post, or a comment on Dr. Redfield’s blog. Dr. Redfield’s blog is well known and well respected, and she registered the post with ResearchBlogging. Her comments section is also very informative. I agree that NASA shouldn’t necessarily be expected to engage on all fronts with everyone linking to their work, but as Dobbs says, this blog is different. Moreover, this paper is being reviewed on many blogs by scientists who are experts in this field and adjacent fields, and it has been reviewed on F1000 (some links from the Code for Life blog). If you have a press release on a paper, then you should be prepared to continue the engagement after you have broadcast your message.

The paper’s author has also stated that replies should be in a “scientific venue.”** My dear scientist, the web is a scientific venue! Haven’t you heard? This is the #altmetrics or post-publication peer review we’ve been talking about for quite a while.
Interestingly, some of the comments on the original post by Redfield basically indicate that responding on blogs is only for those who don’t have standing or who are not qualified. Grrr. That person needs to be educated! (I do hope that the technical comment, or whatever is eventually sent to Science, attributes some credit to the commenters on that thread – a lot of good stuff there.)

Update: Randy left a nice comment (thank you) which caused me to go look at updates on the Guardian site. This caught my eye:

> "Any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated," wrote Felisa Wolfe-Simon of the NASA Astrobiology Institute. "The items you are presenting do not represent the proper way to engage in a scientific discourse and we will not respond in this manner."

So the proper way to engage in scientific discourse is to hold press conferences (two now)? Gosh, maybe I should toss my entire dissertation, because I've been witnessing scientific discourse at conferences, in conference hallways, on Twitter, in blogs, on wikis, on post-publication peer review sites... Hrumph.

\* One more time for the record: my place of work, of course (google me), gets money from NASA. This post is my opinion only and does not reflect that of my place of work or any of its employees. This post is purely from the point of view of an observer of scholarly communication.

\*\* Do note that I am American. I put my . inside my “”. Canadians, Australians, and Brits for some crazy reason put it outside.

## The role of trust in science

Dec 06 2010, filed under STS

Egon Willighagen just posted that Trust Has No Place in Science. His point is that Antony Williams asked if/how much people trust various chemical databases, and Egon answered that he doesn’t trust any of them, he verifies. Ok. So back to the old standard, Mertonian norms.
It is a norm of science to practice organized skepticism. (Merton argued that this wasn’t skepticism about everything, but was specific to scientific ideas and statements – it specifically isn’t about religion or patriotism.) Scientists don’t believe things just on someone’s word; they need evidence. Right. But what form does that evidence take?

Egon says he verifies everything. So I guess there’s no need for the database, then? I mean, if you’re going to re-run all of the experiments that provide the data. Or even read through the methods sections of the journal articles carefully. But wait – how can you believe the methods section of a journal article? Articles have been retracted for things wrong in the methods section. Even if you read through the methods section, how do you interpret the results? How do you know it was the right method to use? Well, then you must do all of the experiments that led up to developing that method. And then you’d better redesign the instruments. Oh, and make your own reagents or whatever.

At some point you have to evaluate things and then move on. Philosophers will give you careful descriptions of it, but there’s the idea that you need some assumptions for every empirical test (Duhem-Quine). Then there are whole areas you’re not expert in, where you need co-authors to support you. Are you going to check all of their work and second-guess them? There is some trust – skepticism – but trust.

(Nah, I’m not out of my blogging funk and this isn’t up to my standards, but I need to at least try to blog to get back on the horse 🙁)

Update: spelled Antony's name wrong! Sorry.

## ASIST2010: Structure and evolution of scientific collaboration networks in a modern research collaboratory

Alberto Pepe, UCLA (dissertation award winner) 🙂 – hasn’t looked at his dissertation for 4 months so he might be rusty; ask his advisor.

Looked at collaboration in a collaboratory where work is interdisciplinary and distributed.
Physical and virtual spaces… At CENS, sensor network research: about 300 researchers, multidisciplinary (EE, CS, stats, bio, environmental science, urban planning, sociology, media), multi-sited within southern California. He was a participant observer. Important to study relations instead of things, because of the connectedness of scientific collaborations. Looked at co-authorship, communication (mailing lists), and acquaintanceship.

Questions: What is the topology of the network? What is the structure of CENS and how has it evolved? How are the networks related to each other – what is the role of communication and acquaintanceship in co-authorship?

Authorship data: CENS annual reports are the official listing of all of the publications from the collaboration (lucky this exists – it doesn’t for many orgs). 600 publications over 7 annual reports; 400 conference papers. Year 2 was most productive, dip in year 5, then a rise. Most papers have 2-3 authors. Hard to define the boundaries of the system, since there is co-authorship across institutions and countries – used co-authorship to draw the boundaries. 87 mailing lists, 30k e-mails, 1500 threads.

Social survey: who do you know? (Opt out – each person has a name and picture from a public database… 300 people from co-authors.) How do you know them? When did you meet? How often do you communicate? 10-30 acquaintances; 191 of 373 responded to the survey. He didn’t look at betweenness (or, apparently, measures like Bonacich or Eigenfactor).

Community structure: used the Newman-Girvan method. (Would have been nice if he had drawn a line around the communities – hard to see – as he did on a later slide.) Some communities form around country of origin, academic affiliation, or position (staff, faculty, PhD). Co-authorship and acquaintanceship overlap and are one-institution and one-discipline (I think he said), but are becoming more interdisciplinary and less inter-institutional. CENS collaboration communities are open, fluid, inclusive, and small-worldish (but not based on prestige).
Hubs bring different communities together. Not as interdisciplinary as we have heard.

These questions occurred to me: the survey – ethical issues with opt-out? Ethical issues with pictures? Did they shift the order? Was the order alphabetical? Fatigue? Sorted in groups? A: they were ordered by institution and discipline; they tried other things but this was what worked.

In CS they co-author without knowing each other!?!?! That’s cool. In bio they know each other if they co-author. With regard to the ethical questions: the director of CENS was on the dissertation committee, so it was all approved.

• I'm a science and engineering librarian and information scientist at a university-affiliated research center. I have a BS in Physics, an MLS, and a PhD in Information Studies. Nothing here represents my employer.
