Archive for the 'scholarly communication' category

Making things besides journal articles available

Apr 18 2016 Published by under scholarly communication, STS

Communication is central to science, and the vast majority of it happens outside of peer-reviewed journal articles. Some informal scholarly communication is intended to be ephemeral, but in the past couple of decades more of it has been conducted in online, text-based media in a way that could be captured, saved, searched, and reused. Often, it isn't.

Libraries have always had a problem with gray literature. Unpublished dissertations, technical reports, conference papers, government documents, maps, working documents... they are all a level more difficult to find. Some say, "well, if it's good information it will be in the journal literature" or "if it's worth saving, it will be in the journal literature." But we know that details are left out of methods sections, data are not included, negative results are underreported, etc. In some fields conference papers are as easy to find as journal articles, whereas in other fields they're impossible to find (and some of that is due to the level of review and the importance of that kind of communication to the field).

Practically, if you get the idea for something from a blog post, then you need to attribute the blog post. If the blog post goes missing, then your readers are out of luck.

This is all in lead-up to a panegyric on the efforts of the John G. Wolbach Library of the Harvard-Smithsonian Center for Astrophysics with ADS, and particularly Megan Potterbusch, Chloe Besombes, and Chris Erdmann, who have been working on a number of initiatives to archive this information and make it available, searchable, and citable.

Here is a quick listing of their projects:

Open Online Astronomy Thesis Collection, https://zenodo.org/communities/about/astrothesis/

Information about it is here: http://www.astrobetter.com/blog/2016/04/11/an-open-online-astronomy-thesis-collection

Even if your dissertation is in an institutional repository and is available from your university, this will make it easier to find. Also, you can link to your datasets and whatnot.

Conference Materials: http://altbibl.io/gazette/open-access-publishing-made-easy-for-conferences/

We have folks who have been very dissatisfied with the existing options for hosting conference proceedings. I know one group that went from AIP, where they had been for decades, to the Astronomical Society of the Pacific, to IOP, and still wasn't happy. They wanted to make the information available without it being super expensive. This may be an option for long-term access and preservation.

Informal astronomy communications: https://github.com/arceli/charter

This one is more for things like blog posts.

Research software: https://astronomy-software-index.github.io/2015-workshop/

All of this is pulled together by ADS (see also ADS Labs), which is a freely available research database for astronomy and related subjects (we are more interested in planetary science and solar physics at MPOW). PubMed gets all the love, but this is pretty powerful stuff.

Preliminary thoughts on longitudinal k-means for bibliometric trajectories

I read with great interest Baumgartner and Leydesdorff's article* on group-based trajectory modeling of bibliometric trajectories, and I immediately wanted to try it. She used SAS or something like that, though, and I wanted R. I fooled around with this last year for a while, but I couldn't get it going in the R package for GBTM**.

Later, I ran across a way to do k-means clustering for longitudinal data - for trajectories! Cool. I actually understand the math a lot better, too.

Maybe I should mention what I mean by trajectories in this case. When you look at citations per year for articles in science, there's a typical shape: a peak at year 2-3 (depending on the field), and then it slacks off and stays pretty flat. It turns out there are a few other typical shapes you see regularly. One is the sleeping beauty: it goes along and then gets rediscovered and all of a sudden has another peak. Maybe it turns out to be useful for computational modeling once computers catch up. Another is the workhorse paper that just continues to be useful over time and takes a steady strain; maybe it's a really nice review of a phenomenon. There may be 5 different shapes? I don't think anyone knows for sure yet.

So instead of the dataset of about 1,000 MPOW articles I was playing with last year, I'm playing with MPOW articles that were published between 1948 and 1979 and that were identified in a 1986 article as citation classics: 22 articles. I downloaded the full records for their citing articles and then ran an R script to pull out the PY (publication year) of the citing articles (I also pulled out the cited articles and did a fractional Times Cited count, but that's another story). I dropped the year each article was published and kept the next 35 years for each one. For a couple of them that runs up to 2015, but I don't think that will matter a lot since we're a ways into 2016 now.
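A minimal Python sketch of that preprocessing step, for anyone not using R (this is not the R script from the post). It assumes a Web of Science plain-text export, in which each record's publication year sits on a line starting with the field tag "PY"; the function names are hypothetical. Each article ends up as a vector of 35 yearly citation counts, which is the form the clustering needs.

```python
from collections import Counter

WINDOW = 35  # years kept after the publication year, as described above

def citing_years_from_wos_text(text):
    """Pull the PY (publication year) of each citing record.

    Assumes a Web of Science plain-text export in which each record
    has a line like "PY 1987"; records without a PY line are skipped.
    """
    years = []
    for line in text.splitlines():
        if line.startswith("PY "):
            years.append(int(line[3:].strip()))
    return years

def trajectory(pub_year, citing_years, window=WINDOW):
    """Citations per year for pub_year+1 .. pub_year+window.

    The publication year itself is dropped and citations after the
    window are ignored, so every article gets a vector of equal length.
    """
    counts = Counter(citing_years)  # missing years count as 0
    return [counts[pub_year + offset] for offset in range(1, window + 1)]

# Tiny hypothetical example: a 1950 article cited in 1950, 1951 (twice), and 1953.
print(trajectory(1950, [1950, 1951, 1951, 1953], window=5))  # [2, 0, 1, 0, 0]
```

Fixing the window length is the design choice that matters here: k-means on trajectories compares articles year by year, so every vector has to cover the same number of years since publication.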

Loaded it into R, plotted the trajectories straight off:

[Figure: raw citation trajectories]

Looks like a mess, and there are only 22!

Let's look at 3 clusters:

[Figure: 3 clusters]

Ok, so look at the percentages. 4% is one article (1 of 22). This is a very, very famous article. You can probably guess it if you know MPOW. Then the green cluster is probably the workhorses. The majority follow the standard shape.

Let's look at 4 clusters:

[Figure: 4 clusters]

You still have the one crazy one, plus about 5 workhorses. The rest are variations on the normal spike. Some have a really sharp spike and then not much after (these were the latest ones in the set, so the author didn't have enough distance to see what they would do). Others have a normal spike and then go pretty flat.

So I let it do the default and calculate with 2, 3, 4, 5, 6 clusters. When you get above 4, you just add more singletons. The article on kml*** says there's no absolute way to identify the best number of clusters but they give you a bunch of measurements and if they all agree, Bob's your uncle.
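kml does the clustering for you, with options for the distance measure, starting conditions, and number of iterations. To show roughly what happens underneath, here is a minimal Python sketch of plain k-means (Lloyd's algorithm) on equal-length trajectories, using Euclidean distance and several random restarts and keeping the run with the lowest within-cluster sum of squares. This is not kml's code, and kml's defaults (for example how it picks starting points or handles missing years) may differ; `kmeans_trajectories` and its parameters are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_trajectory(members):
    """Year-by-year mean of a list of trajectories (the cluster centroid)."""
    n = len(members)
    return [sum(year) / n for year in zip(*members)]

def kmeans_trajectories(trajs, k, n_starts=20, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on citation trajectories.

    Each trajectory is a list of yearly citation counts, all the same length.
    Runs n_starts random initializations and returns (labels, centers, wss)
    for the run with the lowest within-cluster sum of squares (wss).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        # Start from k trajectories picked at random as the initial centers.
        centers = [list(t) for t in rng.sample(trajs, k)]
        labels = None
        for _ in range(max_iter):
            # Assignment step: each trajectory goes to its nearest center.
            new_labels = [min(range(k), key=lambda c: sq_dist(t, centers[c]))
                          for t in trajs]
            if new_labels == labels:
                break  # converged: assignments stopped changing
            labels = new_labels
            # Update step: move each center to the mean of its members.
            for c in range(k):
                members = [t for t, lab in zip(trajs, labels) if lab == c]
                if members:  # keep the old center if a cluster empties out
                    centers[c] = mean_trajectory(members)
        wss = sum(sq_dist(t, centers[lab]) for t, lab in zip(trajs, labels))
        if best is None or wss < best[2]:
            best = (labels, centers, wss)
    return best

# Usage, with trajs holding one 35-year trajectory per article:
# for k in range(2, 7):
#     labels, centers, wss = kmeans_trajectories(trajs, k)
```

The within-cluster sum of squares generally shrinks as k grows (more clusters can always fit tighter), so it can't pick the number of clusters on its own; that is what the separate quality criteria are for.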

[Figure: quality criteria]

Bigger is better (they normalize and flip some of them so you can read them all this way). Well, nuts. The methods that look at the compactness of the clusters divided by how far apart they're spaced (the first 3, I think?) are totally different from the 4th, which is more like distance from centroids or something like that. I don't know. I probably have to look at that section again.
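One standard example of the compactness-versus-spacing kind of criterion is the Calinski-Harabasz index: between-cluster dispersion divided by within-cluster dispersion, each scaled by its degrees of freedom, so bigger is better. Here is a minimal Python sketch of the textbook formula for a given clustering; it is not how kml computes or normalizes its criteria, and the function name is hypothetical.

```python
def calinski_harabasz(trajs, labels):
    """Calinski-Harabasz index of a clustering (bigger is better).

    CH = (B / (k - 1)) / (W / (n - k)), where B is the between-cluster
    sum of squares (cluster size times squared distance from the cluster
    centroid to the overall centroid) and W is the within-cluster sum of squares.
    """
    n = len(trajs)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need at least 2 clusters and fewer clusters than trajectories")
    dim = len(trajs[0])
    overall = [sum(t[j] for t in trajs) / n for j in range(dim)]
    between = within = 0.0
    for c in clusters:
        members = [t for t, lab in zip(trajs, labels) if lab == c]
        centroid = [sum(t[j] for t in members) / len(members) for j in range(dim)]
        between += len(members) * sum((centroid[j] - overall[j]) ** 2 for j in range(dim))
        within += sum(sum((t[j] - centroid[j]) ** 2 for j in range(dim)) for t in members)
    if within == 0:
        return float("inf")  # every cluster is perfectly tight
    return (between / (k - 1)) / (within / (n - k))

print(calinski_harabasz([[0], [2], [10], [12]], [0, 0, 1, 1]))  # 50.0
```

Computing this for each k from 2 to 6 and looking for the maximum is the same idea as the plot. A criterion built on distance from centroids alone can disagree with it, since it ignores how far apart the clusters are.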

Looking at the data, it doesn't make sense at all to do 5 or 6. Does 4 add information over 3? I think so, really. Of course with this package you can do different distance measurements and different starting points, and different numbers of iterations.

What practical problem does this solve? Dunno. I really think it's worth giving workhorse papers credit. A good paper that continues to be useful makes a real contribution, in my mind. But is there any way to distinguish that from a mediocre paper with a lower spike, short of waiting 35 years? Dunno.

*Baumgartner, S. E., & Leydesdorff, L. (2014). Group‐based trajectory modeling (GBTM) of citations in scholarly literature: dynamic qualities of “transient” and “sticky knowledge claims”. Journal of the Association for Information Science and Technology, 65(4), 797-811. doi: 10.1002/asi.23009 (see arxiv)

** Interesting articles on it. It's from criminology and looks at recidivism. Package.

*** Genolini, C., Alacoque, X., Sentenac, M., & Arnaud, C. (2015). kml and kml3d: R Packages to Cluster Longitudinal Data. Journal of Statistical Software, 65(4), 1-34. Retrieved from http://www.jstatsoft.org/v65/i04/

Thoughts on alternatives to Sci-Hub

There have been a lot of blog posts, news pieces, and listserv comments about Sci-Hub. Some have said that while they know it is wrong, they feel scientists have been forced into using the system because they have no alternatives for access. Some responses have been along the lines of: "We asked our favorite scientists at big US research institutions and they say they have access to everything they need, so why don't you?" Or: "We give away articles to the very poorest countries (which might not even be able to take advantage of them because of poor connectivity), so that should be enough" (what about the middle-income countries?). Or: "You make a lot of money and your university has an endowment; you can surely afford this journal and you're just stealing!" Or: "Jean Valjean didn't have access to bread, either, but that didn't mean stealing was right!"

Others have repeatedly pointed out the difference between stealing things (bread) and making copies that do not diminish the original (though they may diminish the market for it).

Anyhoo, what I really want to talk about here is the alternatives for getting closed-access articles. This is probably not an exhaustive list.

  • licensed access through your institution as part of a site-wide subscription (on campus, or via VPN/proxy from off)
  • interlibrary loan
  • license your own copy ($30-75)
  • individual subscription (through a society or just from the publisher)
  • "rent" access to view a copy for 24 hours
  • find copies self archived in institutional and disciplinary repositories, on their websites, and other random places
  • find copies illegally shared as part of course materials for another course (this happens for stuff I'm looking for pretty regularly, actually, particularly chapters from social sciences books)
  • contact author for copy
  • contact buddy, relative, etc., at another university to request
  • use walk up access at a local public university
  • use #icanhazpdf
  • use Sci-Hub

So let's look at hassle factor. Part of what goes into figuring out the hassle factor is how you identified the article in the first place and what network you're on.

At MPOW if you use Google Scholar or PubMed and you're on our network, you should be able to go right to the full text for the majority of things you're looking for because we have a lot of subscriptions. We have our IPs registered with Google so it points to our subscriptions and our link resolver. If you use our link resolver, it fills out the ILL form for you from there. Still, it is more convenient to get a pdf from/through Google than wait for ILL or for us to scan and e-mail you something.

What if you're off campus? A quick check of #icanhazpdf showed some people were asking because they were off campus. That, to me, seems like the height of laziness and inconsideration. Does their campus really have no remote access? The person who is sending it to you has to go through more effort than it would take you to VPN in or use EZproxy.

One commentator heard from someone who does have access at work but couldn't be assed to use the library tools to locate it. Really? The search on Sci-Hub doesn't work (I'm told), so the best way to use it is through the DOI. I can put the DOI of an article into my FindIt tool and get a proxied link to the best source for the full text immediately, even if it's at a third-party aggregator. Legally. I can also put in the PMID. In fact, I have a plugin in my browser that automatically links DOIs to my link resolver.
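The DOI-to-link-resolver trick is easy to sketch. Most library link resolvers accept OpenURL 1.0 (ANSI/NISO Z39.88-2004) requests, in which a DOI can be passed as `rft_id=info:doi/...`. Below is a minimal Python sketch that finds DOI-like strings in a block of text and builds resolver links for them, roughly what such a browser plugin does. The resolver base URL is a hypothetical placeholder, and the DOI pattern is an approximation that catches most modern DOIs but not every legacy one.

```python
import re
from urllib.parse import urlencode

# Hypothetical link-resolver base URL; substitute your library's own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Approximate DOI pattern: catches most modern DOIs, not every legacy one.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

def openurl_for_doi(doi, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 (key/encoded-value) link that identifies the item by DOI."""
    params = {"url_ver": "Z39.88-2004", "rft_id": f"info:doi/{doi}"}
    return f"{base}?{urlencode(params)}"

def link_dois(text, base=RESOLVER_BASE):
    """Return (doi, resolver_link) pairs for every DOI-like string in text."""
    links = []
    for match in DOI_RE.findall(text):
        doi = match.rstrip(".,;")  # drop sentence punctuation caught at the end
        links.append((doi, openurl_for_doi(doi, base)))
    return links

print(link_dois("Baumgartner & Leydesdorff (2014), doi: 10.1002/asi.23009"))
```

A closing parenthesis right after a DOI would still be captured, because real DOIs can contain parentheses; a production plugin would need smarter trimming than this sketch.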

Ok, so you may not be at an organization that has all this set up. There are lots of industrial and government scientists who have very little access to the literature. Even if they do have access, they might not have the connecting tools.

Let's be quite honest: in many places ILL is awful. Another form. It asks for a lot of information. It may have a different login. It may take 2-3 weeks to arrive. It may be fax quality. There may be a cost associated. In one sociology class I took as a student, they were going off on how bad it was: wrong article, missing pages, illegible copies... one guy put his request in something like 5 times before getting a full, readable copy. After a while he kept putting it in just to see how many tries it would take! Your buddies on Twitter do not have to print, scan to fax quality, and then send.

I love how people say you can use your local public library. Mine is not going to ILL scholarly articles for you. They don't have that kind of budget or staff. I think it's getting harder to use walk-up access, too. If you have eduroam you can get on the network, but if you're at a local small business? It's not like when the journals were in print.

I don't even know where I was going with this, except to say that #icanhazpdf has a point. Library systems need to get easier and get into the workflow, but scholars might also actually need to put in some effort to learn to do things the right way.

Defense slides

Took me a bit - I forgot to upload them to SlideShare until just now. I did pass with revisions to be approved by my advisor.

I have to tell you that it was really anticlimactic. I thought it would be a big weight off my shoulders and I would feel free and I would have minor quibbles but lots of pats on the back... but... well... I don't know. This massive framework o' mine? The communications prof thought it was exactly the same as Shannon and Weaver (1948). Wow.

At least when I do these edits I can get on with writing up other work I've done and then prepping pieces of this for publication. So, really, no less work, but different.

I do fully intend to make this freely available under Creative Commons Attribution and all that. The whole dissertation. I am going to do the revisions first, though, because some are pretty big.

ACS and Just Accepted Manuscripts

A colleague posted on Chminf-l asking about the American Chemical Society's Just Accepted Manuscripts program. Most of the immediate responses were to explain the program, which is not what she asked. Here's the site's description:

"Just Accepted" manuscripts are peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society is posting just accepted, unredacted manuscripts as a service to the research community in order to expedite the dissemination of scientific information as soon as possible after acceptance. "Just Accepted" manuscripts appear in full as PDF documents accompanied by an HTML abstract. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). The manuscripts posted on the "Just Accepted" Web site are not the final scientific version of record; the ASAP (As Soon As Publishable) article (which has been technically edited and formatted) represents the final scientific article of record. The "Just Accepted" manuscript is removed from the Web site upon publication of the ASAP article, and the ASAP article has the same DOI as the "Just Accepted" manuscript. The DOI remains constant to ensure that citations to "Just Accepted" manuscripts link to the final scientific article of record when it becomes available.

The FAQ explains that this is opt-in and these copies will be removed when the ASAP and final versions are live.

Chemistry is kind of a funny field when you talk about scholarly communication and sharing (see and read everything from Theresa Velden's dissertation research on this, in particular). Journals are dominated by ACS with RSC and the other scholarly publishers following. In some areas like synthetic chemistry, there's a real reluctance to even share at meetings, no desire to post pre-prints, and tight control over data access. In more computational and analytic areas, it's a little more relaxed.

Pre-print server efforts in chemistry have been mostly unsuccessful. For one thing, the journals will not take articles that have been posted elsewhere first. Second, there's a big tension around priority (the move to first-to-file may change things for patents, but there are still recognition issues).

With all that, there are still efforts to require self-archiving broadly across fields and to have disciplinary pre-print servers. The big publishers who are rolling in dough from the subscriptions from all the ACS accredited programs do not want to see these archives and self-archiving succeed, even though it's been shown that it doesn't harm subscriptions in physics.

Anyway, as I said on the list, this is a pretty smart move by ACS. It solves the problem of getting the science out there sooner, but still with peer review, and on the hosted platform. This version disappears and the DOI points you to the official version when it's available, so they keep the traffic in-house. I'm sure the embargoes run from official publication, too, so this gives the publisher more time to disseminate the content and get attention before government funders and institutional repositories can share it.

I think it will be accepted by chemists because it is from ACS and it is after peer review. We'll see, though, if there are any typos and whatnot that offend people.

Edit to add: Thurston Miller points to a few viewpoint papers in Journal of Physical Chemistry Letters on OA (the papers themselves are not OA).

Another dissertation on science blogs

Any readers interested in my work (and you'd probably have to have been following me for a while to even know what that is) will probably be interested in that of Paige Brown Jarreau. She's a PhD candidate at LSU and is defending any day now. She did a massive set of interviews and a survey and has shared some of her results on FigShare, on her blog, and in her Twitter stream. So far we've mostly had a glimpse of her findings; I can't wait to see the rest of her dissertation (good grief, at the rate I'm going I guess I'll get a chance to cite it in mine 🙂)

Post I wish I had time to write: Scientific meetings and motherhood

Feb 24 2015 Published by under Conferences, scholarly communication

I was reading Potnia's new post on meetings - why to go to them - and nodding my head vigorously (ouch) and connecting that to the part of the dissertation I'm writing now on tweeting meetings and the research over the years on how scientific meetings work and contribute...

and I got very sad. I'm a real extrovert and a magpie of all sorts of different kinds of research, but I can't justify spending my limited time reading articles that aren't pretty directly relevant to my job or my dissertation. When I went to bunches of meetings, I could soak a million little tidbits up, meet the people doing the work, browse lots of posters and talk to their authors. It's really a very efficient way to see what's up with a field.

and now... I haven't been to a conference since I was in my first trimester with my twins 🙁 Sure, I've listened in to some webinars and followed some tweets. It's not enough.

Would childcare at a venue help? I don't know... I'd still have to get them there, I'd have to trust the childcare (what if I got there and checked them out and didn't like what I saw?), and I'm paying for childcare at home even when I go, and money is super tight now with my income being the only one in our household for more than a year. I thought about bringing my sister along and then we could see the sights together outside of hours. My work would pay my travel and my room, so I'd just have to pay her travel and everyone's food. But I can't really even swing that right now....

So yeah... at least there's Twitter. The post I'd like to write would actually cite references and whatnot.

And I'm only the 10 millionth person to have this issue this year, so I know I'm not a special snowflake, but that doesn't mean I can't still bitch about it.

Enough already with the computer-generated papers!

Feb 25 2014 Published by under publishing, scholarly communication

SIGH! Years ago there was the Sokal affair that poked fun at cultural studies. Then there was a series of efforts to create a computer program to create articles - SciGen from MIT students is a famous one. Phil Davis got a computer-generated paper accepted to Bentham. More recently there was the Bohannon AAAS "sting" operation that (unfairly) targeted only OA journals... There were also two groups that gamed Google Scholar to show more citations... And now:

Publishers withdraw more than 120 gibberish papers
Conference proceedings removed from subscription databases after scientist reveals that they were computer-generated.

Richard Van Noorden, Nature News, 24 February 2014
http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763

Ugh! At least we can't blame Cyril Labbé, the scientist in question. He didn't submit the articles; he just detected them. And they turned up in places near and dear to my heart, like IEEE Xplore and Springer. These are conference papers this time. Not only did they supposedly go through peer review, but were they presented? WHAT was presented? Even if these were pranks, how funny is a prank that was never revealed? Should the authors be banned? Should they be charged with fraud, as some suggest? What a stinking mess.

Heads Up new Science is a Special Issue on Scholarly Communication

Oct 03 2013 Published by under publishing, scholarly communication

The "sting" article that details a Sokal-Affair-type test of crap open access publishers, to see if they really were crap open access publishers, is getting all the attention (do note that Hindawi and PLOS ONE quickly rejected the manuscript and PLOS even questioned the ethical issues, so they are not crap publishers but decent publishers).

Elsewhere in the issue that just went live at 2pm are articles on:

The #agu12 and #agu2012 Twitter archive

I showed a graph of the agu10 archive here, and more recently the agu11/2011 archive here, and now for the agu12/2012 archive. See the 2011 post for the exact methods used to get the data and to clean it.

#agu12 and #agu2012 largest component, nodes sized by degree

#agu12 and #agu2012 other components, no isolates, nodes sized by degree

I will have to review methods to show this, but from appearances, the networks are becoming more like hairballs. In the first year, half the people were connected to theAGU and the other half were connected to NASA, but very few were connected to both. The other prominent nodes were pretty much all institutional accounts. In 2011, that division started to fade, and now in 2012 you can't really see it at all. There are the top three nodes (two the same as before plus a NASA robotic mission), but then there's a large second group of individual scientists with degrees (connections to others) of around 40-80 (indegree and outdegree combined).
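For anyone who wants to reproduce this kind of picture, here is a minimal Python sketch of the two measures involved: combined degree (indegree plus outdegree) per account, and the largest component. It assumes the archive has already been reduced to a list of directed (tweeting account, mentioned account) pairs, one per mention or retweet, and it ignores edge direction when finding components (weakly connected components). That edge definition is an assumption, not necessarily the method from the 2011 post, and the account names in the example are made up.

```python
from collections import defaultdict, deque

def degrees(edges):
    """Combined degree (indegree + outdegree) per account."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 1  # outdegree of the tweeting account
        deg[dst] += 1  # indegree of the mentioned account
    return dict(deg)

def largest_component(edges):
    """Nodes of the largest weakly connected component (edge direction ignored)."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp
    return best

# Hypothetical mention/retweet edges.
edges = [("u1", "hub"), ("u2", "hub"), ("hub", "u1"), ("u3", "org"), ("x", "y")]
print(sorted(degrees(edges).items(), key=lambda kv: -kv[1])[:3])
print(largest_component(edges))
```

Repeated mentions count once per occurrence here; deduplicating edges first would give the simple-graph degree instead, which changes the numbers for chatty accounts.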
