## ACS and Just Accepted Manuscripts

A colleague posted on CHMINF-L asking about the American Chemical Society's Just Accepted Manuscripts program. Most of the immediate responses just explained the program, which is not what she asked. Here's the site's description:

"Just Accepted" manuscripts are peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society is posting just accepted, unredacted manuscripts as a service to the research community in order to expedite the dissemination of scientific information as soon as possible after acceptance. "Just Accepted" manuscripts appear in full as PDF documents accompanied by an HTML abstract. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). The manuscripts posted on the "Just Accepted" Web site are not the final scientific version of record; the ASAP (As Soon As Publishable) article (which has been technically edited and formatted) represents the final scientific article of record. The "Just Accepted" manuscript is removed from the Web site upon publication of the ASAP article, and the ASAP article has the same DOI as the "Just Accepted" manuscript. The DOI remains constant to ensure that citations to "Just Accepted" manuscripts link to the final scientific article of record when it becomes available.

The FAQ explains that this is opt-in and these copies will be removed when the ASAP and final versions are live.

Chemistry is kind of a funny field when you talk about scholarly communication and sharing (see Theresa Velden's dissertation research on this in particular). Journals are dominated by ACS, with RSC and the other scholarly publishers following. In some areas, like synthetic chemistry, there's a real reluctance to share even at meetings, no desire to post preprints, and tight control over data access. In the more computational and analytical areas, it's a little more relaxed.

Preprint server efforts in chemistry have been mostly unsuccessful. For one thing, the journals will not take articles that have been posted elsewhere first. Second, there's a big tension around priority (the move to first-to-file may change the patent side, but the recognition issues remain).

With all that, there are still efforts to require self-archiving broadly across fields and to set up disciplinary preprint servers. The big publishers, who are rolling in dough from subscriptions bought by all the ACS-accredited programs, do not want to see these archives and self-archiving succeed, even though self-archiving has been shown not to harm subscriptions in physics.

Anyway, as I said on the list, this is a pretty smart move by ACS. It solves the problem of getting the science out there sooner, still with peer review, and on ACS's own hosted platform. This version disappears and the DOI points you to the official version when it's available, so they keep the traffic in house. I'm sure the embargoes run from official publication, too, so this gives the publisher more time to disseminate the content and get attention before government funders and institutional repositories can share it.

I think it will be accepted by chemists because it comes from ACS and it comes after peer review. We'll see, though, whether there are typos and whatnot that offend people.

Edit to add: Thurston Miller points to a few viewpoint papers in Journal of Physical Chemistry Letters on OA (the papers themselves are not OA).

## OAuth in twitteR much easier now, whew!

Mar 18 2015

Not that I should be messing with this at this point, but I wanted to retrieve a tweet to provide evidence for a point. Anyway, instead of the 50-step process of the past, you now just follow the instructions in the twitteR README (http://cran.r-project.org/web/packages/twitteR/README.html), with the one change that you now put your access token and secret in the *single command* too, like so:

setup_twitter_oauth(consumer_key, consumer_secret, access_token=NULL, access_secret=NULL)
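That's the signature; filled in, it looks something like this (the key and token strings are hypothetical placeholders you copy from your app's page on apps.twitter.com):

library(twitteR)
setup_twitter_oauth("my_consumer_key", "my_consumer_secret",
                    access_token = "my_access_token",
                    access_secret = "my_access_secret")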


Then you can just search or whatever. Wow!

Very nice. How much time did I spend playing with the old method?

## Polar and Ellipsoid Graphs in igraph in R

Mar 12 2015

I'm still working on some additional graphs for the project mentioned in this earlier post. The Fruchterman-Reingold layout was too crowded, so my customer suggested we do a circular layout with one category in the center and the remaining nodes on an outer ring. I said sure! But when I went to do it, I found only the star layout (one node in the center) and the ring layout - no polar layout. I tried a few things but finally broke down and asked. Quick, perfect answer on StackOverflow (as often happens).

That led to this:

But hey, still pretty jammed up. So what about an ellipse? Sure!

What's that equation again?

$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$

But that's a hard form to use when I need x and y values in a matrix. The parametric form looks better:

$x = a \cos(\theta), \quad y = b \sin(\theta)$

And this is how I did it.

ellip.layout <- function(a, b, theta) { cbind(a * cos(theta), -b * sin(theta)) }

systems <- which(V(g)$category == "System")
comp <- which(V(g)$category != "System")

a <- ifelse(V(g)$category == "System", 4, 5)
b <- ifelse(V(g)$category == "System", 0.5, 1)

theta <- rep.int(0, vcount(g)) # creates a blank vector, one angle per vertex
theta[systems] <- (seq_along(systems) - 1) * 2 * pi / length(systems)
theta[comp] <- (seq_along(comp) - 1) * 2 * pi / length(comp)

layout <- ellip.layout(a, b, theta)

plot.igraph(g, layout = layout, asp = 0)

Originally the outer ring was coming out as a circle anyway, but I asked the mailing list and it was a matter of setting asp (the aspect ratio) to 0 so the plot isn't forced to be square.
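To make the whole thing reproducible end to end, here's a toy setup with made-up data (the graph, category names, and counts are invented for illustration; the layout code above then applies unchanged):

library(igraph)

# hypothetical graph: 4 "System" vertices on the inner ellipse,
# 12 other vertices on the outer one
g <- make_full_bipartite_graph(4, 12)
V(g)$category <- c(rep("System", 4), rep("Component", 12))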

Here's where I ended up:

ETA: If you do labels, there's a neat trick to make them always sit outside the circle. See here: https://gist.github.com/kjhealy/834774

## Ebook Explosion

Dec 17 2014 · Information Science, libraries

Seems like all the publishers and all the societies are trying to get into the ebook game. The newest announcement is from AAS (using IOP as the publisher). Considering that a lot of these domains are not particularly known for monographs - like computer science, with ACM's new ebook line - but rather for conference proceedings and journal articles, it seems kinda weird.

Someone mentioned that maybe it was due to the ebook aggregators' demand-driven acquisition plans, but I think it's just the opposite. Many major publishers have jacked up prices (pdf) on EBL and Ebrary recently, all to push libraries into licensing "big deal" bundles of the entire front list or entire subject categories. And it is super attractive to buy from the publishers directly, because their books often come without DRM, as PDFs (one big publisher even offers a whole book in a single PDF; most do one PDF per chapter), with ways to view online, easily findable using Google, and with nice MARC records for adding to the catalog.

The ebook aggregators have nasty DRM. They have concurrent-user rules. They have special rules for things that are considered textbooks. We have to log in with our enterprise login (which isn't my lab's day-to-day login), and the data about what books we view is tied to our identities. The new prices come to as much as 30-40% of the cover price for a 1-day loan. That's right: the customer can look, and maybe print a couple of pages, for 24 hours, and the library is charged a third of the cover price of the book.

But with the society and publisher platforms, what seems like a one-time purchase has now become yet another subscription. If you buy the 2014 front list, won't you feel the pressure to buy the 2015 and 2016 publications?

Aggregators had seemed like part of the answer, but not so much at these prices. The ebook collections we have licensed do get used a lot at MPOW, though. We've already mugged all the other budgets for our journal habit, so where does the money for these new things come from? The print budget was gone ages ago; the reference budget was also raided.

## What I want/need in a programming class

Aug 08 2014 · Off Topic

Abigail Goben (Hedgehog Librarian) has a recent blog post discussing some of the shortcomings she's identified in the various coding courses she's taken online and the self-study she has done.

I think my view overlaps hers but is not the same. Instead of trying to compare and contrast, I'll say what I've seen and what I need.

I'm probably pretty typical for my age: I had BASIC programming in elementary and high school. It was literally BASIC, like:

10 print "hello"
20 goto 10

I think we did something with graphics in high school, but it was more BASIC. In college, they felt very strongly that physics majors should learn to code, so I took the Pascal for non-CS majors course in my freshman year. That was almost like the BASIC programming: no functions, no objects... kinda do this, do this, do this... turn it in. I never did see any connection whatsoever with my coursework in physics, and I never understood why I would use it instead of the Mathematica we had to use in diffeq.

In the workforce, I did some self-study JavaScript (before it was cool), HTML, CSS - not programming, obviously. Then I needed to get data for an independent study I was doing, and my mentor for that study wrote a little Perl script to get web pages and pull out links. The script she wrote broke with any modification to the website template, so after waiting for her to fix it for me, I ended up fixing it myself... which I should have done to start with. In the second stats class, another student and I asked if we could use R instead of Stata - he was going back to a country with less research funding and I was going to work independently - but then we just used the regression functions already written and followed along from a book. Elsewhere in the workforce, I've read a lot about R, some co-workers and I worked through a book, and I did the Codecademy class on Python.

All of these classes - if they weren't in interactive mode, they could have been. What are the various data types? How do you get data in there and back out again? How do you do a for loop? Nobody really goes into any depth about lists in R, and they pop up all over the place. I couldn't even get Python installed on my computer at first by myself, because everyone teaching me was on a Mac. (BTW, use ActivePython and ActivePerl if you're on Windows - not affiliated, but they just work.)

The R class on Coursera (the same one she complains about) and the Johns Hopkins data science classes there were the first that even really made me write functions. What a difference. I really appreciated them for that.

So here's what I think:

People new to programming - truly new - need to understand the basics of how any program works, including data types, getting data in and out, and for loops. But also architectural things like functions and objects. They probably need to spend some time with pseudocode, just to get the practice.

Then if you're not new to programming but you're new to a language: a different course. In that course you say how this language varies, what it does well, and where it fails.

Then there needs to be an all-about-software-design (or engineering, or process) course that talks about version control and how to use it; how to adequately document your code; how to write programs in a computationally efficient way; the difference between doing things in memory or not; and what integrated development environments are and when you would use one. This is what I need right now.

If it's something basic, I can follow a recipe I read off of Stack Overflow, but I know nothing about efficiency. Like, why use sapply vs. a for loop? Is there a better way to load the data in? Why is it slow? Is it slower than I should expect? I love RStudio - love, love, love! - but I tried something like that for Python and could never get it to work. I'm still learning git, but I don't really understand the process of it even though I can go through the steps.
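To make the sapply question concrete, here's the kind of toy comparison I have in mind (an invented example, nothing from a real project):

x <- 1:100000

# growing a vector inside a for loop is the slow pattern:
# the result gets reallocated on every iteration
out <- c()
for (i in x) out <- c(out, sqrt(i))

# preallocating the result fixes most of the speed problem
out <- numeric(length(x))
for (i in seq_along(x)) out[i] <- sqrt(x[i])

# sapply says the same thing in one line; it's about as fast as the
# preallocated loop, so the win is mostly clarity
out <- sapply(x, sqrt)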

Anyhow, that was a lot about me, but I think I'm probably pretty typical. I think there's a huge gap in the middle of what's being taught, and I also think a lot of people need the very basics of programming, almost minus the specific language.

## I'm a coding fool... use of the nuggets mentioned last post

Jul 18 2014 · information analysis

Self-efficacy FTW... may I never crash!

Last post I mentioned determining if a string is an element of a character vector and opening Excel files in R. Here's what I was doing and how it worked.

I had a directory of xls files downloaded from a research database, one per organization, showing the high-level subject categories in which they published. The subject categories were actually narrower than what I needed (think condensed matter, high energy, AMO, and chemical physics when I need "physics"). I needed to rank the organizations by articles published in composite categories that each included maybe 10 or so of the categories from the database.

Originally, I was going to open all the files and just count them up, but whoa, that would suck. R for the win!

First, run RStudio in either 32- or 64-bit mode, depending on which Java you have installed (the Excel-reading package goes through Java).

Next, get the list of files. I had saved them in a directory with other things, too, so I needed to search by pattern. I had already set my working directory to the data directory, for good or bad.

fileList <- list.files(pattern = "subject")

Get the list of database categories making up my composite category (physics here). This was just a one-column list in a text file; grabbing the first column gives a plain character vector:

physics <- read.delim("physics.txt", header = FALSE, stringsAsFactors = FALSE)$V1

Make sure words come in as characters and not factors (hence the stringsAsFactors = FALSE), and that numbers are just numbers. By trial and error and inspection, I figured out that there was a non-numeric character in the count field.

Here's the function:
library(xlsx) # assuming the xlsx package: it's the Java dependency mentioned above

countPhysics <- function(file){
  # read this organization's spreadsheet (first sheet assumed)
  physfile <- read.xlsx(file, sheetIndex = 1)
  # this creates a vector to hang on to the numbers of counts
  phys.hold <- vector(mode = "numeric")
  # this is so i can make sure i just have numbers in the count field
  pattern <- "[[:digit:]]+"
  # this finds matching records and then keeps just the part we want
  m <- regexpr(pattern, physfile$Count)
  physfile$Count <- as.numeric(regmatches(physfile$Count, m))
  # one of these days i'll just import right the first time instead of this part
  physfile$Analysis.Value <- as.character(physfile$Analysis.Value)
  for (j in 1:length(physfile$Count)){
    if (is.element(physfile$Analysis.Value[[j]], physics)) {
      phys.hold[j] <- physfile$Count[[j]]
    } else {
      phys.hold[j] <- 0
    }
  }
  total <- sum(phys.hold)
  return(c(file, total))
}


So you run this like so:

physicsResult <- sapply(fileList, countPhysics)

I transposed it and then pasted it into the Excel file I was working on, but this is essentially the answer. I did the same thing for the other categories separately, though obviously I should have checked each line against each of my composite categories before moving to the next line and then output a single data frame. Oh well.
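For the record, the transpose step (with write.csv standing in for my paste-into-Excel move, and the column names invented) is just something like:

result <- t(physicsResult) # one row per file: file name, then total
colnames(result) <- c("file", "physicsTotal")
write.csv(result, "physicsTotals.csv", row.names = FALSE)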


## Beginning adventures in Python

Jul 03 2013 · information analysis

I had a little slow period a month or so ago at work (not anymore, for sure!) and I decided it was time to start working on a goal I had set for myself for the year: learn to do some analysis that actually uses the full text of a document vs. just the metadata. Elsewhere I have discussed using Sci2, VantagePoint, bibliometrics, and Carrot2 (which uses the text of the abstract), but I need to go further. I don't aspire to become an expert in natural language processing (NLP), but there are times I end up having to stop before I want to because I just don't know how to go on.

Anyhoo... the first step was to see what I could do in R using the tm package and whatever else. I figured out how to do a word cloud, but meh on some of the other tm stuff. I tried a little LDA, but my corpus didn't work well with that. When doing the word cloud, I realized I really wanted to lemmatize instead of stem. I looked around for ways to do it in R - there is a WordNet package for R (thanks, Greg Laden, for pointing it out!) - but it just wasn't doing it for me. I had recently worked my way through a bunch of the Python lessons on Codecademy and had also bookmarked NLTK, the natural language toolkit for Python, so I thought: ah-ha!
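To capture the word cloud part before I forget, it was roughly this kind of pipeline in R (a sketch; docs stands in for whatever character vector of abstracts you have):

library(tm)
library(wordcloud)

# build and clean a corpus from the raw text
corpus <- Corpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# term frequencies, most frequent first, then the cloud itself
tdm <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
wordcloud(names(freq), freq, max.words = 100)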

The first big deal was installing the stupid thing - the language itself, I mean. Argh. I started with Eclipse and PyDev, but alas, I am so not willing to figure out how that really works. I got one sample program running, but for the next program it kept running the first one, so meh.

I started working my way through the NLTK book, which uses the interactive shell, I guess - where you get immediate responses. Installing packages: I never did figure out how to do that in Perl; it's easy in R; but alas... so I gave up on PyDev and installed ActivePython, which has a handy-dandy package installer that, lo and behold, works for people like me who only know enough to be dangerous.

The other thing I'm learning: holy cow, ignore what your computer is and do everything 32-bit, for the love of chocolate. I had a bunch of problems from installing 64-bit versions where everything was looking for 32-bit. Uninstall and try again.

I still haven't figured out how to use the programming environment (?) that ships with ActivePython. I really like how RStudio completes things, and that's why I wanted to use Eclipse. I'll have to try that next.

Anyway, I hope to take some notes and leave them here for future recall, as it's easy to forget how things worked.


## Knowing what you know, or rather, what you've written

When I first came to work where I work now, I asked around for a listing of recent publications so I could familiarize myself with the types of work we do. No such listing existed, even though all publications are reviewed for public release and all copyright transfer agreements are *supposed* to be signed by our legal office. Long story short, I developed such a listing, and I populated it by setting up alerts on the various research databases.

Now, 9 years later, it's still running and it is even used to populate an external publications area on our expertise search app.

By its nature and how it's populated, there's absolutely no way it could be totally comprehensive, and it is also time-delayed. It's probably a little better now, with how fast the databases have gotten and because Inspec and Compendex now index all author affiliations and not just the first author's.

Anyway, our leadership is on an innovation kick, looking at metrics to see how we compare to our peers and whether any interventions have positive effects. The obvious thing to look at is patents, but that's complicated, because policies toward patenting changed dramatically over the years. They're looking now at the number of publications - something I think they probably ought to track anyway as part of being in the sci/tech business. My listing has been looked at, but it only goes back to 2003/2004. From here forward the public release database can be used... but what about the older stuff?

Well, in the old days the library (and the director's office) kept reprint copies of everything published. Awesome. Except they're kinda just bound volumes of all sorts of sizes and shapes of articles. I guess these got scanned somehow and counted, but that left a few articles with no dates or citations (title and author but no venue). Three of these got passed to me to locate. They're not in the research databases mentioned above, but we know they were published (reprints were provided) and not as technical reports.

The answer? Google. Of course. The first was a book chapter that was cited in a special issue of a journal dedicated to the co-author. The second was a conference paper that appeared on the second author's CV (originally written in 1972 - thank goodness for old professors with electronic CVs!). The third was a conference paper cited by a book chapter indexed by Google Books. BUT to find the year, I have to request the book from the medical library... which I have done.

At least back in the day, the leadership understood the value of keeping a database (well, print volumes) of our work. From at least 2003 until 2012, there was no such recognition. Now that I will be benchmarking us against peer organizations, I wonder if they're in the same boat or if they've kept their houses in order with respect to their intellectual contributions.


## Mobile device pairing. Kind of cool and way overdue.

May 15 2012

I often get the question: isn't there something I can do to identify my work laptop so that when I go home, the journals, etc., will still recognize me without my having to use the proxy or VPN?

It seemed kind of far-fetched. A publisher who was willing to do that would be... gasp... giving up some control!

In a recent announcement, the American Mathematical Society informed us that their users are able to do just that.

They're not the first or only ones. You can roam with EndNote Web for a year. I think there is something similar with some of the Elsevier apps (maybe just Scopus?). The ArtStor app used to do this (they might still... not sure). Maybe EngNetBase (but that was really clunky when I tried it).

This is nice: it takes down some barriers for the users, increases usage, and still links downloads/reads to institutional subscriptions.

## When your web services change terms

Dec 23 2011

Jonathan Rochkind has written a ton about web services and APIs that libraries can/should/do use. His posts are written from the point of view of someone who understands the programming bit, the data bit, and the library bit. This post is written by someone who watches that stuff with interest and has worked, on occasion, with programmers.

I mentioned some time ago that we got an internal (to my place of work) "ignition grant" to build a system for supporting the listing, searching, and lending of personal or desk copies of books. It should be noted that the money came from lab leadership, but we were voted in by lab staff. We have an internal social networking tool running on Elgg, so we decided to build the system to hang off of that. My collaboration partners are from two sponsor-facing departments and work in information-assurance-type CS jobs, not as software developers. My contribution was really in how to track books, how people search for books, and how lending works... oh, and barcode scanners.

So anyway, after a lot of discussion, we went with the Amazon API to provide book metadata, including descriptions and cover images. Unfortunately, Amazon changed their terms of service in November to require an Associates ID. We ran this past various parties at the lab, including legal. No go: we couldn't sign up for an Associates ID because of other things in the license. So our beautiful system couldn't add any new books! And our grant was long over.

Luckily, some folks in the IT department stepped up to make a fix, but the problem was: which API to use? I used Jonathan's posts and some other things around the web and came up with WorldCat and Open Library for cover images. So we're now back up and running, but with no book descriptions.
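The Open Library covers piece is pleasantly simple: cover images are addressable straight off the ISBN. A sketch in R (our actual add-on lives in Elgg/PHP, and the ISBN here is just an example value):

isbn <- "9780596516178" # example ISBN
coverURL <- paste("https://covers.openlibrary.org/b/isbn/", isbn, "-M.jpg", sep = "")
download.file(coverURL, destfile = "cover.jpg", mode = "wb")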

Assuming we get the go-ahead from legal, we hope to make our Elgg add-on open source and available from the Elgg site. If/when we do, we'll probably have screenshots and more information to share. It's a neat idea for another way to find expertise and to support collaboration (and save money) within an organization.

The moral of the story is: watch out for the terms of service on APIs, and keep watching, because they can change, and then your functioning service can go up in smoke. We feel a lot better about Open Library, and somewhat better about WorldCat... but vigilance is important.

