## Providing real, useful intellectual access to reference materials from current library pages

Those of us who study or teach about scientific information have this model of how it goes:

(this image is lifted from my dissertation, fwiw, and it more or less reproduces Garvey & Griffith, 1967; Garvey & Griffith, 1972)

Conference papers (in many fields) are supposed to be more cutting edge: really understandable to people in the field with a deep understanding who just need that icing on the cake of what's new. Journal articles come more or less after substantive parts of the work are complete and take a while to be reviewed and published (letters journals are supposed to be much faster), and then monographs and textbooks are for when the information is more stable. More recently, there's a category of shorter books that are sort of like extended reviews but are faster than monographs. Morgan & Claypool, Foundations and Trends, and the new series coming from Cambridge University Press (no endorsement here) are examples. (Note the model omits things like protocols, videos, and datasets.)

Reference books are even slower moving. They are used to look up fairly stable information. Here are some examples:

• encyclopedias (and not just World Book, but Kirk-Othmer, Ullmann's, and technical encyclopedias)
• dictionaries
• handbooks (not just for engineers!)
• directories
• gazetteers (well, maybe less so for the sciences), maps
• guidebooks (like in geology, biology)
• sometimes things like catalogs...

You may think, hey, all I really need are the journal articles and Google and maybe Wikipedia. Or at least publishers and librarians think you're thinking that. And reference books are sort of disappearing: it doesn't make sense to devote precious real estate to the print versions, and the online versions are super expensive and also often not used.

The thing is that these tools really are still needed: they condense very useful information into small(er) packages. If you're concerned about efficiency and authority, then starting with a reference book is probably a good idea when you want an overview or need to look up a detail.

The publishers don't want to lose our money so they're taking a few different approaches. Some are making large topical digital libraries that combine journal articles, book chapters, and reference materials. This can be really good - you can look up information on a topic when you're reading a journal article or look up a definition, etc. You can start with an overview from an encyclopedia and then dive deeper to learn what's new. The problem from a librarian and user point of view is that the best information may come from multiple different publishers and you just won't get that. You won't get a recommendation for someone else's product.

Another thing publishers are doing is making reference materials more dynamic. First, they can charge you more, and more frequently. Second, even if the updates are quite small, a recent "last updated" date makes the resource more attractive to potential users. One publisher in particular has built sort of a portal that gathers materials from various places, and it has commissioned new overviews.

There's a tool to sort of search across more traditional reference materials, but... meh.

Of course if you have a well-developed model of what type of reference tool will have your needed information, then you can use the catalog (subjects like engineering - handbooks, engineering - encyclopedias). Back in the day, I wrote about how senior engineers gathered and created their own handbooks from pieces they'd found useful over time.

So here's where librarians come in. I've never taught the basic undergrad science welcome-to-the-library class (I attended one <cough> years ago), so I really don't know whether they go over these distinctions. That leaves our guides to try to get people to the best source of information. Guides that are merely laundry lists of tools by format/type are frowned upon because they are generally not useful. That's what we used to do, though: here's a list of dictionaries, here's a list of encyclopedias, etc. What we try to do more now is make them problem based. That's somewhat easier in fields like business: need to understand an industry? Need to look up a company? Maybe also in materials science and/or chemistry (although the way SciFinder and Reaxys handle properties may be supplanting these).

Ok, so beyond the difficulty of expressing the value of each of these tools and the situations in which they are useful, we have the affordances of our websites and the tools that produce them. Most are database driven now, which makes sense because you don't want to have to go to a million places to update a URL. Except... one reference might be useful for one purpose in one guide and for another purpose in another guide, and then how do you get that to display? How do you balance chatty text that educates when needed versus quick links for when it isn't?

Also, do you list a digital library collection of handbooks or, more commonly, monographs mixed with handbooks, as a database? As what?

The reviews and overviews and encyclopedias... do you call them out separately? By series?

Users sometimes happen upon reference books from web searches - but that's mostly things like encyclopedias. If they need an equation or a property... well, if they're an engineer they probably know exactly what handbook... so then, I guess, if they don't have their own copy, they would use the catalog and get to the current edition which we may have online. Getting a phase diagram or other property of a material - I'm guessing users would probably start online but for some materials we have entire references (like titanium, aluminum... and then things like hydrazine).

I'm thinking we could have, on an engineering guide, a feed from the catalog for engineering - handbooks? Likewise a feed for physics - handbooks? What about things like the Encyclopedia of Optics? Call out "major reference works" and then add a catalog feed of [subject] - handbooks|encyclopedias|etc.

OR.. hey... what about the shelf display model:

But, instead of all books, just the books for that guide that match [guide name] -- encyclopedia|dictionary|handbook, etc.
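A minimal sketch of that filter in R, assuming the catalog can hand over records with their subject headings as plain text. The data frame, the `reference_feed()` name, and the sample records are hypothetical (only the Encyclopedia of Optics title comes from above), and the headings follow the Library of Congress "Subject -- Form" pattern. A real feed would come from whatever search or export your catalog actually offers.

```r
# Hypothetical catalog extract: one row per title, headings joined by "; "
records <- data.frame(
  title = c("Example engineering handbook",
            "Encyclopedia of Optics",
            "Example statics textbook"),
  subjects = c("Engineering -- Handbooks, manuals, etc.; Materials",
               "Optics -- Encyclopedias",
               "Engineering -- Textbooks"),
  stringsAsFactors = FALSE
)

# Titles with a heading of the form "<guide subject> -- <reference form>".
# guide_subject is used inside a regular expression, so keep it to plain words.
reference_feed <- function(records, guide_subject,
                           forms = c("Handbooks", "Encyclopedias", "Dictionaries")) {
  pattern <- paste0("(^|; )", guide_subject, " -- (",
                    paste(forms, collapse = "|"), ")")
  records$title[grepl(pattern, records$subjects)]
}

reference_feed(records, "Engineering")  # the handbook, not the textbook
reference_feed(records, "Optics")
```

Changing the `forms` vector is how the same function would produce the "[guide name] -- encyclopedia|dictionary|handbook" shelf for any guide.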

What other methods can we use?

## ASIST2017: Information Use Papers

Ma Cui-Chang and Cao Shu-Jin - Identifying structural genre conventions across academic web document for information use

Swales' model for research articles

Move 1 Establishing a territory
Move 2 Establishing a niche
Move 3 Occupying the niche

rhetorical organization patterns - disciplines, different information uses

sources for development: rhetorical objectives of the genres > linguistic clues > move analysis, writing rules genre research

academic blog post, online encyclopedia, research articles

corpus - 81 documents, 2015, Chinese documents with kw "citation analysis"

raters - interrater reliability 80-100%
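The notes don't record which reliability statistic the authors used. As a hedged illustration only, here is simple percent agreement between two raters next to Cohen's kappa, which corrects for the agreement you'd expect by chance. The move codes are made up, not the study's data.

```r
# Share of units on which two raters assigned the same code, as a percentage
percent_agreement <- function(rater1, rater2) {
  stopifnot(length(rater1) == length(rater2))
  100 * mean(rater1 == rater2)
}

# Cohen's kappa: observed agreement corrected for chance agreement
cohens_kappa <- function(rater1, rater2) {
  cats <- union(rater1, rater2)
  p1 <- table(factor(rater1, levels = cats)) / length(rater1)
  p2 <- table(factor(rater2, levels = cats)) / length(rater2)
  po <- mean(rater1 == rater2)   # observed agreement
  pe <- sum(p1 * p2)             # agreement expected by chance
  (po - pe) / (1 - pe)
}

r1 <- c("Move 1", "Move 2", "Move 3", "Move 1", "Move 3")
r2 <- c("Move 1", "Move 2", "Move 3", "Move 2", "Move 3")
percent_agreement(r1, r2)  # 80
cohens_kappa(r1, r2)       # about 0.71
```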

Taxonomy identified and validated

q: how will you use this? Will you use machine or automated clustering based on this?

q: can you elaborate on information units you found on the web or in web documents vs. formal publications.

a: main difference in how organized. also Swales is developed from written English articles.

q: mentioned Swales was developed to help train junior users, could your taxonomy help further with teaching

Devendra Dilip Potnis (speaker), Kanchan Deosthali, Janine Pino - Investigating barriers to using information in electronic resources: a study with e-book users

Motivation: libraries spend money on electronic resources, but they're underused. Goal: to investigate barriers to using information in ebooks

Key findings - 60 barriers. Categories:

• features of ebooks (20)
• psychological (7), somatic (3), cognitive status (6)
• cost
• policies

different actors - things about the users and things about the environment, system, vendors

uses Wilson (2000)'s definition of using information - both physically accessing, as well as mental schemas and emotional responses

4 broad stages of information use- searching, managing, processing, applying information

Lots of previous studies - their main difference is how they look at use of information instead of "value".

They did a survey of LIS students (n=25) [sigh... this is a real and important topic, but sigh]

These participants also might have more insight into use of information, what's going on in libraries, etc.

Great quotes - flipping pages waiting for a page to load - breaks concentration. Not immersive. Policies don't let download. Poor text quality.

Mapped barriers to information use stages.  For example psychological barriers prevent information processing. Technical barriers prevent use of information

"due to a series of unavoidable barriers, respondents who originally intended to use ebooks for utilitarian purposes end up using this electronic resource mostly for hedonistic reasons " (pleasure reading, but not reference)

contributions - insight into adoption, why a negative perception. also if hiring a new librarian, will they have a negative attitude toward ebooks.

q: plans to go bigger with this

a: not really - so disheartening [welcome to my world] - but is planning a bigger hci study

q/comment: need to really differentiate between scholarly and leisure reading, and even within scholarly, between an ebook engaged with as a monograph vs. DRM-free PDFs per chapter, engaged with on a per-chapter basis almost like journal articles

q/c: some have advanced annotation and highlighting features of which users may be unaware

Ayoung Yoon - Role of Communication in Data Reuse

Secondary use of data - not for the original purpose, and generally not by original collector of data

not a simple one-step process, transfer of knowledge, "social process" interactions and communications with other relevant parties (Martin, 2014)

(who are involved, why and how)

past studies - transferring information about context of data, difficult to know what contextual information is important for unknown possible reusers, level of skills and tacit knowledge of reuser

strategies - documentation (inherently insufficient, not everything can be transferred), communication with producers (formal or informal)

38 - quantitative data reusers in social work and public health. Identified from scholarly databases using "secondary data" or "secondary analysis"

not a linear process - discovery, selecting, understanding, analyses, manuscripts

purpose of interaction communication - searching, interacting, problem solving

search is complicated - no one place to look, data may be dated, rely on established network, have a "data talk"

interaction/communication - learning process, collaboration and mentoring process, "not just access to the data but more importantly, access to people", "how to get around challenges"

problem solving - "knowing other people who were closely working with the data" "talking among ourselves" give reusers "confidence" about solving issues. Also working with data professionals and statisticians "if the problem was really me or the data"

Limitation of communication around data - have to be part of the network to have information needed to access data - peripheral and junior researchers. Unsuccessful interaction with data producers (no answers, partial answers, busy, contact person may be a project manager and may not know)

communication is not always necessary for reusers - if it's well documented, known, and the reuser is experienced.

important to support this communication around data - most libraries do not deal with this but deal with mandates and sharing.

q: communication around data among reusers - not with producers - role for platform to support?

a: extended (great) - she did see that in her work. lots of discussion at conferences and within networks among reusers. OTOH, some participants hit a wall when they didn't get a response from producer and didn't have anyone else to ask next. Library is not seen in facilitating this but would be helpful if they could. Platform facilitating could be useful, too.

## Metaknowledge (python) and Bibliometrix (R) - more or less comprehensive bibliometrics packages for standard data science environments

I thought for sure I had mentioned Metaknowledge here before but I can't find it so I must have mis-remembered. ...

There are tons of tools for bibliometrics, and a lot of people really just code their own for simplicity's sake, even if they eventually visualize their results using an off-the-shelf network analysis tool or other. Sci2, VOSviewer, and CiteSpace are all close to comprehensive, freely available, and pretty easy-to-use tools. What need is there for another product? If you want to stay within the rest of your workflow or experiment with new algorithms that are not available in the above, then these two packages are good options.

When I was doing the longitudinal clustering for citation trajectories, I inadvertently saved the 6,666 (I know!) records from 1980-2015 in the regular WoS* plain-text format instead of the more useful tab-delimited format. I quite easily pulled out the pub year, accession number, times cited, and other simple fields using R... it's only now, when I actually want to follow up with some natural language processing on the titles and abstracts, that I realize my kludge won't work for either the title or the abstract. So I fooled with it a couple of different ways before heading out to see if there was anything new for processing these files, since they were such a hassle to get in the first place. It turns out there is a new, fairly comprehensive R package: Bibliometrix. I had already experimented with Metaknowledge in Python. Its extensive instructions (in a paywalled journal article) are very helpful, but I really just wanted to stay in R.
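For the record, the tagged plain-text export isn't hard to parse once you handle continuation lines, which is exactly where a line-by-line kludge breaks on titles and abstracts. This is a minimal base-R sketch, not what Bibliometrix or Metaknowledge do internally. It assumes the usual layout (a two-letter tag and a space start each field, wrapped lines start with three spaces, and `ER` ends a record); the function name and the sample records are made up. Fields that put one value per line, like authors (`AU`), would just be glued together with spaces here.

```r
# Parse Web of Science tagged plain-text export lines into a data frame.
# Assumed layout: "XX value" starts a field, "   more" continues it, "ER" ends a record.
parse_wos_lines <- function(lines, tags = c("PY", "UT", "TC", "TI", "AB")) {
  records <- list()
  current <- list()
  last_tag <- NULL
  for (line in lines) {
    if (grepl("^ER\\s*$", line)) {
      # end of record: save it and start fresh
      records[[length(records) + 1]] <- current
      current <- list()
      last_tag <- NULL
    } else if (startsWith(line, "   ") && !is.null(last_tag)) {
      # continuation line: append to the field that is still open
      current[[last_tag]] <- paste(current[[last_tag]], trimws(line))
    } else if (grepl("^[A-Z][A-Z0-9] ", line)) {
      # new field: two-letter tag, a space, then the value
      last_tag <- substr(line, 1, 2)
      current[[last_tag]] <- substring(line, 4)
    }
  }
  # one row per record, one column per requested tag (NA when a record lacks it)
  rows <- lapply(records, function(r) {
    vapply(tags, function(t) if (is.null(r[[t]])) NA_character_ else r[[t]],
           character(1))
  })
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

sample_lines <- c(
  "FN Thomson Reuters Web of Science", "VR 1.0",
  "PT J",
  "TI A made-up title that wraps",
  "   onto a second line",
  "AB A made-up abstract.",
  "PY 1999", "TC 12", "UT WOS:000000000000001",
  "ER", "",
  "PT J",
  "TI Second made-up record",
  "PY 2005", "UT WOS:000000000000002",
  "ER", "EF"
)
wos <- parse_wos_lines(sample_lines)
wos$TI[1]
```

On a real export this would be something like `parse_wos_lines(readLines("savedrecs.txt", encoding = "UTF-8"))` (the file name is only an example). `PY` and `TC` come back as text, so wrap them in `as.integer()` before counting.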

What follows is a general intro to these two tools and my observations.

## Bibliometrix

http://www.bibliometrix.org/

This package appears to be quite new with recent releases. The first thing I tried - reading in a directory full of WoS export files was like magic. In a snap, I had a dataframe with everything in the right column.

Literally:

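```r
library(bibliometrix)  # 2017-era release; newer ones read and convert in one step with convert2df()
library(plyr)          # for ldply()

filenames <- list.files("directory", full.names=TRUE)

getWoSdf<-function(filename){
holdrecs<-readFiles(filename)  # read one WoS plain-text export
recsdf<-isi2df(holdrecs)       # convert its records to a data frame
return(recsdf)
}

WoSall<- ldply(filenames, getWoSdf)  # one combined data frame for all files
```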


Seems like BibTeX files are preferred over this format, but it was plenty quick for the 500 records per file I had. A nice feature is that it tells you every hundred records that it's making progress.

A nice thing is that there are pre-built basic summary/descriptive functions. It exports the standard networks but it also does co-word with a pretty neat visualization.

Multiple Correspondence Analysis (MCA) using keywords
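Under the hood, co-word analysis starts from counting how often two keywords are assigned to the same record. Here's a minimal base-R sketch of that counting step, not Bibliometrix's own code. The trimming and upper-casing are a very light stand-in for the cleaning the package doesn't do, and the keyword strings are made up.

```r
# Keyword co-occurrence (co-word) counts from a semicolon-separated keyword field
cooccurrence <- function(keyword_field, sep = ";") {
  keyword_field[is.na(keyword_field)] <- ""
  # light normalization: trim, upper-case, drop empties and within-record repeats
  kw <- lapply(strsplit(keyword_field, sep, fixed = TRUE), function(k) {
    k <- toupper(trimws(k))
    unique(k[nzchar(k)])
  })
  vocab <- sort(unique(unlist(kw)))
  # record-by-keyword incidence matrix: 1 if the keyword was assigned to the record
  inc <- matrix(0, nrow = length(kw), ncol = length(vocab),
                dimnames = list(NULL, vocab))
  for (i in seq_along(kw)) inc[i, kw[[i]]] <- 1
  # keyword-by-keyword counts; the diagonal is the number of records per keyword
  crossprod(inc)
}

kw_field <- c("citation analysis; bibliometrics",
              "Bibliometrics; co-word analysis",
              "citation analysis")
co <- cooccurrence(kw_field)
co["BIBLIOMETRICS", "CITATION ANALYSIS"]  # 1: they share one record
```

Dropping the diagonal and treating the remaining cells as weighted edges gives a co-word network you could hand to igraph or any other graphing tool.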

This tool doesn't do anything to help you clean the data or pick out any weirdnesses. The visualizations shown aren't super pretty, but it's quite easy to use another R graphing tool with the data.

## MetaKnowledge

http://networkslab.org/metaknowledge/

I worked through the journal article but using my own WoS data. For WoS data, everything worked as expected and I was able to quickly get really nice results.  You can also download a Jupyter notebook with their sample data to work through the process. A neat thing you don't see every day is that it will break down by male/female by guessing using a popular algorithm.  It also does Reference Publication Year Spectroscopy (meh) and besides extracting all of the standard networks you might want, it also has ways to extract text for text mining.
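Reference Publication Year Spectroscopy is simple enough to sketch: tally the cited references by the year they were published, then compare each year with the median of its neighborhood so that historically important years stand out as peaks. This base-R version assumes a centered five-year window (two years on either side, truncated at the ends of the range) and that the cited-reference years have already been extracted. Metaknowledge's own conventions may differ, and the years in the example are made up.

```r
# Minimal RPYS sketch: counts per cited-reference year and their deviation
# from the median of a centered five-year window (t-2 .. t+2)
rpys <- function(cited_years) {
  years <- seq(min(cited_years), max(cited_years))
  counts <- as.numeric(table(factor(cited_years, levels = years)))
  med <- vapply(seq_along(years), function(i) {
    window <- counts[max(1, i - 2):min(length(counts), i + 2)]
    median(window)
  }, numeric(1))
  data.frame(year = years, count = counts, deviation = counts - med)
}

cited_years <- c(1900, 1900, 1900, 1901, 1902, 1903, 1904)
spectrum <- rpys(cited_years)
spectrum$year[which.max(spectrum$deviation)]  # 1900 stands out
```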

Some negatives, based on my brief experiments: I couldn't get Scopus data to work for whatever reason. Also, it doesn't really facilitate any sort of cleaning, and data sets that come out of WoS still have issues. The graph examples were not very pretty, and one of the graphing approaches they walk you through threw some sort of error. It's quite easy to export the data or just substitute your favorite graphing tool, because there are a million.

* no endorsement intended, for real.

## Who do I want to rescue me?

DM has continued a meme - who do you want to rescue you?

These are not ranked, necessarily:

1. Dr. Who
2. Paw Patrol
3. Mark Watney
4. the guys in the Scott Lynch books (all of them, both men and women, not just Locke)
5. Kvothe

## What's old is new again

Everybody's back starting up an online community for their publishing platform. IEEE with Collabratec. ACS with ChemWorx. Science has one, too.

Seems like everyone did this 15 years ago. The only difference now seems to be the addition of authoring tools. We'll see.

## How to get unbound by non-forward thinking users...

Last post I described a system that was stuck by its own commitment to user-driven development. They're really stuck. So what are possible ways out? Particularly for a government system?

I really don't know, particularly for a government system, but that doesn't mean I can't think about it.

One thought was that maybe they need to make their case more clearly. How could they describe the projects better to make them more attractive in the rankings? This is probably impossible and maybe even insulting as they probably tried very hard to get their point across in the past. They seemed frustrated. Of course, they could hire a consultant to tell them exactly what they already knew - some people will listen to consultants.

I was wondering if acquisition rules would allow them to set aside like 20% or something to do their projects - ones that they thought were best but not necessarily voted on by the users. This would work for things that were less expensive to do or could be piloted.

Part of the problem is that the system may need to be re-architected and might need major redesign. Some of the pieces can be kept, but need to be integrated. That would have to wait for the next major version. Maybe if their key software underneath has to be upgraded, they could use that as a reason to do some things?

Sigh. I don't know. It sure is easier just to dream of a cool system.

## 2015 in Review

Well, 2015 was sort of a meh year for me. Definitely on the blog.

January: Using more of the possible dimensions in a network graph - I was glad I shared this and glad I was able to make it work in the first place.

February: So... um... what if I'm still enjoying it? - about my dissertation.

April: Which are the bestest? Top articles from a diverse organization - part 1 - never did part 2 AND still need to write this up for publication

July: none

August: Why special librarians should be active on their organization's intranet social media - the title of the post is not really descriptive. This is a research blogging post about the use of social media on a company's intranet.

November: Citation Manager Frustration - I actually had 3 posts on the same date, but this is the most important. I really don't like the way things are going with citation managers. As an update: the folks from RefWorks did contact me and I described a bunch of the issues. I think they'll have other ways to solve the same problems I'm encountering than what I proposed but they definitely seemed interested.

I'm shocked that I posted at least something every month but July.

## ACS and Just Accepted Manuscripts

A colleague posted on Chminf-l asking about the American Chemical Society's Just Accepted Manuscripts program. Most of the immediate responses were to explain the program, which is not what she asked. Here's the site's description:

"Just Accepted" manuscripts are peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society is posting just accepted, unredacted manuscripts as a service to the research community in order to expedite the dissemination of scientific information as soon as possible after acceptance. "Just Accepted" manuscripts appear in full as PDF documents accompanied by an HTML abstract. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). The manuscripts posted on the "Just Accepted" Web site are not the final scientific version of record; the ASAP (As Soon As Publishable) article (which has been technically edited and formatted) represents the final scientific article of record. The "Just Accepted" manuscript is removed from the Web site upon publication of the ASAP article, and the ASAP article has the same DOI as the "Just Accepted" manuscript. The DOI remains constant to ensure that citations to "Just Accepted" manuscripts link to the final scientific article of record when it becomes available.

The FAQ explains that this is opt-in and these copies will be removed when the ASAP and final versions are live.

Chemistry is kind of a funny field when you talk about scholarly communication and sharing (see and read everything from Theresa Velden's dissertation research on this, in particular). Journals are dominated by ACS with RSC and the other scholarly publishers following. In some areas like synthetic chemistry, there's a real reluctance to even share at meetings, no desire to post pre-prints, and tight control over data access. In more computational and analytic areas, it's a little more relaxed.

Pre-print server efforts in chemistry have been mostly unsuccessful. For one thing, the journals will not take articles posted elsewhere first. Second, there's a big tension around priority (the US move to first-to-file may change things for patents, but there are still recognition issues).

With all that, there are still efforts to require self-archiving broadly across fields and to have disciplinary pre-print servers. The big publishers who are rolling in dough from the subscriptions from all the ACS accredited programs do not want to see these archives and self-archiving succeed, even though it's been shown that it doesn't harm subscriptions in physics.

Anyway, as I said on the list, this is a pretty smart move by ACS. It solves the problem of getting the science out there sooner, but still with peer review, and on their own hosted platform. This version disappears and the DOI points you to the official version when available, so they keep the traffic in house. I'm sure the embargoes go from official publication, too, so this is more time the publisher has to disseminate the content and get attention before government funders and institutional repositories can share it.

I think it will be accepted by chemists because it is from ACS and it is after peer review. We'll see, though, if there are any typos and whatnot that offend people.

Edit to add: Thurston Miller points to a few viewpoint papers in Journal of Physical Chemistry Letters on OA (the papers themselves are not OA).

## OAuth in TwitteR much easier now, whew!

Not like I should be messing with this at this point, but I wanted to retrieve a tweet to provide evidence for a point. Anyway, instead of the 50-step-or-so process of the past, you now follow the instructions in the twitteR README: http://cran.r-project.org/web/packages/twitteR/README.html, except that you now put your access token and secret in the *single command*, too, like so:

```r
library(twitteR)
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)  # all four credentials in one call
```


Then you can just search or whatever. Wow!

Very nice. How much time did I spend playing with the old method?

## Polar and Ellipsoid Graphs in iGraph in R

I'm still working on some additional graphs for the project mentioned in this earlier post. It was too crowded with the Fruchterman-Reingold layout, so my customer suggested we do a circular layout with one category in the center and the remaining categories on the outer ring. I said sure! But when I went to do it, I found only a star layout (one vertex in the center) and a ring layout. No polar layout. I tried a few things but finally broke down and asked. Quick, perfect answer on StackOverflow (as often happens).

That led to this:

But hey, still pretty jammed up. So what about an ellipse? Sure!

What's that equation again?

$\frac {x^2}{a^2} + \frac {y^2}{b^2} =1$

But that's a hard way to do it when I need x and y values in a matrix. This looks better:

$x = a \cos(\theta) , y=b \sin(\theta)$
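Substituting the parametric form back into the implicit equation is a quick check that the two describe the same curve:

$\frac{(a\cos\theta)^2}{a^2} + \frac{(b\sin\theta)^2}{b^2} = \cos^2\theta + \sin^2\theta = 1$

So stepping $\theta$ evenly from $0$ to $2\pi$ hands you (x, y) pairs that already sit on the ellipse, with no need to solve for $y$ and pick a sign.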

And this is how I did it.

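```r
library(igraph)

# x = a cos(theta), y = -b sin(theta); the minus sign runs the vertices clockwise
ellip.layout <- function(a,b, theta) { cbind(a*cos(theta), -b*sin(theta)) }

# assumes g has a "category" vertex attribute
systems <- which(V(g)$category == "System")
comp <- which(V(g)$category != "System")

# smaller inner ellipse for systems, larger outer ellipse for everything else
a <- ifelse(V(g)$category == "System", 4, 5)
b <- ifelse(V(g)$category == "System", 0.5, 1)

theta <- rep.int(0, vcount(g)) # creates a blank vector
# evenly spaced angles within each group, starting at 0
theta[systems] <- (seq_along(systems) - 1) * 2 * pi / length(systems)
theta[comp] <- (seq_along(comp) - 1) * 2 * pi / length(comp)

layout <- ellip.layout(a,b,theta)

# asp=0 stops the plot from forcing the ellipse back into a circle
plot.igraph(g, layout=layout, asp=0)
```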

Originally I was getting the outer ring to be a circle anyway, but then I asked the mailing list and it was a matter of setting asp (aspect ratio) to 0.

Here's where I ended up:

ETA: If you do labels, there's a neat trick to make them always outside the circle. See here: https://gist.github.com/kjhealy/834774
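For the ellipse, a variant of that idea (my own sketch, not the gist's code) is to point each label from the center out through its vertex. igraph's `vertex.label.degree` takes an angle in radians where 0 is to the right and pi/2 is *down*, so the y sign used in the layout function has to be flipped back. `label.angles()` is a hypothetical helper name.

```r
# Label direction for each vertex placed at (a*cos(theta), -b*sin(theta)):
# the angle from the plot center through the vertex, in igraph's convention
# (0 = right, pi/2 = down), so the layout's minus sign on y cancels out here.
label.angles <- function(a, b, theta) {
  atan2(b * sin(theta), a * cos(theta))
}

label.angles(5, 1, c(0, pi / 2, pi))  # right, down, left
```

Then something like `plot.igraph(g, layout=layout, asp=0, vertex.label.dist=1, vertex.label.degree=label.angles(a, b, theta))`, reusing `a`, `b`, and `theta` from the layout code; `vertex.label.dist` is what moves each label off its vertex.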

