Focusing on counts erodes research libraries' competitiveness

(by Christina Pikas) Dec 05 2016

For many years, research libraries (mainly those in academic institutions but also in other research centers) have been all about counting collections: how many volumes owned? how many journals licensed? Bigger is better. Millions of volumes.

This pressure, combined with continual downward budgetary pressure and the global doubling of scientific output every nine years, has led most libraries to take shortcuts to get more coverage (more volume and more volumes). In place of carefully curated abstracting and indexing services, necessarily specific to certain domains of knowledge, that help explore and identify sources of information but do not provide physical access, many libraries are licensing massive collections from Eb and PQ that hugely boost the numbers. They are also licensing these massive "discovery" systems that, in my opinion, completely fail to improve discovery. We librarians have told our vendors that our most important users are the undergraduates who need a few articles on a topic to quickly pad their bibliographies. Vendor offerings that make that process easier are welcomed. So we cancel Inspec, BIOSIS, GEOBASE, and the like to feed the beast of more and more content. The vendors who provide access to formerly very useful databases (cough Aerospace cough) more or less eviscerate them to also give more: higher counts, faster, broader... and cheaper (no, lol, never cheaper for *libraries*).

Yet, as many have said before me, we are living in times of information abundance, not scarcity. We know we cannot survive on the library-as-pocketbook model. Some of our value comes from working with users as partners in their research. We work to understand what their information problem entails and to help them (teach them, do it for them, or provide tools for them to) find the information they need. We should also be building and licensing systems for the most sophisticated users on our faculties and in our research centers. We should strive for precision and also for the serendipity of unexpected, highly relevant articles. We should save the time of the reader. What value are millions of responses to a web query if your answer is on page 10? New researchers should be taught to be more sophisticated in their searching (I honestly think chemistry may be the only field that does this well), instead of accepting good enough or iterating randomly around the theme.

The best services and tools respect the researcher's precious time. They help the researcher get better information more quickly and with more context and confidence. This is how we compete with the ubiquity of information freely available on the internet. It's something we do and something we can do quite well... but we need to stop these collections practices now, before it's too late.

 

*These are my opinions and do not necessarily reflect those of my immediate organization or my parent institution. Any specific products are mentioned to clarify my meaning. No endorsement should be inferred.


Getting article metadata from MS Academic: some R code

(by Christina Pikas) Nov 27 2016

As promised, I went back and did this myself instead of relying on a partner in crime (earlier referred to as an SME, but he outed himself). It's funny because I had his code, but he does things differently than I do, so I needed to work through it myself.

On my first mostly successful run, I ended up with about 44% of the rows missing metadata. I discovered fairly quickly that tm's removePunctuation was, of course (in retrospect), closing up intraword dashes instead of leaving a space. You can have it ignore intraword dashes, but you can't have it replace them with a space. I first did some finding and replacing in Excel, which got me down to 32%. Then I was like, duh, just do the gsub for [[:punct:]] and see if that's better. I hope I haven't used up my quota!
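
For example, with a made-up title, here's the difference (a quick sketch; the preserve_intra_word_dashes option is the one I mean by "ignore those"):

library("tm")

ttl <- "data-driven discovery: a case study"   #hypothetical title

removePunctuation(ttl)
# "datadriven discovery a case study" - dash closed up

removePunctuation(ttl, preserve_intra_word_dashes = TRUE)
# "data-driven discovery a case study" - dash kept, but never replaced with a space

gsub("[[:punct:]]", " ", ttl)
# "data driven discovery  a case study" - punctuation gone, words still separated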

Here's the code. Sign up for your key here. Also note: not affiliated, not endorsing.

#microsoft academic to try to find affiliations for article titles

library("httr", lib.loc="~/R/win-library/3.3")
library("tm", lib.loc="~/R/win-library/3.3")
library("jsonlite", lib.loc="~/R/win-library/3.3")

setwd("~/DataScienceResearchInitiative")

#don't forget the following or you will regret it
options(stringsAsFactors = FALSE)

# api info https://dev.projectoxford.ai/docs/services/56332331778daf02acc0a50b/operations/565d753be597ed16ac3ffc03

# https://api.projectoxford.ai/academic/v1.0/evaluate[?expr][&model][&count][&offset][&orderby][&attributes]

#key:
msakey1<-"put yours here"

apiurl<-"https://api.projectoxford.ai/academic/v1.0/evaluate?expr="
searchexpr<-"Ti='example'"
apiattrib<-"Ti,Y,AA.AuN,AA.AfN,C.CN,J.JN,E"

#test on one to see how it works
testcite <- GET(apiurl, 
         query = list(expr = searchexpr,count = 1, attributes = apiattrib), add_headers("Ocp-Apim-Subscription-Key"= msakey1))

#get the json out into usable format
#could look for errors first
testcite$status_code

#comes out raw so need to make into text
testciteContent <- rawToChar(testcite$content)

test<-fromJSON(testciteContent)
test$entities$AA
test$entities$AA[[1]]$AuN
#this will get a ; separated vector
paste(test$entities$AA[[1]]$AuN, collapse = ';')

test$entities$AA[[1]]$AfN
test$entities$J$JN
test$entities$Y
test$entities$Ti

# initiate a dataframe
# for each title, go out and search using that title
# could add in a warn_for_status(r)  when status is not 200 (happy)
# if status !200 go to the next one,  if status =200
# extract ti, y, authors (paste), affil (paste), jn, cn, and out of entities VFN, V, FP LP DOI D
# write them to the data frame
#1904 is the length of my article title list

CitesOut<- data.frame(ti = rep(NA,1904),
                      y = integer(1904),
                      au = rep(NA,1904),
                      af = rep(NA,1904),
                      jn = rep(NA,1904),
                      cn = rep(NA,1904),
                      vfn = rep(NA,1904),
                      v = rep(NA,1904),
                      fp = rep(NA,1904),
                      lp = rep(NA,1904),
                      doi = rep(NA,1904),
                      abs = rep(NA,1904),
                      stringsAsFactors = FALSE)
  
getMScites<- function(citeNo){
  apiurl<-"https://api.projectoxford.ai/academic/v1.0/evaluate?expr="
  searchexpr<- paste0("Ti='",TitlesToFindf[citeNo],"'")
  apiattrib<-"Ti,Y,AA.AuN,AA.AfN,C.CN,J.JN,E"
  holding<-GET(apiurl,
               query = list(expr = searchexpr,count = 1, attributes = apiattrib), 
               add_headers("Ocp-Apim-Subscription-Key"= msakey1))
  print(paste("cite number", citeNo,"status is:", holding$status_code))
  print(holding$headers$`content-length`)
  holdingContent <- rawToChar(holding$content)
  holdC<-fromJSON(holdingContent)
  cciterow<-data.frame(
    ti=ifelse(is.null(holdC$entities$Ti),NA,holdC$entities$Ti),
    y=ifelse(is.null(holdC$entities$Y),NA,as.integer(holdC$entities$Y)), 
    au=ifelse(is.null(holdC$entities$AA[[1]]$AuN),NA,paste(holdC$entities$AA[[1]]$AuN, collapse = ';')),
    af=ifelse(is.null(holdC$entities$AA[[1]]$AfN),NA,paste(holdC$entities$AA[[1]]$AfN, collapse = ';')),
    jn=ifelse(is.null(holdC$entities$J$JN),NA,holdC$entities$J$JN),
    cn=ifelse(is.null(holdC$entities$C$CN),NA,holdC$entities$C$CN))
  print(cciterow)
  if(is.null(holdC$entities$E)){
    eciterow<-data.frame(
      vfn=NA,
      v=NA,
      fp=NA,
      lp=NA,
      doi=NA,
      abs=NA)
  } else {
    holdE<-fromJSON(holdC$entities$E)
    eciterow<-data.frame(
      vfn=ifelse(is.null(holdE$VFN),NA,holdE$VFN),
      v=ifelse(is.null(holdE$V),NA,holdE$V),
    fp=ifelse(is.null(holdE$FP),NA,holdE$FP),
    lp=ifelse(is.null(holdE$LP),NA,holdE$LP),
    doi=ifelse(is.null(holdE$DOI),NA,holdE$DOI),
    abs=ifelse(is.null(holdE$D),NA,holdE$D)
    )
  }
  print(eciterow)
  citerow<-cbind(cciterow,eciterow, stringsAsFactors=FALSE)
  print("this is citerow")
  print(citerow)
  return(citerow)
} 
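#possible addition (not part of the original run): per the comment above about
#non-200 statuses, one could wrap the call so a failed request or a parsing
#error fills the row with NAs instead of stopping the loop. getMScitesSafe is
#my own name for this sketch, not anything from the API.
getMScitesSafe <- function(citeNo){
  tryCatch(getMScites(citeNo),
           error = function(e){
             print(paste("cite number", citeNo, "failed:", conditionMessage(e)))
             data.frame(ti=NA, y=NA, au=NA, af=NA, jn=NA, cn=NA,
                        vfn=NA, v=NA, fp=NA, lp=NA, doi=NA, abs=NA,
                        stringsAsFactors = FALSE)
           })
}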

#troubleshooting
apiurl<-"https://api.projectoxford.ai/academic/v1.0/evaluate?expr="
searchexpr<- paste0("Ti='",TitlesToFindf[4],"'")
apiattrib<-"Ti,Y,AA.AuN,AA.AfN,C.CN,J.JN,E"
troubleshoot<-GET(apiurl,
               query = list(expr = searchexpr,count = 1, attributes = apiattrib), 
               add_headers("Ocp-Apim-Subscription-Key"= msakey1))

troubleshoot$status_code
troubleshoot$headers$`content-length`

troubleshootcontent<-rawToChar(troubleshoot$content)  
troubleC<-fromJSON(troubleshootcontent)
troubleE<-fromJSON(troubleC$entities$E)

#prepare title list
## IMPORTANT - all the titles have to be lower case and there can't be any punctuation
TitlesToFind <- read.delim("~/DataScienceResearchInitiative/TitlesToFind.csv", header=FALSE)

TitlesToFindl<-apply(TitlesToFind,1,tolower)

TitlesToFindf<- gsub("[[:punct:]]"," ",TitlesToFindl)

head(TitlesToFindf)

#use the sys.sleep so you don't get an error for too many requests too quickly
for (i in 21:1904){
  temp<-getMScites(i)
  CitesOut[i,]<-temp
  Sys.sleep(2)}
write.csv(CitesOut,"MSdsCites.csv")

length(which(is.na(CitesOut$ti)))
length(which(is.na(CitesOut$abs)))

missCites<-which(is.na(CitesOut$ti))

for (i in 1:length(missCites)) {
  temp<-getMScites(missCites[i])
  CitesOut[missCites[i],]<-temp
  Sys.sleep(2)
}

Edited: to fix formatting. Also, the missing cites were writing to the wrong rows... sigh.


Retrieving article metadata from Microsoft Academic Scholar

(by Christina Pikas) Nov 19 2016

In the ongoing saga of doing information analysis and bibliometrics of some sort in computer science... now I need affiliations. As a reminder, I did the first bit of this work in Inspec* because it has high quality metadata, but then I discovered, after reviewing results with SMEs, that it was totally missing a bunch of important conferences in the field - most notably some big ones from ACM. So I searched DBLP using their API, ACM's Guide to the Computing Literature, ArXiv, and CiteSeer and found a bunch more interesting articles. I de-duplicated against the Inspec set and then did topic modelling using whatever I had (abstract, title, and keywords). Well, ACM doesn't export abstracts and DBLP doesn't even have them.

And then I got all turned around after linking the article titles back to the topics and working with the SMEs to name and select the interesting topics... so, oops... now I had a list of ~2000 titles alone and no other information, but I actually needed to provide a list of top organizations and top venues for those interesting topics... Uh-oh.

Of course Google Scholar doesn't have an API. AMiner does, but a quick check had it returning 0 results for my first few titles through the web interface. CiteSeer, I don't even know. What to do? Ah-ha: Microsoft Academic Search* does have an API, but it's not all that comprehensive yet... oh wait - it actually IS quite good in computer science.

Ideally, there would already be an rOpenSci package to search it but the only package I found was for using some of the other Microsoft Cognitive Services APIs. The main Academic Knowledge site makes it very easy to sign up to make up to 10k requests a month for free. There's even a console you can use to test your queries separately from your code.
So what's the problem, you ask? Just iterate through searching for each title, pull down JSON for just the fields you need (C.CN, J.JN, AA.AfN), parse into a data frame, then tot them up... Yet our searches were not getting any results... until we happened on a StackOverflow question... You need to lower case and remove all punctuation prior to searching.
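
A minimal sketch of that pre-processing step, assuming a character vector of titles (the title here is made up; the Ti='...' expression syntax is the same one used in the code in the post above):

titles <- c("Data-Driven Discovery: A Case Study")   #hypothetical title

titles <- tolower(titles)
titles <- gsub("[[:punct:]]", " ", titles)

paste0("Ti='", titles[1], "'")
# "Ti='data driven discovery  a case study'"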

An SME at work ended up doing the actual coding for this part, but I'm going to try to reproduce it on my own to make sure I have it. When I do, I will surely share.

Long story, but: 1) it would be lovely to have a package for this API; 2) MAG does work fine for this purpose and this topic; 3) be sure to lower case and remove punctuation as a first step.

*no endorsement intended


Special Libraries, Information Fluency, & Post-Truth

(by Christina Pikas) Nov 19 2016

Lots of librarians are communicating online about providing resources and information in these fraught times. Some are creating displays and programs to support diverse populations. Others are crafting statements of support and/or opposition. More practically, some are stocking their reference desks with safety pins and extra scarves (to be used by women who have had their hijab snatched off).

But these activities are more useful, perhaps, in public or academic libraries.

In the past few days, "fake" news and propaganda have been receiving a lot of attention. (Hear a rundown from On The Media - clearly a very liberal news source but transparent about it.) As noted on this blog, it is not really possible to insist that our patrons/customers/users use only our licensed sources. To be quite honest, even if we could, that alone isn't enough to ensure they are getting the best information. We think that because our people all have at least college degrees, they are experts in, or at least competent at, critical thinking.

I think, though, that the media environment isn't what it was when many of them were in school. We take the click bait and we see headlines repeated so often on Facebook that maybe we start to believe?

So, now, how do special libraries train and support their organizations in the post-truth world? I have been asked and have accordingly scheduled training that discusses critically evaluating resources; however, that is NOT at all attractive to busy professionals. The only training I offer that is well-attended is problem oriented and is explicitly related to doing the scientific and technical work (no surprise here to my library school professors!). Otherwise, short on-point training at the point of need is also well-accepted.

Integrate aspects of critical thinking and evaluating resources into every bit of training you do. If your user base can qualify for a free web account for the Washington Post (.gov, .mil, & .edu), make that information available even if you provide access through another source. Do show how to find news information in other topical sessions. For example, a session on aerospace engineering could cover things like society news sources and Aviation Week.

If your organization has an internal newsletter and/or social media site, link early and often to reputable sources.

Are you integrated into strategic processes (never as much as you would like, I know!)? What information is your leadership getting and from where? The very highest levels of your organization won't typically attend your classes - can you brief their assistants? Can you make this information available to their mobile devices?


Using bibliometrics to make sense of research proposals

(by Christina Pikas) Nov 01 2016

This was presented at the Bibliometrics & Research Assessment Symposium held at NIH on October 31, 2016.


DBLP > EndNote using R

(by Christina Pikas) Oct 17 2016

I'm doing a study in which I'm mapping the landscape for an area of computer science. I did the initial work in Inspec, and once I found the best search (hint: use a classification code and then the term), I was pretty happy with the results. When I showed it to my SMEs, however, they fairly quickly noticed I was missing some big-name ACM conferences in the field. I've contacted Inspec about those being missing from the database, but in the meantime, oops! What else is missing?

The more comprehensive databases in CS are things like ACM Guide to Computing Literature, CiteSeer, and DBLP.... ACM is very difficult to be precise with and you can either export all the references or one at a time... CiteSeer was giving me crazy results... DBLP had good results but once again, export one at a time.
So here's how to use DBLP's API through R and then get the results into EndNote (using X7 desktop)

#getting stuff faster from dblp
#https://www.r-bloggers.com/accessing-apis-from-r-and-a-little-r-programming/
options(stringsAsFactors = FALSE)
library("httr", lib.loc="~/R/win-library/3.3")
library("jsonlite", lib.loc="~/R/win-library/3.3")
library("XML", lib.loc="~/R/win-library/3.3")
library("plyr", lib.loc="~/R/win-library/3.3")
library("dplyr", lib.loc="~/R/win-library/3.3")


setwd("~/DataScienceResearchInitiative")



#http://dblp.org/search/publ/api for publication queries

url<-"http://dblp.org/"
path<-"search/publ/api"

# Parameters (from the DBLP API documentation):
#   q       The query string to search for, as described on a separate page.
#           Example: ...?q=test+search
#   format  The result format of the search: "xml", "json", or "jsonp".
#           Default: xml    Example: ...?q=test&format=json
#   h       Maximum number of search results (hits) to return. For bandwidth
#           reasons, capped at 1000.    Default: 30    Example: ...?q=test&h=100
#   f       The first hit in the numbered sequence of search results (starting
#           with 0) to return. In combination with h, can be used for
#           pagination. Default: 0    Example: ...?q=test&h=100&f=300
#   c       Maximum number of completion terms (see below) to return. For
#           bandwidth reasons, capped at 1000.  Default: 10    Example: ...?q=test&c=0

raw.result<- GET("http://dblp.org/search/publ/api?q=wrangl")

this.raw.content <- rawToChar(raw.result$content)


#http://rpubs.com/jsmanij/131030
this.content.list<-xmlToList(this.raw.content)

this.content.frame<- ldply(this.content.list$hits, data.frame)


#update to be sure to use the correct field names - except for author because still need to combine later
#two word ones have to be made into one word - for R - have to edit later
#ReferenceType has to be first to import multiple types in one file others order doesn't matter
content.frame3<- data.frame(ReferenceType = this.content.frame$info.type,
                            Title = this.content.frame$info.title, author = this.content.frame$info.authors.author,
                            author1 = this.content.frame$info.authors.author.1, 
                            author.2 = this.content.frame$info.authors.author.2, 
                            author.3 = this.content.frame$info.authors.author.3, 
                            author4 = this.content.frame$info.authors.author.4, 
                            author5 = this.content.frame$info.authors.author.5, 
                            author6 = this.content.frame$info.authors.author.6, 
                            SecondaryTitle = this.content.frame$info.venue, 
                            Pages = this.content.frame$info.pages, Year = this.content.frame$info.year, 
                             URL = this.content.frame$info.url, 
                            Volume = this.content.frame$info.volume, Number = this.content.frame$info.number, 
                            SecondaryAuthor = this.content.frame$info.author, 
                            Publisher = this.content.frame$info.publisher)
content.frame3<-distinct(content.frame3)


#want to get all authors together and get it basically in the format for TR. 
# first get all authors together separated by ; 
# http://stackoverflow.com/questions/6308933/r-concatenate-row-wise-across-specific-columns-of-dataframe
# example:  data <- within(data,  id <- paste(F, E, D, C, sep="")

content.frame4<- within(content.frame3, Author<- paste(author,author1,author.2, author.3, author4, author5, author6, sep="; " ))

# http://stackoverflow.com/questions/22854112/how-to-skip-a-paste-argument-when-its-value-is-na-in-r
content.frame4$Author<-gsub("NA; ","",content.frame4$Author)

content.frame4$Author<-gsub("NA$","",content.frame4$Author)


#remove NA from other fields

content.frame4[is.na(content.frame4)]<-""

#now drop unwanted columns using df <- subset(df, select = -c(a,c) )  from http://stackoverflow.com/questions/4605206/drop-data-frame-columns-by-name

content.frame5<-subset(content.frame4, select = -c(author,author1,author.2, author.3, author4, author5, author6))


#add in a gsub for the correct reference types
content.frame5$ReferenceType<-gsub("Conference and Workshop Papers","Conference Paper", content.frame5$ReferenceType)
content.frame5$ReferenceType<-gsub("Parts in Books or Collections","Book Section", content.frame5$ReferenceType)
content.frame5$ReferenceType<-gsub("Books and Theses","Book", content.frame5$ReferenceType)
content.frame5$ReferenceType<-gsub("Journal Articles","Journal Article", content.frame5$ReferenceType)


#need tab delimited no rownames and update column names to have the necessary spaces

correctnames<- c("Reference Type","Title", "Secondary Title", "Pages", "Year",  "URL", "Volume", "Number", "Secondary Author", "Publisher", "Author")

# if only one type of reference specify at top *Generic to top of file also add a vector of correct column names
#write("*Generic","dblptestnew.txt")
#write.table(content.frame5,"dblptestnew.txt",append = T, quote=F,sep = "\t",row.names = F,col.names=correctnames, fileEncoding = "UTF-8")

#if multiple types use this one
write.table(content.frame5,"dblp30wrangl.txt", quote=F,sep = "\t",row.names = F,col.names=correctnames, fileEncoding = "UTF-8")
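
One more note: the h parameter above defaults to 30 hits (presumably why the file name above says 30). A rough sketch of paging through a larger result set with h and f, reusing the same parsing pattern as above (the numbers are arbitrary):

#sketch only - page through results using h and f, then stack the pages
all.hits<- data.frame()
for (f in seq(0, 200, by = 100)){
  page.result<- GET("http://dblp.org/search/publ/api",
                    query = list(q = "wrangl", h = 100, f = f))
  page.list<- xmlToList(rawToChar(page.result$content))
  if (is.null(page.list$hits)) break
  all.hits<- rbind.fill(all.hits, ldply(page.list$hits, data.frame))
  Sys.sleep(1)
}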

(this is also on Git because WP keeps messing up the code)

After you have this file, import it into EndNote using the boilerplate tab-delimited import option, with UTF-8 translation.


Unpacking Societies Publishing With For Profit Companies

(by Christina Pikas) Aug 06 2016

This week, Jon Tennant went off on a riff on Wiley and the poor experience he had with a particular journal published for a society by Wiley.

First - I'm not affiliated and so very much not endorsing any companies, etc.

Second - I'm on record saying some things are worth paying for and I still feel that way.

I've reviewed for a Wiley-published society journal but not published with one. The ScholarOne interface is like whoa, yuck, but that is, by the way, actually a TR product. Any interactions with the editorial staff have been very professional and pleasant.

I've also been helping a colleague navigate ScholarOne to submit to a Taylor and Francis journal. It has been more than a year and we're still going back and forth with them. E-mails to the editor go unanswered. One reviewer was just like "this isn't science" and hasn't done any more reviewing. The other has provided detailed feedback, which the authors have appreciated.

Over the years, I've seen plenty of organizations think they can just do it all themselves. Why, though, should they not outsource to vendors who already have the set-up? I mean, OJS is just ugly. Free CMSes are plentiful, but just because you can put articles online cheaply doesn't mean they'll work with the rest of the ecosystem.

From what I can tell from what Tennant said, his real problem is with the society and the editors, not with the platform.

The other thing to think about: if the society had to pay the intermediate vendors themselves (Atypon, etc.) and manage those relationships, would that really be cheaper than an all-in-one package? Maybe? Not sure.

Remember, too, that journals are sometimes expensive because the society sees them as a revenue stream so they can pay expensive executives and lobbyists and maybe a scholarship here or there.

If you're part of a society trying to make the decision, you'll likely have the numbers to help - but I don't think the decision is as obvious as everyone thinks.


Using spiffy WordPress themes in an IE environment

(by Christina Pikas) Jul 22 2016

At MPOW we apparently have "compatibility mode" on by default by group policy. This disables all the cool HTML5 features and does weird things in general with a lot of web pages. Among the WordPress plugins, there are a few that show nasty messages telling visitors they have to change or update their browser, but that's just super unhelpful for the many visitors who don't actually have a choice.

Anyhoo... I pieced this together from a few different sites. Go into the network dashboard, then to the themes, edit the theme, and find the header.php file (back it up first, just in case).

Then under <head> make the next line:

<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">

and update... it works. This tells IE not to use compatibility view.


The Dissertation is Now Available for Your Enjoyment

(by Christina Pikas) Jun 30 2016

Or, you know, for bedtime reading.

THE ROLE OF NEW INFORMATION AND COMMUNICATION TECHNOLOGIES (ICTS) IN INFORMATION AND COMMUNICATION IN SCIENCE.
A CONCEPTUAL FRAMEWORK AND EMPIRICAL STUDY
Christina K. Pikas, PhD, 2016
http://hdl.handle.net/1903/18219
Problem
This dissertation presents a literature-based framework for communication in science (with the elements partners, purposes, message, and channel), which it then applies in and amends through an empirical study of how geoscientists use two social computing technologies (SCTs), blogging and Twitter (both general use and tweeting from conferences). How are these technologies used and what value do scientists derive from them?
Method
The empirical part used a two-pronged qualitative study, using (1) purposive samples of ~400 blog posts and ~1000 tweets and (2) a purposive sample of 8 geoscientist interviews. Blog posts, tweets, and interviews were coded using the framework, adding new codes as needed. The results were aggregated into 8 geoscientist case studies, and general patterns were derived through cross-case analysis.
Results
A detailed picture of how geoscientists use blogs and Twitter emerged, including a number of new functions not served by traditional channels. Some highlights: Geoscientists use SCTs for communication among themselves as well as with the public. Blogs serve persuasion and personal knowledge management; Twitter often amplifies the signal of traditional communications such as journal articles. Blogs include tutorials for peers, reviews of basic science concepts, and book reviews. Twitter includes links to readings, requests for assistance, and discussions of politics and religion. Twitter at conferences provides live coverage of sessions.
Conclusions
Both blogs and Twitter are routine parts of scientists' communication toolbox, blogs for in-depth, well-prepared essays, Twitter for faster and broader interactions. Both have important roles in supporting community building, mentoring, and learning and teaching. The Framework of Communication in Science was a useful tool in studying these two SCTs in this domain. The results should encourage science administrators to facilitate SCT use by scientists in their organizations and information providers to search SCT documents as an important source of information.


Parsing citations for dabblers

(by Christina Pikas) Jun 26 2016

Warning! This post is more about questions than answers!

Wouldn't it be nice to be able to grab references from the end of an article - say, even when the article is in PDF - and have them in a usable tagged format? I am surely not the only one to consider this. In fact, everyone seems to do it: CiteSeer, the ACM Digital Library, and others. Deborah Fitchett had success figuring this out.

My incentive is a bit different. I'm looking at a pile of proposals and I want to know what citations they have in common. Everyone cites themselves, of course, but we think there are several schools of thought that we should be able to identify.

My general thought was

  1. extract the bibliography
  2. parse
  3. label each citation with firstauthorlastnamepubyear - even if there are multiple works by a single author in a year, I think that should be good enough; it's pretty rare to volte-face mid-year
  4. Make a matrix of proposers x citation labels
  5. Graph and profit (a rough sketch of steps 4-5 follows this list)
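
Assuming the parsing eventually yields a two-column data frame of proposer and citation label (the data here are completely made up), steps 4 and 5 are the easy part; a quick sketch in R:

#hypothetical parsed output: one row per (proposer, cited work) pair
cites <- data.frame(proposer = c("A", "A", "B", "B", "C"),
                    citation = c("smith2010", "jones2012", "smith2010",
                                 "lee2015", "smith2010"),
                    stringsAsFactors = FALSE)

#step 4: proposers x citation labels incidence matrix
citemat <- table(cites$proposer, cites$citation)

#step 5 (partly): count citations shared between each pair of proposers
shared <- citemat %*% t(citemat)
diag(shared) <- 0
shared

The shared matrix could then go into something like igraph's graph_from_adjacency_matrix (weighted, undirected) for the actual graphing.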

There are several ways to extract the bibliography. I realize now that I probably should have done something with Beautiful Soup or the like, and even if not, some tools actually take in a document and find the bibliography themselves. Anyway, I have them.

Now, for parsing, here is a list of tools that may be around and usable in some fashion (this is a helpful listing):

  • ParsCit - this seems really to be the most popular and the most successful
  • FreeCite - a web service from Brown University libraries
  • Using CrossRef
  • ParaCite / ParaTools from Southampton - perl modules
  • eta: AnyStyle.io - I tried the web interface and it worked pretty well

ParsCit is the most popular, so I thought I would give it a go. The page is not terribly hopeful about running it on Windows. Well... so I did request and receive an Ubuntu VM to play with... hoo-boy, the instructions are fairly off-putting and really sort of condescending (if you know how to work in Unix, this is trivial).

So now, instead, I'm playing with using the Brown library service and RCurl to see what I can do. Of course, then I have to deal with what I get back. Meh.

If I get it all figured out, I'll be sure to report back.

 

Edited: Richard Urban reminded me about AnyStyle.io so I added it. I think I will try to get an API key to try to get to it through R using RCurl. Because of time constraints... right now I'm just extracting the network using Excel 🙁

