Getting abstracts back from Microsoft Academic's Inverted Index

May 14, 2018. Published under bibliometrics, information analysis

Some time ago, I posted about using Microsoft Academic to fill in missing data from other searches. Jamie and I were going to write a package to wrap the API, but bureaucracy more or less killed our enthusiasm (well, not his; that would be impossible).

Here I am obsessing over a really, really cool bibliometrics project, and I have lots of citations that are missing abstracts. I'm sort of thinking I won't be able to do much with the books, even though catalogs seem to have descriptions for a lot of them (happy to take suggestions). I've already tried other sources, so I'm back at Academic.

Pulled out my script. Found I'd lost my keys, got new ones, discovered there's a new endpoint URL, updated that, and hit go....

Ladies and gentlemen, they moved the abstracts... no more paragraph; you now get an "inverted index." People who studied information retrieval may know what that is, but in this case it's a list of terms, each with a numeric vector of the positions where that term appears in the abstract. Stop words are included, so "the" might have 20 positions and "sassafras" just 1.
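To make the idea concrete, here is a minimal base R sketch of turning an inverted index back into text by dropping each term into the slot(s) it claims. The index is a made-up toy, not real API output, and `rebuild_abstract()` is a hypothetical helper. It also assumes positions are numbered from 0, which is why it shifts them by one for R's indexing; any slot no term claims is simply skipped.

```r
# Toy inverted index, made up for illustration (not real API output):
# each term maps to the word positions where it occurs.
toy_index <- list(
  the       = c(0L, 3L),
  bark      = 1L,
  of        = 2L,
  sassafras = 4L,
  tree      = 5L
)

# Hypothetical helper: put every term back into its slot(s).
rebuild_abstract <- function(index) {
  positions <- unlist(index, use.names = FALSE)   # all positions, term by term
  words <- rep(names(index), lengths(index))      # matching term for each position
  # Assumes positions start at 0, so shift by 1 for R's 1-based indexing
  slots <- character(max(positions) + 1L)
  slots[positions + 1L] <- words
  # Drop any empty slots (positions no term claimed) before pasting
  paste(slots[nzchar(slots)], collapse = " ")
}

rebuild_abstract(toy_index)
# [1] "the bark of the sassafras tree"
```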

Here it is. Jamie helped with the strategy, and the rest came from lots of searching:

 

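library("httr")      # GET() and add_headers() for the API call
library("jsonlite")  # fromJSON() to parse the response
library("reshape2")  # melt() to flatten the inverted index
library("plyr")      # arrange() to sort the terms by position
library("glue")      # collapse the terms back into one string

# set this to your own working directory
setwd("I:/Christina's/")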

options(stringsAsFactors = FALSE)



#keys- put yours in:
msakey1<-""
msakey2<-""

#current 05142018 url
#https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate[?expr][&model][&count][&offset][&orderby][&attributes]

apiurl<-"https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?"

#sample search left in
searchexpr<-"Ti='identity as a variable'"
apiattrib<-"Ti,Y,AA.AuN,AA.AfN,C.CN,J.JN,E"

#test on one to see how it works
testcite <- GET(apiurl, 
         query = list(expr = searchexpr,count = 1, attributes = apiattrib), add_headers("Ocp-Apim-Subscription-Key"= msakey1))


#check for errors first: 200 means the call succeeded
testcite$status_code

#get the json out into usable format
#comes out raw so need to make into text

testciteContent <- rawToChar(testcite$content)
  
test<-fromJSON(testciteContent)
test$entities$AA
test$entities$AA[[1]]$AuN
#this will get a single ;-separated string
paste(test$entities$AA[[1]]$AuN, collapse = ';')

test$entities$AA[[1]]$AfN
test$entities$J$JN
test$entities$Y
test$entities$Ti

###################
#use the following to get an abstract from the inverted index

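holdE<-fromJSON(test$entities$E)

#IA$InvertedIndex is a named list: one vector of word positions per term
testII<-holdE$IA$InvertedIndex
str(testII)

#melt gives one row per term/position pair: the term is in L1, the position in value
testII.m<-melt(testII)

testII.m<-unique(testII.m)

#sort by position so the words come back in reading order
testII.m<-arrange(testII.m,value)

#paste the terms back together into the abstract
ab<-glue_collapse(testII.m$L1, sep=" ")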

####################
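If you have more than one record to process, the same steps can be folded into a function. This is a sketch under a few assumptions: `e_to_abstract()` is a hypothetical name, it assumes each `E` value is a JSON string holding the abstract under `IA$InvertedIndex` (as in the test call above), and it returns `NA` when there is no abstract. It sorts by position with base `order()` instead of `melt()` and `arrange()`, which gives the same word order without the extra packages. The `toy_e` string is made up for illustration.

```r
library("jsonlite")

# Hypothetical helper: turn one "E" JSON string into plain abstract text,
# or NA if the record has no IA$InvertedIndex.
e_to_abstract <- function(e_json) {
  if (is.null(e_json) || is.na(e_json) || !nzchar(e_json)) return(NA_character_)
  e <- fromJSON(e_json)
  index <- e$IA$InvertedIndex
  if (is.null(index) || length(index) == 0) return(NA_character_)
  positions <- unlist(index, use.names = FALSE)   # every position, term by term
  words <- rep(names(index), lengths(index))      # the term at each position
  # order() works whether numbering starts at 0 or 1
  paste(words[order(positions)], collapse = " ")
}

# Made-up E string for illustration (not real API output)
toy_e <- '{"IA":{"IndexLength":6,"InvertedIndex":{"the":[0,3],"bark":[1],"of":[2],"sassafras":[4],"tree":[5]}}}'
e_to_abstract(toy_e)
# [1] "the bark of the sassafras tree"
```

With `count` set higher than 1, `test$entities$E` should be a vector, so something like `vapply(test$entities$E, e_to_abstract, character(1), USE.NAMES = FALSE)` ought to give one abstract (or `NA`) per result.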
