OAuth in TwitteR much easier now, whew!

(by Christina Pikas) Mar 18 2015

Not like I should be messing with this at this point, but I wanted to retrieve a tweet to provide evidence for a point. Anyway, instead of the 50-or-so-step process of the past, you now follow the instructions in the twitteR README: http://cran.r-project.org/web/packages/twitteR/README.html, with one exception: you now put your access token and secret in the *single command*, too, like so:

setup_twitter_oauth(consumer_key, consumer_secret, access_token=NULL, access_secret=NULL)

Then you can just search or whatever. Wow!
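
A minimal sketch of what that looks like end to end, assuming the four credentials live in environment variables. The variable names and the search string are placeholders, not anything twitteR requires; searchTwitter() is the package's search function. Nothing touches the network unless both the credentials and the package are present.

```r
# Sketch only: these environment variable names are assumptions,
# not something the twitteR package looks for on its own.
creds <- Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                      "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))
have_creds <- all(nzchar(creds))

# Only authenticate and search when credentials and the package are available.
if (have_creds && requireNamespace("twitteR", quietly = TRUE)) {
  twitteR::setup_twitter_oauth(creds[[1]], creds[[2]],
                               access_token = creds[[3]],
                               access_secret = creds[[4]])
  tweets <- twitteR::searchTwitter("#rstats", n = 5)  # hypothetical query
  print(tweets)
} else {
  message("Twitter credentials or the twitteR package not found; skipping.")
}
```

Keeping the keys out of the script itself also means the script can be shared without leaking them.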

Very nice. How much time did I spend playing with the old method?


Do too - I'll show YOU!

(by Christina Pikas) Mar 15 2015

Lookin' for some lit, as one does when one is supposed to be writing instead of adding to the impossible list of things to double back and add to the lit review... Via who-cited-who and Scholar, ended up on a TandF page. Looked interesting - right click, reload through the proxy for my place of work. It sneers - "sorry you do not have access to this article" - access options include paying $40 for the article. Um. No. LibX has kindly highlighted the DOI... clicked... got to my beautifully customized SFX page (with Umlaut) and it's full text on a major aggregator. Take that, you! Ha!

And this is probably even better than seeing it at the publisher, because our custom FindIt page tells me the article has been cited 23 times (oh well, maybe not: I see that TandF does offer that info, too).

Sadly though, I'll bet hardly anyone at my place of work would have thought to take this path.

 

Edited to add: moments later, looking at JSTOR. They kindly ask if I think I should have access... then let me pick my institution and do a Shibboleth login, et voilà. (Price would have been $14 without.)


Polar and Ellipsoid Graphs in iGraph in R

(by Christina Pikas) Mar 12 2015

I'm still working to do some additional graphs for the project mentioned in this earlier post. It was too crowded with the Fruchterman-Reingold layout, so my customer suggested we do a circular layout with one category in the center and the remaining ones on the outer ring. I said sure! But when I went to do it, I found only a star layout (one node in the center) and a ring layout. No polar layout. I tried a few things but finally broke down and asked. Quick, perfect answer on Stack Overflow (as often happens).

That led to this:

[Figure: Polar layout]

But hey, still pretty jammed up. So what about an ellipse? Sure!

What's that equation again?

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

But that's a hard way to do it when I need x and y values in a matrix. This looks better:

$$x = a \cos(\theta), \quad y = b \sin(\theta)$$

And this is how I did it.

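# Return an n x 2 matrix of (x, y) positions on an ellipse.
# a, b and theta can each hold one value per vertex.
ellip.layout <- function(a, b, theta) {
  cbind(a * cos(theta), -b * sin(theta))
}

systems <- which(V(g)$category == "System")
comp <- which(V(g)$category != "System")

# Semi-axes: systems go on the inner ellipse, everything else on the outer one.
a <- ifelse(V(g)$category == "System", 4, 5)
b <- ifelse(V(g)$category == "System", 0.5, 1)

theta <- rep.int(0, vcount(g)) # one angle per vertex, filled in below
# 1:n - 1 parses as (1:n) - 1, so the angles start at 0 and are evenly spaced.
theta[systems] <- (1:length(systems) - 1) * 2 * pi / length(systems)
theta[comp] <- (1:length(comp) - 1) * 2 * pi / length(comp)

layout <- ellip.layout(a, b, theta)

plot.igraph(g, layout = layout, asp = 0)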

Originally I was getting the outer ring to be a circle anyway, but then I asked the mailing list and it was a matter of setting asp (aspect ratio) to 0.
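
To double-check the parametric form without needing a graph at all, here is a small base-R sketch. The node count and the semi-axes are made-up values for illustration; the function just repeats the layout formula and confirms that every point lands on the ellipse.

```r
# Points on an ellipse with semi-axes a and b, one row per angle.
# The minus sign on y mirrors the layout function above; it only
# flips the direction of travel around the ellipse.
ellipse_points <- function(a, b, theta) {
  cbind(x = a * cos(theta), y = -b * sin(theta))
}

n <- 8                                   # hypothetical number of nodes on one ring
theta <- (seq_len(n) - 1) * 2 * pi / n   # evenly spaced angles starting at 0
pts <- ellipse_points(a = 5, b = 1, theta = theta)

# Every point should satisfy x^2/a^2 + y^2/b^2 = 1, up to rounding.
check <- pts[, "x"]^2 / 5^2 + pts[, "y"]^2 / 1^2
all.equal(unname(check), rep(1, n))
```

The resulting two-column matrix is the shape igraph expects for a layout, which is why cbind() is all the layout function needs.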

Here's where I ended up:

[Figure: Ellipse layout]

ETA: If you do labels, there's a neat trick to make them always sit outside the circle. See here: https://gist.github.com/kjhealy/834774


Post I wish I had time to write: Scientific meetings and motherhood

(by Christina Pikas) Feb 24 2015

I was reading Potnia's new post on meetings - why to go to them - and nodding my head vigorously (ouch) and connecting that to the part of the dissertation I'm writing now on tweeting meetings and the research over the years on how scientific meetings work and contribute...

and I got very sad. I'm a real extrovert and a magpie of all sorts of different kinds of research, but I can't justify spending my limited time reading articles that aren't pretty directly relevant to my job or my dissertation. When I went to bunches of meetings, I could soak a million little tidbits up, meet the people doing the work, browse lots of posters and talk to their authors. It's really a very efficient way to see what's up with a field.

and now... I haven't been to a conference since I was in my first trimester with my twins :( Sure, I've listened in to some webinars and followed some tweets. It's not enough.

Would childcare at a venue help? I don't know... I'd still have to get them there, I'd have to trust the childcare (what if I got there and checked them out and didn't like what I saw?), and I'm paying for childcare at home even when I go, and money is super tight now with my income being the only one in our household for more than a year. I thought about bringing my sister along and then we could see the sights together outside of hours. My work would pay my travel and my room, so I'd just have to pay her travel and everyone's food. But I can't really even swing that right now....

 

So yeah... at least there's twitter. The post I'd like to write actually cites references and whatnot.

And I'm only the 10 millionth person to have this issue this year, so I know I'm not a special snowflake, but that doesn't mean I can't still bitch about it.


Exporting high resolution graphs from RStudio

(by Christina Pikas) Feb 12 2015

This may not be obvious until you look into it, but apparently the default export from RStudio - if you use the nifty little tool in the Plots tab on the lower right-hand side - is 72 dpi. This is typically fine for web pages but is not enough for print, particularly if you're submitting to a journal or something like that. There's lots of advice, but I found it somewhat confusing.

[Figure: RStudio Interface for Windows, from RStudio.com]

I found these posts helpful:

  • http://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html
  • https://danieljhocking.wordpress.com/2013/03/12/high-resolution-figures-in-r/
  • http://www.r-bloggers.com/exporting-nice-plots-in-r/

I think someone I was reading just got out of RStudio and did his work in the standard interface. Really, there's no need for that. I also read somewhere that Cairo is not really used any more? There is a way to export to pdf from RStudio and check a box to use Cairo...

Here's what I did.

library(igraph)
library(Cairo)
# letter-size landscape PDF; width and height are in inches
CairoPDF(file="something.pdf", width=11, height=8.5, family="Helvetica", pointsize=11)

# the layout is randomized, so fix the seed to make it repeatable
set.seed(1337)

plot.igraph(g, layout=layout.fruchterman.reingold, edge.arrow.size=0.4, edge.color="black",
  vertex.size=V(g)$degree, vertex.label.dist=V(g)$vertex.label.dist, vertex.label.color="black",
  vertex.label.family="sans", edge.curved=TRUE, vertex.label.cex=V(g)$vertex.label.cex,
  edge.lty=E(g)$edge.lty, vertex.frame.color=V(g)$frame.color)

dev.off()

A couple of notes:

  • I found I needed to increase the arrowhead size
  • I needed to decrease the font size
  • I needed to set a seed so I was only changing one thing at a time as I experimented
  • When I did png, my dotted lines didn't look so dotted anymore. I didn't feel like messing with that further


# 10 x 7 inches at 300 dpi works out to 3000 x 2100 pixels
Cairo(file="something.png", type="png", units="in", width=10, height=7, pointsize=12, dpi=300)

set.seed(1337)

plot.igraph(g, layout=layout.fruchterman.reingold, edge.arrow.size=0.1, edge.color="black",
  vertex.size=V(g)$degree, vertex.label.dist=V(g)$vertex.label.dist, vertex.label.color="black",
  vertex.label.family="sans", edge.curved=TRUE, vertex.label.cex=V(g)$vertex.label.cex,
  edge.lty=E(g)$edge.lty, vertex.frame.color=V(g)$frame.color)

dev.off()
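
For anyone without Cairo installed, a rough base-R sketch of the same idea. This swaps in the built-in pdf() device rather than the Cairo calls above, and the file name and placeholder plot are made up for illustration. The little helper just spells out why 72 dpi falls short for print.

```r
# Pixels needed for a given physical size (inches) and resolution (dpi).
pixels <- function(inches, dpi) inches * dpi
pixels(c(10, 7), 72)    # 720 x 504: fine on screen
pixels(c(10, 7), 300)   # 3000 x 2100: a typical print request

# Vector output sidesteps dpi entirely; pdf() ships with base R.
out <- file.path(tempdir(), "example.pdf")   # hypothetical file name
pdf(out, width = 10, height = 7, pointsize = 11)
set.seed(1337)                               # same random points on every run
plot(runif(20), runif(20), pch = 19, main = "Placeholder plot")
dev.off()
file.exists(out)
```

As with the Cairo versions, nothing is written to disk until dev.off() closes the device.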


So... um... what if I'm still enjoying it?

(by Christina Pikas) Feb 05 2015

Am I supposed to kind of hate my dissertation topic by now? If I don't, does that mean I'm not working on it hard enough (maybe)? I'm doing it wrong? Maybe it's a phase and it will pass.

Making progress. Learning new stuff from my data. Feeling horribly inadequate when watching tweets fly by from another doctoral student dissertating on how scientists use blogs.... (holy moly how many scientists did she actually interview? hundreds? cray-cray... or am I a hater?)

Working on it every chance I get - taking a morning off every week. Staying up late... I will have to add more time off. If only we could afford more childcare!


Using more of the possible dimensions in a network graph

(by Christina Pikas) Jan 30 2015

When doing bibliometrics, social network analysis, or any kind of network graph, there are only so many different ways to convey information:

  • Size of nodes
  • Shape of nodes (including pictures)
  • Color of nodes
  • Border of nodes (or multiple borders)
  • Labels (node or edge)
  • Edge weight
  • Edge color
  • Arrows
  • Shading areas around/behind nodes
  • Layout or arrangement of nodes

Of these, I almost always size nodes by degree (connections to other nodes), do thickness of lines by their weight, and do some sort of energy or spring layout.

If I do some sort of clustering or community detection or even want to call out components, I'll do that with node color.

My normal things are easy in any package that will graph networks. I was working on a project where we were looking at the maturity of a particular part of an industry. As part of this, we wanted to know if the necessary component systems were available from multiple suppliers, if those suppliers had relationships with different system integrators, and if their things were operational or were just for lab or testing purposes.

We could have done a graph for each subsystem, but they wanted this graph to really just be one slide in a fairly small deck. I tried various approaches in Gephi and NetDraw and wasn't excited. So back to R and igraph. In the end (anonymized):

[Figure: Resulting graph, minus any labels]

I used:

  • node shape for whether it was a component or a system integrator
  • color for type of component
  • size for degree
  • line type (solid or dashed) for whether it was in operation or not

I really wanted to show different shapes for each category, but igraph only has like 6 default ones and they don't look all that different from each other. NetDraw has more. I tried to use raster images - but I'm on a Windows machine and I found all that very confusing.

One unfortunate thing about this graph is that I had to list companies multiple times if they had offerings in multiple categories.

Customer seemed to like it.

I'm not going to take the time to anonymize all the code but here are some key pieces - ask if there's anything I figured out that you don't immediately see how to do.
I started with a spreadsheet (3 of us librarians were adding data)
nodetable tab:
id label category

edgetable tab:
source target yes/no notes

These I imported into Gephi (super easy)... and then tried all sorts of stuff... and then exported to GraphML.

#read in the graph
library(igraph)
g<-read.graph("g.graphml", format="graphml")


#shape nodes. these work, but you can't have n/a. so there has to be a default. also, there is an easier way
for (i in 1:101)ifelse(V(g)[i]$Category=='category', V(g)[i]$shape<-'circle', V(g)[i]$shape<-'square')

#color of nodes - a simple number will draw from the palette. see below
for (i in 1:101)if(V(g)[i]$Category=="category"){V(g)[i]$color<-1}

#calculate and keep the degree. i use it again for label placement (not shown) and to bold some labels (not shown)
V(g)$degree<-degree(g, mode="all")

#when I tested the graphing, the isolates were all mixed in and messed up all the labels.
#subgraph to show isolates separately
gi<-induced.subgraph(g,V(g)$degree==0)
gnoni<-induced.subgraph(g,V(g)$degree!=0)

#make dotted lines for not operational
for (i in 1:76) ifelse (E(gnoni)[i]$"operational"=="yes", E(gnoni)[i]$edge.lty<-1,E(gnoni)[i]$edge.lty<-2)

#prettier colors
library("RColorBrewer", lib.loc="~/R/win-library/3.1")
mypalette<-brewer.pal(6,"Paired")
palette(mypalette)

#legend definitions
colors <- c('gray40', 1,2,3,4,5,6)
# labels <- (character vector of category names, one per entry in colors; not shown)


#plot graph keep device open
plot.igraph(gnoni, layout=layout.fruchterman.reingold, edge.arrow.size=0.1, edge.color="black", vertex.size=V(gnoni)$degree, vertex.label.dist=V(gnoni)$vertex.label.dist, vertex.label.color="black", vertex.label.family="sans",edge.curved=TRUE, vertex.label.cex=0.8, edge.lty=E(gnoni)$edge.lty)

#put legends on - isolates are just shown as a legend so they are neatly lined up
#could have been done by plotting points

legend("bottomright",legend=labels, fill=colors, border="black", cex=0.7, inset=c(-0.1,0))
legend("topleft", legend=V(gi)$label, pch=19, col=V(gi)$color, cex=0.7, bty="n", y.intersp=0.5)
legend("topright", legend=c("Yes", "No"), lty=c(1,2), cex=0.7,inset=c(-0.02,0))
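
On the "easier way" mentioned in the comments above: a hedged sketch of setting shapes and colors in one vectorized step instead of looping over vertex indices. It runs on a plain data frame so it works without a graph; the table contents and the category names other than "System" are invented for illustration.

```r
# Hypothetical node table in the same shape as the nodetable tab (id, label, category).
nodes <- data.frame(
  id = 1:5,
  label = c("A", "B", "C", "D", "E"),
  category = c("System", "Sensor", "Power", "System", NA),
  stringsAsFactors = FALSE
)

# One vectorized ifelse() replaces the loop over vertex indices.
nodes$shape <- ifelse(nodes$category == "System", "square", "circle")
# A missing category yields NA, so fall back to a default shape.
nodes$shape[is.na(nodes$shape)] <- "circle"

# Integer color codes index into the current palette(); sort() drops the NA.
cats <- sort(unique(nodes$category))
nodes$color <- match(nodes$category, cats)

nodes
# With igraph loaded, the same vectors can be assigned directly, e.g.
# V(g)$shape <- ifelse(V(g)$Category == "category", "circle", "square")
```

This also avoids hard-coding the vertex and edge counts, so the code keeps working when rows are added to the spreadsheet.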


Ebook Explosion

(by Christina Pikas) Dec 17 2014

Seems like all the publishers and all the societies are trying to get into the eBook game. The newest announcement is from AAS (using IOP as a publisher). Considering that a lot of these domains are not particularly known for monographs - like Computer Science and ACM's new ebook line - but instead for conference proceedings and journal articles, it seems kinda weird.

Someone mentioned that maybe it was due to the ebook aggregators' demand-driven acquisition plans - but I think it's just the opposite. Many major publishers have jacked up prices (pdf) on EBL and Ebrary recently - all to push libraries into licensing "big deal" bundles of the entire front list or entire subject categories. And it is super attractive to buy from the publishers because their ebooks are often DRM-free PDFs (one big publisher even offers a whole book in a single pdf; most are one pdf per chapter), can be viewed online, are easily findable using Google, and come with nice MARC records for adding to the catalog.

The ebook aggregators have nasty DRM. They have concurrent user rules. They have special rules for things that are considered textbooks. We have to log in with our enterprise login (which isn't my lab's day-to-day login) and the data about what books we view is tied to our identities. The new prices end up being as much as 30-40% of the cover price for a 1-day loan. That's right. The customer can look and maybe print a couple of pages for 24 hours and the library is charged a third of the cover price of the book.

But on the societies' and publishers' own pages, what seems like a one-time purchase has now become yet another subscription. If you buy the 2014 front list, will you not feel the pressure to buy the 2015 and 2016 publications?

Aggregators had seemed like some of the answer, but not so much with these prices. We've already mugged all the other budgets for our journal habit so where do these new things come from? The print budget was gone ages ago. Reference budget was also raided.  The ones we've licensed do get used a lot at MPOW.


Continuing value and viability of specialized research databases

(by Christina Pikas) Nov 26 2014

There was an interesting thread yesterday on the PAMnet listserv regarding "core" databases in Mathematics and which could be cut to save money.

One response was that it's better to search full text anyway (I couldn't disagree more).

Ben Wagner expressed concern that Google Scholar was going to drive all of the databases out of business and then Google would abandon the project.

Joe Hourclé posted about ADS - a core database in astro. Fred Stoss posted about PubMed - needs no intro here, surely!

Here's my response.

I think Scopus and WoS are the biggest immediate threats to the smaller domain specific indexes particularly when the largest number of academic users are looking for a few reasonable things and aren't doing the complex queries or needing to be very precise and have very high recall. In my world, I'm like the goalie: by the time they ask me, they've tried Google, they've asked their friends, they've asked their mother*... it's gotten past 10 people without an adequate answer. For these hard questions, I need the power of a good database (like Inspec). But... if you look at quantities and numbers of users... does that justify the huge cost? Maybe? But do our auditors agree? Infrequent big wins vs. day to day common usage?

As Ben has often chronicled, we've shifted money out of every other budget to support our sci/tech journal habit. We've starved the humanities. We've dropped databases. All for more and more expensive journals. Seems like if the content does get paid for out of other budgets via page charges or institutional support for open access publishing, that might make it even more important that libraries have better ways to find the distributed content. But, like Ben, I worry that we'll put these finding tools out of business.

Another observation: two of the "core" databases mentioned, ADS and PubMed, are government supported as a service to the community. The solar physics bibliography is a very specialized resource but is also super important to those researchers. Maybe if building specialty research databases is no longer profitable but there remains a need, the community-built tools will improve/grow/gain support? Maybe they'll be backwards and using technology from 1995, though :)

I'm working with some projects that are actually taking big piles of full text documents and using computational methods to classify using an ontology that's built by subject matter experts (with some advice from a professional taxonomist in my group). The volume/velocity/yadda yadda of the data precludes the careful indexing done by our fancy databases... but this and other projects like it I think show a swing back toward the importance of good indexing and the importance of having domain experts reviewing the classification system.

 

* My mom is a statistician so I might ask her first

 


PyCharm FTW

(by Christina Pikas) Oct 12 2014

Another random Python note. I asked at work again in the Python group of our internal social networking thingy and consensus was that I should try PyCharm as a development environment.

All the stinking tutorials are like "use a text editor and command line" - and that's what I'd been doing - but with R, RStudio is so fantastic that I thought surely there must be something workable for Python. I had tried the Eclipse plugin and I couldn't even get it to run a program and I couldn't figure out what it was doing and ugh.

PyCharm now has a community edition so you don't even have to prove you're a student or pay for it. It's lovely, really. I don't see why I should have to use vi like it's 1991 or beat on something with rocks to see where I'm missing a quote or have the wrong indents. Why not have help? I'm trying to accomplish a task, not create art.

I really do have to continue coding and stop playing with Python. Particularly since when I do I end up losing hours of my life when I'm supposed to be sleeping!

