Archive for the 'social computing technologies' category

The #agu12 and #agu2012 Twitter archive

I showed a graph of the agu10 archive here, and more recently the agu11/2011 archive here; now here's the agu12/2012 archive. See the 2011 post for the exact methods I used to get the data and clean it.

#agu12 and #agu2012 largest component, nodes sized by degree

#agu12 and #agu2012 other components, no isolates, nodes sized by degree

I will have to review methods to show this rigorously, but from appearances, the networks are becoming more like hairballs. In the first year, half the people were connected to theAGU and the other half to NASA, but very few were connected to both. The other prominent nodes were pretty much all institutional accounts. In 2011 that division started to fade, and now in 2012 you can't really see it at all. The top three nodes are the same two accounts plus a NASA robotic mission, but after them there's a large second group of individual scientists with degrees (connections to others, indegree and outdegree combined) of around 40-80.
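To make "degree" concrete: in a directed mention network, a node's total degree is the number of accounts it tweeted @ (outdegree) plus the number of accounts that tweeted @ it (indegree). Here's a minimal base-R sketch, assuming the mentions are already a from/to edge list. The function name `total_degree` is hypothetical, and it counts each distinct from/to pair once (an assumption; counting every tweet would give a weighted degree instead).

```r
# Total degree (outdegree + indegree) per account in a directed mention network.
# Assumption: each distinct from/to pair counts once, i.e. an unweighted degree.
total_degree <- function(from, to) {
  edges <- unique(data.frame(from = from, to = to, stringsAsFactors = FALSE))
  nodes <- sort(unique(c(edges$from, edges$to)))
  outdeg <- as.integer(table(factor(edges$from, levels = nodes)))  # accounts each node @-ed
  indeg  <- as.integer(table(factor(edges$to,   levels = nodes)))  # accounts that @-ed each node
  deg <- setNames(outdeg + indeg, nodes)
  sort(deg, decreasing = TRUE)
}

# Toy data: "a" tweeted @theagu twice, which counts as a single tie
d <- total_degree(from = c("a", "b", "c", "a", "a"),
                  to   = c("theagu", "theagu", "nasa", "nasa", "theagu"))
d
```

Deduplicating first matches the idea of "connections to others"; drop the `unique()` call if you want every tweet to count.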

2 responses so far

An image of the #agu2011, #agu11 Twitter archive

A loooong time ago, I showed the agu10 archive as a graph; here's the same for the combination of agu11 and agu2011. I already mentioned the upper/lower case issues (Excel is oblivious, but my graphing program cares). This one is all lower case: I first tried to correct things by hand but kept missing some, so I just used Excel's =LOWER(). I also discussed how I got the data. I'll probably have to go back and redo 2010 if I really want equivalent images, because 1) I only kept the first @ (this one has all of them), and 2) I don't believe I collected both agu2010 and agu10, so I probably missed some. For this image I did a little bit of correcting: one Twitter name was misspelled, and quite a few people used the_agu or agu instead of theagu. I also took out things like @10am or @ the convention center.
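The case and spelling fixes can also be scripted rather than done by hand. This is a rough sketch, not the method used for this post: `normalize_handles` is a hypothetical helper, R's `tolower()` plays the role of Excel's =LOWER(), the alias list just encodes the the_agu/agu to theagu fixes described above, and the pattern for things like @10am is a heuristic you would want to check against the real data.

```r
# Normalize @-handles: lower-case, merge known variants, drop tokens that are times.
# Hypothetical helper; the aliases and the time pattern are assumptions.
normalize_handles <- function(handles,
                              aliases = c(the_agu = "theagu", agu = "theagu")) {
  h <- tolower(handles)                                   # same idea as Excel's =LOWER()
  hit <- h %in% names(aliases)
  h[hit] <- unname(aliases[h[hit]])                       # map variants to one account
  is_time <- grepl("^[0-9]{1,2}(:[0-9]{2})?(am|pm)$", h)  # e.g. "10am", "3:30pm"
  h[is_time | h %in% ""] <- NA                            # "@ the convention center" leaves ""
  h
}

normalize_handles(c("TheAGU", "the_agu", "AGU", "NASA", "10am", ""))
# returns c("theagu", "theagu", "theagu", "nasa", NA, NA)
```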

I made this graph by taking my Excel spreadsheet, which was nicely laid out as username first@ second@ ..., copying that into UCINET's DL editor, and saving it as nodelist1. Then I visualized it and did basic analysis in NetDraw.
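If the mentions are already in a from/to edge list, the "username first@ second@ ..." rows can be generated instead of built by hand. A minimal sketch; `nodelist_rows` is a hypothetical helper, and it writes only the data rows. The DL header that UCINET expects still has to come from the DL editor (or from the UCINET documentation).

```r
# One row per user: the user followed by every account they @-ed, space-separated.
nodelist_rows <- function(from, to) {
  groups <- split(to, factor(from, levels = unique(from)))  # keep first-seen user order
  vapply(names(groups),
         function(u) paste(c(u, unique(groups[[u]])), collapse = " "),
         character(1), USE.NAMES = FALSE)
}

rows <- nodelist_rows(from = c("a", "a", "b"), to = c("theagu", "nasa", "theagu"))
writeLines(rows)
# a theagu nasa
# b theagu
```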

agu2011 and agu11 largest component, sized by degree

The largest component has 559 of the 740 nodes, and this time you don't see the split where the people who tweeted @NASA didn't tweet @theAGU. There were 119 isolates, plus other components with 2, 3, and 10 nodes:

Other components, sized by degree (no isolates)
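Component counts like these (the largest component, the isolates, the small leftover components) are straightforward to compute. Here is a base-R sketch using breadth-first search. It treats each mention as an undirected tie and takes the full node list, so users who mentioned no one still show up as isolates. `find_components` is a hypothetical name; NetDraw and igraph's `components()` do the same job.

```r
# Connected components via breadth-first search; mentions are treated as undirected.
# `nodes` should list every user in the archive so users with no mentions become isolates.
find_components <- function(nodes, from, to) {
  nodes <- unique(c(nodes, from, to))
  adj <- split(c(to, from), factor(c(from, to), levels = nodes))  # neighbors, both directions
  comp <- setNames(rep(NA_integer_, length(nodes)), nodes)
  id <- 0L
  for (start in nodes) {
    if (!is.na(comp[[start]])) next          # already placed in a component
    id <- id + 1L
    comp[[start]] <- id
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in adj[[v]]) {
        if (is.na(comp[[w]])) {
          comp[[w]] <- id
          queue <- c(queue, w)
        }
      }
    }
  }
  comp                                       # named vector: user -> component id
}

comp <- find_components(nodes = c("a", "b", "c", "d", "e", "f", "g"),
                        from  = c("a", "b", "d"),
                        to    = c("b", "c", "e"))
sizes <- sort(table(comp), decreasing = TRUE)
sizes             # sizes 3, 2, 1, 1
sum(sizes == 1)   # 2 isolates
```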

ETA: Oh yeah, one other little fix. I took out stray punctuation at the end of user names, like hi @cpikas! or hey @cpikas: or... well, you get the idea.
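One way to script the punctuation fix is to keep only the leading run of characters that are legal in a Twitter handle (letters, digits, and underscore). A small sketch; `clean_handle` is a hypothetical helper. It assumes each token is whatever followed an @ after splitting the text, so a token that starts with a space (as in "@ the convention center") becomes NA.

```r
# Keep the leading run of handle characters (letters, digits, underscore),
# dropping stray punctuation such as "!" or ":" that followed the name.
clean_handle <- function(tokens) {
  ifelse(grepl("^[A-Za-z0-9_]", tokens),
         sub("^([A-Za-z0-9_]+).*$", "\\1", tokens),
         NA_character_)                      # not a handle, e.g. " the convention center"
}

clean_handle(c("cpikas!", "cpikas: thanks", " the convention center"))
# returns c("cpikas", "cpikas", NA)
```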

Comments are off for this post

New, now scientists can use blogs to talk to other scientists about science!

I collect articles on scientists using blogs and twitter. Mostly because it’s relevant to my dissertation, but also because I find them interesting. You can see a listing here: (used to be displayed on my UM page, but that broke in the transition).

So one of these articles that I saw tweeted by about five people at the same time is Wolinsky, H. (2011). More than a blog. EMBO Reports 12, 1102-1105. doi:10.1038/embor.2011.201

Of course it starts with the arsenic life discussion. It talks about the immediacy of the blog reaction and the tone of the discussion on the blogs.  Overall a nice article.

I think the subtitle of the piece is unfair. It frames things the way the title of this post does, as if scientists using blogs to talk science were something new, when the article itself is more about where blogs have evolved to right now. People have had many different experiences with blogs and used them in many different ways, some of which have always been talking shop.

4 responses so far

Solution to my Twitter API - twitterR issues

With lots of help from Bob O’Hara (thank you!), I was able to solve my problems. I am looking at the tweets around #AGU10 but it occurred to me that I wanted to know what other tweets the AGU twitterers were sending while at the meeting because some might not have had the hashtag.

Here goes:

library(twitteR)

# Get the timeline
person <- userTimeline("person", n=500)

# Check to see how many you got
length(person)

# Check to see if that is far enough back (timelines come newest first,
# so the last element is the oldest tweet returned)
person[[length(person)]]$getCreated()

# Get the time it was tweeted
Time = sapply(person, function(lst) lst$getCreated())

# Get screen name
SN = sapply(person, function(lst) lst$getScreenName())

# Get any reply to screen names
Rep2SN = sapply(person, function(lst) lst$getReplyToSN())

# Get the text
Text = sapply(person, function(lst) lst$getText())

# fix the date from number of seconds to a human readable format
# (sapply drops the POSIXct class, leaving seconds since 1970-01-01 UTC)
TimeN <- as.POSIXct(Time, origin="1970-01-01", tz="UTC")

# replace the blanks (non-replies come back zero-length) with NA
Rep2SN <- sapply(Rep2SN, function(str) if (length(str) == 0) NA else str)

# make it into a data frame
Data.person <- data.frame(TimeN=TimeN, SN=SN, Rep2SN=Rep2SN, Text=Text)

# save it out to csv
write.csv(Data.person, file="person.csv")


So I did this by finding and replacing "person" with each screen name in a text editor and pasting the result into the script window in Rcmdr. I found that 500 was rarely enough; for some people I had to request up to 3200 tweets, which is the maximum. I had to skip one person because 3200 didn't get me back to December. It's also worth noting the length() step: when you ask for 500 you sometimes get 550, sometimes 450, or anywhere in between, and it's not because there aren't any more. You may also wonder why I wrote the whole thing out to a csv file. I could have added a step to cut out the more recent and older tweets and kept just the relevant set in R for further operations, but I need to do qualitative content analysis on the tweets, and I plan to do that in NVivo 9.
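The optional step mentioned above, keeping only tweets from around the meeting, could look something like this on a data frame shaped like Data.person. A sketch under assumptions: `in_window` is a hypothetical helper, and the start and end dates are placeholders for "the meeting plus about a week on either side", not the actual meeting dates.

```r
# Keep rows whose timestamp falls in [start, end); rows with missing times are dropped.
in_window <- function(df, start, end, time_col = "TimeN") {
  t <- df[[time_col]]
  df[which(t >= start & t < end), , drop = FALSE]
}

# Placeholder window: adjust to the real meeting dates plus the margin you want.
start <- as.POSIXct("2010-12-06", tz = "UTC")
end   <- as.POSIXct("2010-12-25", tz = "UTC")

tweets <- data.frame(TimeN = as.POSIXct(c("2010-11-01", "2010-12-14", "2011-01-10"), tz = "UTC"),
                     SN = "person", Text = c("old", "at the meeting", "later"),
                     stringsAsFactors = FALSE)
in_window(tweets, start, end)   # keeps only the "at the meeting" row
```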

I didn’t do this for all 860, either. I did it for the 30 or so who tweeted 15 or more times with the hashtag. I might expand that to 10 or more (17 more people). Also, I didn’t keep the organizational accounts (like theAGU).
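Picking out the frequent hashtag users can also be scripted from the archive's list of senders. A sketch, assuming a character vector of screen names with one entry per #agu10 tweet; `prolific` is a hypothetical helper, names are lower-cased to sidestep the case issue, and the `exclude` argument stands in for dropping organizational accounts like theAGU.

```r
# Screen names with at least `min_tweets` hashtag tweets, minus excluded accounts.
prolific <- function(screen_names, min_tweets = 15, exclude = character(0)) {
  counts <- table(tolower(screen_names))          # tweets per (lower-cased) user
  keep <- names(counts)[counts >= min_tweets]
  setdiff(keep, tolower(exclude))                 # drop organizational accounts
}

senders <- c(rep("Alice", 15), rep("bob", 14), rep("TheAGU", 20))
prolific(senders, min_tweets = 15, exclude = "theAGU")   # returns "alice"
```

Lowering the threshold to 10, as considered above, is just a matter of changing `min_tweets`.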

With that said, it's very tempting to paste all of these data frames together, remove the text, and do the social network analysis using igraph. Even cooler would be an automated display of how the social network changes over time. Are new connections formed at the meeting (I hope so)? Do the connections formed at the meeting continue afterward? If I succumb to the temptation, I'll let you know. There's also the textmining package and plugin for Rcmdr. This post gives an idea of what can be done with that.
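Here is roughly what "paste the data frames together and remove the text" could look like, with a first pass at whether connections formed at the meeting. It's a base-R sketch rather than igraph, and it assumes data frames shaped like Data.person above (TimeN, SN, Rep2SN, Text). Since Rep2SN only holds replies, other mentions are not counted. `first_contact` is a hypothetical helper. It labels each directed reply pair by when that pair first appears: before, during, or after a meeting window you supply.

```r
# Stack per-person data frames, drop the text, and find when each reply pair first appears.
first_contact <- function(frames, meeting_start, meeting_end) {
  stacked <- do.call(rbind, frames)                                  # paste them together
  edges <- stacked[!is.na(stacked$Rep2SN), c("TimeN", "SN", "Rep2SN")]  # drop the text
  edges <- edges[order(edges$TimeN), ]
  firsts <- edges[!duplicated(edges[, c("SN", "Rep2SN")]), ]        # earliest tweet per pair
  firsts$when <- ifelse(firsts$TimeN < meeting_start, "before",
                 ifelse(firsts$TimeN <= meeting_end, "during", "after"))
  firsts
}

f1 <- data.frame(TimeN = as.POSIXct(c("2010-12-01", "2010-12-14", "2010-12-15"), tz = "UTC"),
                 SN = "a", Rep2SN = c("b", "c", "b"), Text = "text", stringsAsFactors = FALSE)
f2 <- data.frame(TimeN = as.POSIXct(c("2010-12-20", "2010-12-16"), tz = "UTC"),
                 SN = "b", Rep2SN = c("a", NA), Text = "text", stringsAsFactors = FALSE)
res <- first_contact(list(f1, f2),
                     meeting_start = as.POSIXct("2010-12-13", tz = "UTC"),
                     meeting_end   = as.POSIXct("2010-12-18", tz = "UTC"))
table(res$when)   # one pair each: before (a to b), during (a to c), after (b to a)
```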

2 responses so far

My ongoing struggle with the Twitter API, R, … copy paste

I'm posting this in the hope that someone with experience in any or all of the above, or maybe Perl, can point out that I'm doing something stupid or have overlooked something obvious. If nothing else, you might read this to see what not to try.

Here's the issue: it's totally obvious that I need to look at the other tweets sent by #agu10 tweeters (the ones not marked with the hashtag) if I want to understand how Twitter was used at the meeting. But it's now five months later and there are 860 of them (although I would be fine with looking at just the most prolific non-institutional tweeters).

I first looked at the Twitter API. By adding parameters to the request URLs I could get the recent timeline for one user at a time, but I couldn't see a way to get a user's timeline for a set period of time (the conference period plus a week or so on each end).

I asked two experts and they both said that you couldn’t combine the user timeline with a time period.

Darn. So my next idea was to see if I could actually access someone’s timeline that far back through the regular interface. I tried one of the more prolific tweeters and I could. Ok, so if I can pull down all of their tweets, then I could pick out the ones I wanted. Or, even better, I could also look at the evolution of the social network over time. Did people meet at the meeting and then continue to tweet at each other or are these people only connected during the actual meeting?  Did the network exist in the same way before the meeting?

I was looking for ways to automate this a bit, and I noticed that there were tools already built for Perl and for R. I used Perl, with a lot of handholding, to get the commenter network for an earlier paper, and I used R both for that same article and in place of Stata for my second semester of stats. I'm not completely comfortable with either one, and I don't always find the help helpful. I decided to start with R.

The main package is twitteR by Jeff Gentry. I updated my installation of R and installed and loaded that package and the dependencies. First thing I did was to get my own standard timeline:

testtweets <- userTimeline("cpikas")

Then I typed out the first few to see what I got (like when you’re using DIALOG)


And I saw my tweets in the format:


[1] "username: text"

I checked the length of that and got 18 – the current timeline was 18 items. I tried the same thing substituting user id but that didn’t work. So then I tried to retrieve 500 items and that worked fine, too.

testlonger <- userTimeline("cpikas", n=500)

Great. Now let me see the dates so I can cut off the ones I want. Hm. OK, let's see, how do I get the other columns? What type of object is this, anyhow? The manual is no help. I tried some things with object$field. No joy. Tried to edit it. No joy: it was upset about the < in the image URL. It was also telling me that the object was of type S4. The manual said it wasn't, but I can't argue with what R is reading. I somehow figured out it was a list. I tried objectname[[1]][2], which gave NULL. Then I eventually tried


Hrumph. It says 1 slot. So as far as I can tell, it's a deprecated object type, and it didn't retrieve or keep all of the other information needed to narrow by date.
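A possible explanation for the "1 slot": if the tweet objects were R reference classes (the lst$getCreated() calls in the solution post above suggest so, though this hasn't been checked against twitteR's source), then slotNames() reports a single slot, .xData, which is the environment holding all of the fields. The data would still be there, reached through methods or $field rather than @slot. The sketch below uses a made-up MockStatus class to show the behavior. It is not twitteR's actual class definition.

```r
library(methods)

# MockStatus is hypothetical, for illustration only; NOT twitteR's class definition.
MockStatus <- setRefClass("MockStatus",
  fields = list(text = "character", created = "ANY", screenName = "character"),
  methods = list(
    getText       = function() text,
    getCreated    = function() created,
    getScreenName = function() screenName
  ))

s <- MockStatus$new(text = "hello #agu10",
                    created = as.POSIXct("2010-12-14 10:00", tz = "UTC"),
                    screenName = "cpikas")

slotNames(s)     # ".xData": the single slot is the environment holding the fields
s$getCreated()   # the data are there, reached through a method...
s$created        # ...or directly through the field with $
tweets <- list(s, s)
sapply(tweets, function(x) x$getText())   # works across a list of tweet objects
```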

When googling around, I ran across this entry by Heuristic Andrew on text mining Twitter data with R. I haven't tried his method with the XML package yet (I may try that). I did try the package listed in the comments, tm.plugin.webcorpus by Mario Annau. That does get the whole tweet and puts things in slots the right way (object$author), but it looks like you can only do a word search. Oh wait, this just worked:

testTM <- getMeta.twitter('from:cpikas')

But that's supposed to default to 100 results per page, 100 results total, and it only returned 7 for me. I guess the next thing to try is the XML version, unless someone reading this has a better idea?

Edit: I forgot the copy-paste part. When I tried to just view the tweets I wanted on screen and copy them into a text document, it crashed Firefox. Who knows why.

18 responses so far

Blogs are not dead yet

Assorted pundits have been heralding the death of the blog as a science communication medium for at least five years, probably longer. Blogs aren't dead; indeed, as far as I can tell, they are now in a revival period in which their true utility and value are becoming more obvious.

This blog post was prompted by a post on Scholarly Kitchen in which the blogging scientist (or science-trained publisher) blogs about how scientists don't blog (again). David Crotty titled his post: Not With A Bang: The First Wave of Science 2.0 Slowly Whimpers to an End. Crotty views the attempted monetization of the science blogosphere as the crest of the first wave. He discusses several examples of for-profit companies that exuberantly jumped into the blogosphere and other Science 2.0 ventures but have since pulled back. I would assert that the attempted monetization and commercialization of Science 2.0 is external to the movement, and really a distraction from the slow-growth phase of the innovation adoption curve.

First, all of the bloggers now on a for-profit host started on Blogger or some similar service. They garnered enough interest to be attractive to a company that hoped to make money on page views. Many of the early adopters moved over from updating static websites, keeping newsletters, or participating in newsgroups or on bulletin boards. They may have continued to participate in those places but saved longer discussions for their blogs. Otherwise, they might have used their blogs to re-share links they would normally have put on a static website or on the young Delicious, where those links weren't getting enough visibility. This was the first wave of pioneers.

The idea that a media company could get inexpensive talent by mining the blogosphere came later. In the beginning the primary for-profit (or for-loss, unfortunately) host was ScienceBlogs. Even at its height, ScienceBlogs was never more than a tiny part of the science blogosphere. Its limited size made it more exclusive and more watched. Others who did not know about the rich online life of scientists saw ScienceBlogs as the entire science blogosphere. Seed Media told a good story and made it look profitable, so others wanted to get in. I'm not sure about Nature, but I suspect they were clear that supporting blogs would not be a profitable effort. I think the goal for them was to support science and to get scientists to spend more of their time online on Nature Publishing sites. Either way, it's not important here.

When some of the shine wore off, and some of the bloggers left, the rest of the blogosphere got more attention. I still feel that the rest of the blogosphere doesn’t get the attention it deserves, but as with everything people do, there’s a long tail.

In the past few months, some of the long-time bloggers went into a blogging funk (including yours truly). At the same time, additional scientists started blogs. Some bloggers went on hiatus, some quit, but others started, and some who quit earlier came back. Societies and non-profits have stepped up to support science blogging. This is a great idea as the purpose of the societies is to support science communication in their subject area.

Some who have discussed the death of blogs originally said that wikis would take over. If you've used a wiki, you know they are very good for certain things, but there's almost no overlap with what blogs are good for. Likewise, many people thought Twitter would replace blogs. Using Twitter can be an art form: concise nuggets of information or questions in under 140 characters. Recently it's become more and more clear that the long form not only still has value, but is still needed. It's needed to provide context and to tell the whole story.

What about the lack, or the surfeit, of journalism-trained bloggers? Which is it? Does it matter? The science blogosphere has always been made up of practicing scientists, people working in some area adjacent to science with some level of science training (like librarians), and non-scientists who are interested in science. There are bloggers of each of these varieties who communicate well and are good at telling a story. It's very welcome that a lot of very talented science journalists have taken up blogging. For them it's not a longer form but often a shorter one. I don't think there are too many of them or too few, nor do I think that they are any more important or valuable than the scientists who blog. Nor do I think that all members of the science blogosphere should have journalism training or strive for journalistic standards. We could all stand to write better, but we're all writers. Scientists have to write for their profession, so blogging really isn't that much of a stretch.

As for the question of culture and technology: they co-evolve. Does the science blogosphere change science or science culture? Does science culture determine what technologies will be used and how? Yes. Both. All the time. Is there a lot of inertia? Oh yes.

6 responses so far

Post I’d like to write: trolls by discipline

I noted in both my qualitative study (pdf) of science blogging and my social network analysis study (pdf) that there are more trolls in some areas of science blogs than others, and they're pretty detectable by looking at the patterns of links.

Anyhoo, seems like although some fields get more trolls than others, each field has a unique set of trolls with different approaches. Now I’m not talking about people who actually have substantive arguments that further the conversation, I’m talking about obnoxious people who hijack the conversation.

Actually, with that said, it would be kind of interesting to have a typology of the various flavors of pseudoscience activists and whatnot who cause hate and discontent in blog comments. You have the anti-feminists, the anti-vaxxers, etc.

Not really a troll, but this post about the sorts of e-mail meteorologists get got me started on this. Sigh.

11 responses so far

Another little teaser on AGU10 tweets: NASA

I’m just starting to analyze the tweets from the American Geophysical Union Fall Meeting from December. These are just the ones with the hashtag #agu10 that were kept in a TwapperKeeper account. The first of these teasers is in the previous post at:


We saw in that last picture how many tweets were sent to or mentioned theAGU and NASA. I was wondering what the circumstances were. It turns out that at least 257 of the 264 tweets at or mentioning NASA were retweets of NASA's press release tweets. In fact, there wasn't a ton of diversity in what was retweeted.

Number of tweets    Identifying phrase
113                 "loss of ice"
34                  "April Mexico quake"
31                  "mars opportunity"
27                  "some big NASA science"
19                  "creeping faults in Bay area"
18                  "more NASA science"
8                   "electric atmosphere" video
7                   "how hard are we pushing"
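The eight rows add up to 257, the "at least" count in the text. A tally like this can be reproduced with a case-insensitive substring match on the tweet text. A sketch on toy data; `count_phrases` is a hypothetical helper, and a tweet containing two of the phrases would be counted under both.

```r
# Count tweets containing each identifying phrase (case-insensitive, fixed-string match).
count_phrases <- function(texts, phrases) {
  lower <- tolower(texts)
  vapply(phrases,
         function(p) sum(grepl(tolower(p), lower, fixed = TRUE)),
         integer(1))
}

sum(c(113, 34, 31, 27, 19, 18, 8, 7))   # 257

texts <- c("RT @NASA: loss of ice story", "RT @NASA Mars Opportunity update",
           "rt @nasa loss of ice!", "hello")
count_phrases(texts, c("loss of ice", "mars opportunity"))
# loss of ice = 2, mars opportunity = 1
```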

Everyone was retweeting the same few stories. I suspect from the graph that these folks weren’t tweeting anything else from the meeting. Were they even there or are they just NASA fans?

Comments are off for this post

An early image of the AGU10 twitter archive

I used TwapperKeeper to capture the AGU10 Twitter archive. TwapperKeeper via Summarizr gives some general stats, but I was more curious about the connections. At first I thought I could take the from and to columns directly from the export and put them into an SNA package, but alas, the to field only covered tweets that started with @. That leaves out all of the RT@ messages as well as the mentions where the @ is embedded somewhere in the text. I was despairing a little bit about it, and even got ready to pull out the Perl and regex, but my dear husband was like, why not do text-to-columns at the @ symbol? Well, why not indeed? So this dataset only has one @ per tweet: if more than one person was @-ed, only the first is pulled out right now. I might do something different later.
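The difference between the "to" field, the text-to-columns trick, and catching every @ is easy to see side by side. A sketch using regular expressions rather than Excel; `mentions` is a hypothetical helper. The `to_field` part mimics the behavior described above (only a tweet that starts with @), and the pattern assumes handles are letters, digits, and underscores.

```r
# Three ways to pull @-mentions out of tweet text.
mentions <- function(text) {
  found <- regmatches(text, gregexpr("@[A-Za-z0-9_]+", text))  # every @handle per tweet
  handles <- lapply(found, function(x) substring(x, 2))        # drop the "@"
  list(
    # like the "to" field: only a tweet that starts with @
    to_field = ifelse(grepl("^@[A-Za-z0-9_]", text),
                      sub("^@([A-Za-z0-9_]+).*$", "\\1", text), NA_character_),
    # like text-to-columns at @, keeping only the first column
    first = vapply(handles, function(h) if (length(h) > 0) h[1] else NA_character_,
                   character(1)),
    all = handles                                               # every mention
  )
}

tw <- c("@theAGU see you there", "RT @NASA: loss of ice", "great talk by @cpikas and @theAGU")
m <- mentions(tw)
m$to_field   # "theAGU" NA NA
m$first      # "theAGU" "NASA" "cpikas"
m$all[[3]]   # "cpikas" "theAGU"
```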

Anyhow, so I took that and I pasted it into NodeXL – an add-in for Excel 2007 that does SNA. But I was sort of having trouble working the visualization – mostly my inexperience probably. So I exported from there in DL format, imported into UCInet and then opened in NetDraw. There’s lots to see and do yet, but I thought this little bit was interesting:

agu10 mentions and replies, largest component (781 nodes), sized by indegree

This image is under the same license as the rest of my blog (cc-by), but it's just a first pass, so you might want to keep that in mind if you want to redistribute it.

This is the largest component (a component is a set of nodes connected to each other, directly or indirectly, but not to the rest of the graph). It has 781 nodes. The other components average around 3-5 nodes. The nodes are sized by indegree (how many people tweeted @ them with the agu10 hashtag). What I find interesting about this is the role of institutional accounts. Only one of the labels is legible, but the two largest nodes are NASA (top) and theAGU (bottom). The medium-sized one above NASA is NASAjpl. The institutional accounts are interesting in themselves, but so is the way the other users seem to cluster into two camps: not that many people tweeted @ both.
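The "two camps" observation can be checked with a quick count of who tweeted @ each hub. A sketch, assuming the from/to mention edge list; `hub_overlap` is a hypothetical helper, and handles are lower-cased first because of the case issue noted in the update below.

```r
# How many users tweeted @ only hub_a, only hub_b, or both?
hub_overlap <- function(from, to, hub_a = "nasa", hub_b = "theagu") {
  from <- tolower(from)
  to <- tolower(to)
  a <- unique(from[to == hub_a])   # users who tweeted @ hub_a
  b <- unique(from[to == hub_b])   # users who tweeted @ hub_b
  c(only_a = length(setdiff(a, b)),
    only_b = length(setdiff(b, a)),
    both   = length(intersect(a, b)))
}

hub_overlap(from = c("u1", "u1", "u2", "u3", "U3"),
            to   = c("NASA", "theagu", "nasa", "theAGU", "nasa"))
# only_a = 1 (u2), only_b = 0, both = 2 (u1, u3)
```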

Certainly, I’m curious about what’s in common with the people in one camp or the other and what the content of the messages is. But this is an extremely early look.

UPDATE: Upon further inspection it became clear that there was an issue with upper and lower case: Twitter isn't case-sensitive, but my SNA packages are. Nothing I've said above really changes; there are just additional nodes connected to NASA and theAGU.

One response so far

Blogs and journalism, again.

Jan 19 2011 Published by under social computing technologies

I've been blogging since 2003, and I attended my first blogging unconference in the spring of 2004 (this search gets you my posts on BloggerCon II; scroll down. This archive capture of an early version of the schedule shows the amazing speakers). Blogging has been around since the mid-1990s, but it really picked up in the early 2000s.

From the very beginning there has been this tension between bloggers and mainstream media. Some bloggers have always seen themselves as local journalists or journalists who are faster or something. There have always been discussions of blogging ethics and blogging methods. There have been discussions of how bloggers are better than journalists and vice versa.

But really, this is a very narrow, even myopic, view. At the same time that journalists were starting to use blogs, and non-journalists were starting to use blogs for journalism, knitters were starting to use blogs to describe their projects and build their communities. Mommy bloggers were starting to use blogs to describe their daily lives. Food bloggers, in my memory, might have come slightly later. The biblioblogosphere (the group of librarians using blogs) started well before I started blogging. I learned how to blog in 2003 from other librarians at a conference (put on by SLA, of course!). Librarians have always discussed technology in the library, service to patrons, and innovation. That's not new. That's not something someone had to tell us to do!

My point in saying this is that blogs are a somewhat generic format that is easily adaptable to many different types of communication. The reverse chronological organization, RSS feeds, the ability to comment on individual posts, and the searchable archive of thoughts are all quite attractive. The way you can annotate images works well for knitters and foodies.

Many, many bloggers have no desire to learn and uphold the finest journalistic standards – even if they do want to communicate with the public. Another early idea was authenticity and being genuine. To me, that’s more important in the science blogosphere than trying to turn scientists into something they may not want to be.

Now, am I saying there are no best practices? That some people don’t write better than others or that some people (like yours truly) can’t use some help in writing better?  Of course not. Don’t be silly. The truth is and has always been that you need to communicate in a way that is appropriate for your desired audience. If you want to be picked up and quoted by major media outlets, it would probably help to follow those journalism standards. If you are writing to keep track of articles you’ve read so you can find them later – do whatever makes you happy. If you want to communicate within science, then do your fancy scientist thing. If you want to communicate to a broader audience, there are tips to be had for this.

And no, my twitter friends and fellow ischool grad students – blogging is not journalism.

Comments are off for this post
