I used TwapperKeeper to capture the AGU10 Twitter archive. TwapperKeeper via Summarizr gives some general stats, but I was more curious about the connections. At first I thought I could take the from and to columns directly from the export and put them into an SNA package, but alas, the to field only covers tweets that start with @. That leaves out all of the RT @ messages as well as the mentions where the @ is embedded somewhere in the text. I was despairing a little bit about it, and even got ready to pull out the Perl and regex, but my dear husband asked why not do text to columns at the @ symbol. Well, why not indeed? So this dataset only has one @ per tweet in it: if more than one person was @-ed, only the first is pulled out right now. I might do something different later.
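Pulling out every mention rather than just the first is the sort of thing a regex handles. Here is a minimal Python sketch of that idea. It is not the text-to-columns approach used for this dataset. The sample tweets and the `mention_edges` name are made up for illustration, and it assumes handles consist only of letters, digits, and underscores.

```python
import re

# Assumption: a handle is "@" followed by letters, digits, or underscores,
# and the "@" is not glued to a preceding word character (which skips emails).
MENTION = re.compile(r"(?<!\w)@(\w+)")

def mention_edges(sender, text):
    """Return one (sender, mentioned) pair per @-mention in a tweet."""
    return [(sender, target) for target in MENTION.findall(text)]

# Hypothetical rows standing in for the export's sender and text columns.
tweets = [
    ("alice", "RT @NASA: new results at #agu10"),
    ("bob", "Great talk, thanks @theAGU and @NASAJPL! #agu10"),
]

edges = [edge for sender, text in tweets for edge in mention_edges(sender, text)]
for source, target in edges:
    print(source, target)
```

Each pair is one from/to row, so the result could be pasted into an edge list the same way as the text-to-columns output.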
Anyhow, I took that and pasted it into NodeXL, an add-in for Excel 2007 that does SNA. But I was having some trouble working the visualization, mostly because of my inexperience, probably. So I exported from there in DL format, imported into UCINET, and then opened it in NetDraw. There’s lots to see and do yet, but I thought this little bit was interesting:
This is the largest component (components are pieces of the graph that are connected to each other but not to the rest of the graph). It has 781 nodes. The rest of the components have about 3-5 nodes on average. The nodes are sized by in-degree (how many people tweeted @ them with the agu10 hashtag). What I find interesting about this is the role of institutional bloggers. Only one of the labels is clear, but the two largest nodes are NASA, top, and theAGU, bottom. The medium-sized one above NASA is NASAjpl. It’s interesting how prominent the institutional bloggers are, but also that they really seem to cluster in two camps. Not that many people tweeted @ both.
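For readers who have not met these two measures, here is a rough standard-library Python sketch of what "largest component" and "in-degree" mean for an edge list. It is an illustration only, not what UCINET or NetDraw do internally. The function names and sample edges are hypothetical, and it assumes in-degree counts distinct senders, as described above.

```python
from collections import Counter, defaultdict, deque

def largest_component(edges):
    """Return the node set of the largest weakly connected component."""
    neighbors = defaultdict(set)
    for source, target in edges:
        # Direction is ignored when deciding which nodes hang together.
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def in_degree(edges):
    """Count how many distinct senders mentioned each account."""
    return Counter(target for _, target in set(edges))

# Hypothetical edges: two camps joined by one person, plus a stray pair.
sample = [("a", "NASA"), ("b", "NASA"), ("b", "theAGU"),
          ("c", "theAGU"), ("x", "y")]
print(sorted(largest_component(sample)))
print(in_degree(sample).most_common(2))
```

In the sample, "b" is the only sender who mentioned both accounts, which is the kind of bridge that holds the two camps in one component.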
Certainly, I’m curious about what’s in common with the people in one camp or the other and what the content of the messages is. But this is an extremely early look.
UPDATE: Upon further inspection it became clear that there was an issue with upper and lower case: Twitter isn't case-sensitive, but my SNA packages are. Nothing I've said above really changes; there are just additional nodes connected to NASA and theAGU.
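One way to avoid the case problem is to lowercase every handle before the edge list goes into the SNA package. A small Python sketch, with a hypothetical `normalize_edges` helper and made-up sample edges:

```python
def normalize_edges(edges):
    """Lowercase both ends of each edge so 'NASA' and 'nasa' become one node."""
    return [(source.lower(), target.lower()) for source, target in edges]

# Hypothetical edges where the same account appears in three spellings.
raw_edges = [("alice", "NASA"), ("bob", "nasa"), ("Carol", "Nasa")]
normalized = normalize_edges(raw_edges)
distinct_targets = {target for _, target in normalized}
print(distinct_targets)
```

The trade-off is that node labels lose their original capitalization, so a lookup from lowercase handle to display name would be needed to label the graph nicely.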