Archive for the 'social computing technologies' category

Why FriendFeed Rocked

If you're a librarian or into open access or scholarly communication, at some point you've probably heard of FriendFeed. The service closed today after seven years, and it felt kind of like the final episode of Cheers or M*A*S*H. It had been acquired by Facebook a while ago and development had stopped. Reliability was down. The number of active users was down and had never been anywhere near Facebook's, even in FriendFeed's prime. There was no way for it to make money - no ads, no premium features, no subscriptions.

With that said, there are a lot of people who are really torn up about it shutting down. We built a community there - a stay-at-home mum from Australia, an engineer from Detroit, a software developer from Alberta, several ministers, lots of other neat people, and the LSW. The Library Society of the World is sort of an anti-association; read Walt's discussion of that in his May 2015 Cites & Insights (PDF).

So why did it work? When I started with it, there were lots of social software things all over - blogs, Twitter, Flickr, del.icio.us... and there were more and more as time went on. Many of these act like they will be your one and only place, but that's obviously not true. They have different functions, different communities, different affordances... You used to be able to share things from your Google Reader account, but that wasn't the same.

What FriendFeed did was bring all of these feeds into one place, with a little snippet or picture, and let you comment and reshare and like. You could share something right there, but you didn't have to. It would try to group duplicates if, say, your blog posted directly and your Twitter stream repeated it. You could see what your friends liked and find new and interesting people that way. For the first few years I was on there I was only going to follow library people - well, and of course Heather, and Cameron, and Neil, and Egon, and... - but I'm glad I got to enjoy and eventually follow some really neat people.

If someone posted something you didn't want to see, you could hide just that post, or you could hide things they shared via a particular feed. You could block someone completely so you wouldn't have to see their comments.

I've played with a lot of other tools, but FriendFeed just worked for me.  It was a great source of recipes, if nothing else!

There was a team of savvy folks archiving as much as they could. So far, the best way to get a sense of what it was like is Micah Wittman's FriendFeedmemorial.com. That's really pretty cool.

So where is the LSW now? We're trying Discourse at thelsw.org (it doesn't let you bring feeds in, but you can get a cod badge). We're also trying http://www.frenf.it, which is really, really cool... but we don't know how sustainable it is. And we followed each other on Twitter... but it's not the same.

I miss it already!

No responses yet

Another dissertation on science blogs

Any readers interested in my work (and you'd probably have to be following me for a while to even know what that is) will probably be interested in that of Paige Brown Jarreau. She's a PhD candidate at LSU and is defending any day now. She did a massive set of interviews and a survey, and has shared some of her results on FigShare, on her blog, and in her Twitter stream. So far we've mostly had glimpses of her findings - I can't wait to see the rest of her dissertation (good grief, at the rate I'm going I guess I'll get a chance to cite it in mine :) ).

No responses yet

Using R TwitteR to Get User Information

I'm gonna keep stating the obvious, because this took me a few hours to figure out. Maybe not working continuously, but still.

So, I have more than 6000 tweets from one year of AGU alone, so I'm gonna have to sample somehow. Talking this over with my advisor, he suggested that we find some reasonable way to stratify and then sample randomly within the strata. I haven't worked all the details out yet - or really any of them - but I started gathering user features I could base the decision on. Number of tweets with the hashtag was super quick in Excel. But I was also wondering if they were new to Twitter, if they tweeted a lot, and if they had a lot of followers. That's all available through the API using the twitteR package by Jeff Gentry. Cool.

So getUser() is the function to use. I made up a list of the unique usernames in Excel and imported that in. Then I set out to loop through them.


library("twitteR", lib.loc="C:/Users/Christina/Documents/R/win-library/3.0")
#get the data
 data USERdata<-vector()
userInfo<-function(USER){
 temp<-getUser(USER, cainfo="cacert.pem")
 USERdata<-c(USER,temp$created,temp$statusesCount,temp$followersCount)
 return(USERdata)
 }
 #test for users 4-6
 tweeter.info<-sapply(data$user[4:6],userInfo)

But that came out sort of sideways... I had a column for each user instead of a row. Bob O'H helped me figure out how to transpose it, and I did.
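(For the record, the transpose fix was basically just t() plus turning the result back into a data frame - a minimal sketch, assuming tweeter.info is the matrix that came back from sapply() above:)

#flip the sapply() result so users are rows instead of columns
tweeter.info.df<-as.data.frame(t(tweeter.info), stringsAsFactors=FALSE)
names(tweeter.info.df)<-c("username","created","posts","followers")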

So then I tried this way:

get.USER.info<-function(startno,stopno){
 # set up the vectors first
 n<-stopno-startno+1
 USER.name<-character(n)
 USER.created<-numeric(n)
 USER.posts<-numeric(n)
 USER.foll<-numeric(n)
 for (i in startno:stopno) {
  j<-i-startno+1 # position in the output vectors (so startno doesn't have to be 1)
  thing<-getUser(data$user[i], cainfo="cacert.pem")
  USER.name[j]<-data$user[i]
  USER.created[j]<-thing$created
  USER.posts[j]<-thing$statusesCount
  USER.foll[j]<-thing$followersCount
 }
 return(data.frame(username=USER.name,created=USER.created, posts=USER.posts,followers=USER.foll, stringsAsFactors=FALSE))
}

So that was cool, until it wasn't. I mean, it turns out that 2% of the users have deleted their accounts, or block me, or are private, or something. The function didn't recover from that error, and I tried to test for is.null() and is.na(), but that failed...
So then I went back to the mailing list, and there was a suggestion to use try()... but eek.
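(For the record, a minimal untested sketch of what that try() wrapper might look like, reusing the same getUser() call as above:)

#wrap getUser() in try() so one bad account doesn't kill the whole run
safe.userInfo<-function(USER){
 temp<-try(getUser(USER, cainfo="cacert.pem"), silent=TRUE)
 if (inherits(temp, "try-error")) return(c(USER,NA,NA,NA))
 c(USER,temp$created,temp$statusesCount,temp$followersCount)
}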
So then I noticed that if you have a pile of users to look up, you're actually supposed to use
lookupUsers(users, includeNA=FALSE, ...)
And I did, and I wanted to keep the NAs so that I could align with my other data later... but once again, there was no way to get the NAs back out. And the result is an object that's a pile of lists... which I was having trouble wrapping my little mind around (others have no issues).
So I went back and used that command again, and this time told it to skip the NAs (the not-found users). Then, I think from the mailing list or maybe from Stack Overflow, I got the idea to use unlist. So here's what I did then:
easy.tweeters.noNA<-lookupUsers(data$user, cainfo="cacert.pem")
#check how many fewer this was
length(easy.tweeters.noNA)
#1247 so there were 29 accounts missing hrm
testbigdf<-data.frame()
for (i in 1:1247){
 holddf<-twListToDF(easy.tweeters.noNA[i])
 testbigdf<-rbind(testbigdf,holddf)
}

And that created a lovely data frame with all kinds of goodies in it. I guess I'll have to see what I want to do about the 29 missing accounts.
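(A quick way to see which ones dropped out - a sketch, assuming the user objects expose $screenName the same way they expose the other fields:)

#figure out which of the original usernames didn't come back
found<-sapply(easy.tweeters.noNA, function(u) u$screenName)
missing<-setdiff(tolower(data$user), tolower(found))
missing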

I really would have been happier if it were more graceful about users that weren't found.

Also, note that for every single command you have to use the cainfo="cacert.pem" thingy... Every time, every command.

ALSO, I had figured out OAuth, but the Twitter address went from http:// to https://, so that broke and I had to fix it. I hope I don't have to reboot my computer soon! (Yeah, I saved my credentials to a file, but I don't know... )
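(For reference, the handshake is roughly the standard twitteR + ROAuth recipe of the moment - a sketch only, with placeholder keys and the https:// URLs:)

library(ROAuth)
reqURL<-"https://api.twitter.com/oauth/request_token"
accessURL<-"https://api.twitter.com/oauth/access_token"
authURL<-"https://api.twitter.com/oauth/authorize"
#consumer key and secret below are placeholders - use your own app's values
cred<-OAuthFactory$new(consumerKey="YOUR_KEY", consumerSecret="YOUR_SECRET",
 requestURL=reqURL, accessURL=accessURL, authURL=authURL)
cred$handshake(cainfo="cacert.pem")
registerTwitterOAuth(cred)
#save it so it survives a restart
save(cred, file="twitter_credentials.RData")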

Comments are off for this post

Using R to expand Twitter URLS

So this should be so simple and obvious that it's not worth a post, but I keep forgetting how to do everything so I'm gonna put this here to remind myself.

Here's where I am. I have a list of 4011 tweets with the #agu12 or #agu2012 hashtag. A lot of these are coded as "pointers" - their main function is to direct readers' attention somewhere else. So I got to wondering: where? Are they directing people to online versions of the posters? Are they just linking to more NASA press releases? What percentage goes to a .edu?

Of course all the URLs are shortened, and there are services you can use to expand them, but in R it's already right there in the twitteR package as

decode_short_url

This uses the longapi.org API. All you have to do is plug in the URL. Cool!

So here was my original plan: find the tweets with URLs, extract the URLs, expand them, profit! And I was going to do all of this in R. But then it got a little ridiculous.
So instead I used Open Refine to find all the URLs, assigned IDs to all the records, and then used filtering and copying and pasting to get them all into two columns: ID, URL.

Issues: non-printing characters (Excel has a CLEAN function), extra spaces (TRIM - that didn't really work, so I did a find and replace), random commas (some needed to be there), random other punctuation (find and replace), and the # sign.

The idea in R was to do a for loop to iterate through each URL, expand it, append it to a vector (or concatenate, whichever), then add that vector to the data frame and do stats on it, or maybe just export to Excel and monkey with it there.

The for loop was fine; append, not for love nor money, despite the fact that I have proof I successfully did it in my Coursera class. I don't know. And the API was failing for certain rows. For a couple of rows, I found more punctuation. Then I found that the rest of the issues were really all about length: the service doesn't expect shortened URLs to be long (duh)! So then I had to pick a length and only send ones shorter than that (50 characters) to the API. I finally gave up on the stupid append and just printed the results to the screen and copied them over to Excel. Also, I cheated on how long the for loop had to be - I should have been able to just use the number of rows in the frame, but meh.
Anyhow, this worked:

 setwd("~/ mine")
library("twitteR", lib.loc="C:/Users/Christina/Documents/R/win-library/3.0")
#get the data
data <- read.csv("agu12justurl.csv", colClasses = "character")
#check it out
head(data)
str(data)
#test a single one
decode_short_url(data$url[2])
#this was for me trying to append, sigh
full.vec <- vector(mode="character")
#create a vector to put the new stuff in, then I'll append to the data frame, I hope
#check the for loop 
 for (i in 1:200){print(data.sub$url[i])}
#that works
for (i in 1:3){print(decode_short_url(data.sub$url[i]))}
#that works - good to know, though, that if it can't be expanded it comes back null

#appending to the vector is not working, but printing is so will run with that 
for (i in 1:1502){ if(nchar(data$url[i])>50){
 urlhold<-data$url[i]
 } else {
 urlhold<-decode_short_url(data$url[i])
 }
 print(urlhold)
 #append(full.vec,urlhold)
 }

If anyone wants to tell me what I'm doing wrong with the append, it would be appreciated. I'm sure it must be obvious.
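(Best guess, noted here for later: append() returns a new vector rather than changing full.vec in place, so the call probably just needed to be assigned back - a minimal sketch:)

#append() doesn't modify in place - assign the result back to the vector
full.vec<-append(full.vec, urlhold)
#or, equivalently
full.vec<-c(full.vec, urlhold)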

So what's the answer? Not sure. I'll probably do a post on string splitting and counting... OR I'll be back in Open Refine. How do people only ever work in one tool?
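(If I stay in R, it would probably be something along these lines - a rough, untested sketch, assuming the expanded URLs end up in a character vector called full.vec:)

#pull out just the hostname from each expanded URL and tally them
hosts<-sub("^https?://([^/]+).*$", "\\1", full.vec)
sort(table(hosts), decreasing=TRUE)
#rough cut at the share going to a .edu
mean(grepl("\\.edu$", hosts))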

3 responses so far

Keeping up with a busy conference - my tools aren't doing it

I wrote about trying to use twitteR to download AGU13 tweets; I'm getting fewer and fewer back with my calls. So I was very excited to try Webometric Analyst from Wolverhampton, described by Kim Holmberg in his ASIST webinar (BIG pptx, BIG wmv).

One of the things Webometric Analyst will do is run repeated searches until you tell it to stop. This was very exciting. But I tried it and, alas, I think Twitter has decided I'm abusive or something, because it was heavily throttled. I could see the tweets flying up on the screen at twitter.com, but the search was retrieving something like 6. I ran the R search mid-day today and got 99 tweets back, which covered 5 minutes O_o. I had asked for up to 2000 from the whole day and had it set to retry if stopped.

Sigh.

Comments are off for this post

The #agu12 and #agu2012 Twitter archive

I showed a graph of the agu10 archive here, and more recently the agu11/2011 archive here, and now for the agu12/2012 archive. See the 2011 post for the exact methods used to get the data and to clean it.

#agu12 and #agu2012 largest component, nodes sized by degree

#agu12 and #agu2012 other components, no isolates, nodes sized by degree

I will have to review methods to show this properly, but from appearances the networks are becoming more like hairballs. In the first year, half the people were connected to theAGU and the other half were connected to NASA, but very few were connected to both, and the other prominent nodes were pretty much all institutional accounts. In 2011 that started to change, and now in 2012 you can't really see that division at all. The top three nodes are still there - two the same as before, plus a NASA robotic mission - but then there's a large second group of individual scientists with degrees (connections to others, combined indegree and outdegree) around 40-80.

2 responses so far

An image of the #agu2011, #agu11 Twitter archive

A loooong time ago, I showed the agu10 archive as a graph; here's the same for the combination of agu11 and agu2011. I already mentioned the upper/lower case issues (Excel is oblivious but my graphing program cares) - this is all lower case (I first tried to correct case by hand but kept missing things, so I just used Excel's =LOWER()). I also discussed how I got the data. I'm probably going to have to go back and redo this for 2010 if I really want equivalent images, because 1) I only kept the first @ there (this has all the @s), and 2) I don't believe I did both the 2010 and 10 hashtag variants, so I probably missed some. For this image I did a little bit of correcting: one Twitter name spelled wrong, and quite a few people using the_agu or agu instead of theagu. I also took out things like @10am or @ the convention center.

I made this graph by taking my Excel spreadsheet, which was nicely laid out as username, first @, second @, ..., copying that into UCINET's DL editor, and saving it in nodelist1 format. Then I visualized and did basic analysis in NetDraw.
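(For anyone who hasn't seen the DL format: a nodelist1 file is basically each tweeter followed by everyone they @-mentioned, one row per tweeter - roughly like this sketch, with made-up handles:)

dl n=5 format=nodelist1
labels embedded
data:
alice theagu nasa
bob theagu
carol nasa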

agu2011 and agu11 largest component, sized by degree

The largest component is 559 nodes out of 740, and this time you don't see that split where the people who tweeted @NASA didn't tweet @theAGU. There were 119 isolates and other components with 2, 3, and 10 nodes:

Other components, sized by degree (no isolates)

eta: oh yeah, one other little fix - I took out random punctuation at the end of user names, like "hi @cpikas!" or "hey @cpikas:" or... well, you get the idea.

Comments are off for this post

New, now scientists can use blogs to talk to other scientists about science!

I collect articles on scientists using blogs and Twitter, mostly because it's relevant to my dissertation, but also because I find them interesting. You can see a listing here: http://www.delicious.com/cpikas/meta_science_blogging (it used to be displayed on my UM page, but that broke in the transition).

So one of these articles that I saw tweeted by about five people at the same time is Wolinsky, H. (2011). More than a blog. EMBO reports, 12, 1102-1105. doi:10.1038/embor.2011.201.

Of course it starts with the arsenic life discussion. It talks about the immediacy of the blog reaction and the tone of the discussion on the blogs. Overall, a nice article.

I think the subtitle of the piece is unfair: it reads like the title of this post, when the article itself is more about where blogs have evolved to right now. There are a lot of differing experiences with blogs and differing uses, some of which have always included talking shop.

4 responses so far

Solution to my Twitter API - twitteR issues

With lots of help from Bob O'Hara (thank you!), I was able to solve my problems. I am looking at the tweets around #AGU10, but it occurred to me that I also wanted to know what other tweets the AGU twitterers were sending while at the meeting, because some might not have had the hashtag.

Here goes:

# Get the timeline
person <- userTimeline("person",n=500)

# Check to see how many you got
length(person)

# Check to see if that is far enough back
person[[500]]$getCreated()

# Get the time it was tweeted
Time = sapply(person,function(lst) lst$getCreated() )

# Get screen name
SN = sapply(person,function(lst) lst$getScreenName() )

# Get any reply to screen names
Rep2SN = sapply(person,function(lst) lst$getReplyToSN())

# Get the text
Text = sapply(person,function(lst) lst$getText())

# fix the date from number of seconds to a human readable format
TimeN <- as.POSIXct(Time,origin="1970-01-01", tz="UTC")

# replace the blanks with NA
Rep2SN.na <- sapply(Rep2SN, function(str) ifelse(length(str)==0, NA, str))

# make it into a data frame
Data.person <- data.frame(TimeN=TimeN, SN=SN, Rep2SN.na=Rep2SN.na, Text=Text)

# save it out to csv
write.csv(Data.person, file="person.csv")

 

So I did this by finding and replacing "person" with each screen name in a text editor and pasting that into the script window in Rcmdr. I found that 500 was rarely enough; for some people I had to request up to 3200 tweets, which is the maximum. I had to skip one person because 3200 didn't get me back to December. It's also worth noting the length() step: it turns out that when you ask for 500 you sometimes get 550 and sometimes 450, or anywhere in between, and it's not because there aren't any more. You may also wonder why I wrote the whole thing out to a CSV file. I could have added a step to cut out the more recent and older tweets, keeping just the set I need for more operations within R, but I need to do qualitative content analysis on the tweets and I plan to do that in NVivo 9.

I didn’t do this for all 860, either. I did it for the 30 or so who tweeted 15 or more times with the hashtag. I might expand that to 10 or more (17 more people). Also, I didn’t keep the organizational accounts (like theAGU).

With that said, it's very tempting to paste all of these data frames together, remove the text, and do the social network analysis using igraph. Even cooler would be to show an automated display of how the social network changes over time. Are there new connections formed at the meeting (I hope so)? Do the connections formed at the meeting continue afterward? If I succumb to the temptation, I'll let you know. There's also the text-mining package and plugin for Rcmdr. This post gives an idea of what can be done with that.
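(If I do succumb, the basic move would be something like this - a sketch, where Data.person1, Data.person2, etc. are placeholders standing in for the per-person frames built above:)

library(igraph)
#stack the per-person frames and build a directed reply network
all.tweets<-rbind(Data.person1, Data.person2) # ...and so on for each person
edges<-subset(all.tweets, !is.na(Rep2SN.na), select=c(SN, Rep2SN.na))
g<-graph.data.frame(edges, directed=TRUE)
degree(g)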

2 responses so far

My ongoing struggle with the Twitter API, R, … copy paste

I'm posting this in hopes that someone with experience in any or all of the above, or maybe Perl, can point out that I'm doing something stupid or have overlooked something obvious. If nothing else, you might read this to see what not to try.

Here's the issue: it's totally obvious that I need to look at the other tweets sent by #agu10 tweeters (the ones not marked with the hashtag) if I want to understand how Twitter was used at the meeting. But it's now five months later and there are 860 of them (although I would be fine with looking at just the most prolific non-institutional tweeters).

I first looked at the Twitter API directly: I tried just adding terms to URLs and got the recent timeline for one user at a time, but I couldn't see a way to get a user's timeline for a set period of time (the conference period plus a week or so on each end).

I asked two experts and they both said that you couldn’t combine the user timeline with a time period.

Darn. So my next idea was to see if I could actually access someone's timeline that far back through the regular interface. I tried one of the more prolific tweeters and I could. OK, so if I could pull down all of their tweets, then I could pick out the ones I wanted. Or, even better, I could also look at the evolution of the social network over time. Did people meet at the meeting and then continue to tweet at each other, or are these people only connected during the actual meeting? Did the network exist in the same way before the meeting?

I was looking for ways to automate this a bit and I noticed that there were things already built for Perl and for R. I used Perl with a lot of handholding to get the commenter network for an earlier paper and I used R for both that same article and in lieu of STATA for my second semester of stats. I’m not completely comfortable with either one and I don’t always find the help helpful. I decided to start with R.

The main package is twitteR by Jeff Gentry. I updated my installation of R and installed and loaded that package and the dependencies. First thing I did was to get my own standard timeline:

testtweets <- userTimeline("cpikas")

Then I typed out the first few to see what I got (like when you’re using DIALOG)

testtweets[1:5]

And I saw my tweets in the format:

[[1]]

[1] "username: text"

I checked the length of that and got 18 – the current timeline was 18 items. I tried the same thing substituting user id but that didn’t work. So then I tried to retrieve 500 items and that worked fine, too.

testlonger <- userTimeline("cpikas", n=500)

Great. Now, let me see the dates so I can cut off the ones I want. Hm. OK, let's see, how do I get the other columns? What type of object is this, anyhow? The manual is no help. I tried some things with object$field. No joy. Tried to edit it. No joy - it was upset about the < in the image URL. It was also telling me that the object was of type S4; the manual said it wasn't, but I can't argue if that's what it's reading. I somehow figured out it was a list. I tried objectname[[1]][2] - NULL. Then I eventually tried

str(testtweets)

Hrumph. It says 1 slot. So as far as I can tell, it's a deprecated object type, and it didn't retrieve or keep all of the other information needed to narrow by date.

While googling around, I ran across this entry by Heuristic Andrew on text mining Twitter data with R. I haven't tried his method with the XML package yet (I may). I did try the package listed in the comments, tm.plugin.webcorpus by Mario Annau. That does get the whole tweet and puts things in slots the right way (object$author), but it looks like you can only do a word search. Oh wait, this just worked:

testTM <- getMeta.twitter('from:cpikas')

But that's supposed to default to 100 per page, 100 things returned, and it only returned 7 for me. I guess the next thing to try is the XML version, unless someone reading this has a better idea?

edit: I forgot the copy-paste part. When I tried to just look at the tweets I wanted on screen and then copy them into a text document, it crashed Firefox. Who knows why.

18 responses so far

Older posts »