Using R TwitteR to Get User Information

I'm gonna keep stating the obvious, because this took me a few hours to figure out. Maybe not working continuously, but still.

So, I have like more than 6000 tweets from one year of AGU alone, so I'm gonna have to sample somehow. Talking this over with my advisor, he suggested that we have to find some reasonable way to stratify and then do random within the stratification. I haven't worked all the details out yet - or really any of them - but I started gathering user features I could base the decision on. Number of tweets with the hashtag was super quick in Excel. But I was wondering if they were new to Twitter, if they tweeted a lot, and if they had a lot of followers. That's all available through the api and using the TwitteR package by Jeff Gentry.  Cool.

So getUser() is the function to use. I made up a list of the unique usernames in Excel and imported that in. Then I went to loop through.


library("twitteR", lib.loc="C:/Users/Christina/Documents/R/win-library/3.0")
#get the data
 data USERdata<-vector()
userInfo<-function(USER){
 temp<-getUser(USER, cainfo="cacert.pem")
 USERdata<-c(USER,temp$created,temp$statusesCount,temp$followersCount)
 return(USERdata)
 }
 #test for users 4-6
 tweeter.info<-sapply(data$user[4:6],userInfo)

But that was sorta sideways... I had a column for each user... sorta weird. Bob O'H helped me figure out how to transpose that and I did, but it was sorta weird.

So then I tried this way:

get.USER.info<-function(startno,stopno){
# set up the vectors first
n<-stopno-startno+1
USER.name<-character(n)
USER.created<-numeric(n)
USER.posts<-numeric(n)
USER.foll<-numeric(n)
for (i in startno:stopno) {
thing<-getUser(data$user[i], cainfo="cacert.pem")
USER.name[i]<-data$user[i]
USER.created[i]<-thing$created
USER.posts[i]<-thing$statusesCount
USER.foll[i]<-thing$followersCount
}


return(data.frame(username=USER.name,created=USER.created, posts=USER.posts,followers=USER.foll, stringsAsFactors=FALSE))
}

So that was cool, until it wasn't. I mean, turns out that 2% of the users have deleted their accounts or block me or are private or something. So it didn't recover from that error and I tried to test for is.null() and is.NA() but it failed....
So then I went back to the mailing list and there was a suggestion to user try() but eek.
So then I noticed that if you have a pile to look through you're actually supposed to use
lookupUsers(users, includeNA=FALSE, ...)
And I did, and I wanted to keep the NA so that I could align with my other data later... but once again, no way to get the NAs out. And it's an object that's a pile of lists... which I was having trouble wrapping my little mind around (others have no issues).
So I went back and used that command again, and this time said to skip the NA (the not found users). Then I think from the mailing list or maybe from Stack Overflow? I had gotten the idea to use unlist. So here's what I did then:
easy.tweeters.noNA<-lookupUsers(data$user, cainfo="cacert.pem")
#check how many fewer this was
length(easy.tweeters.noNA)
#1247 so there were 29 accounts missing hrm
testbigdf<-data.frame()
for (i in 1:1247){holddf<-twListToDF(easy.tweeters.noNA[i])
testbigdf<-rbind(testbigdf,holddf)
}

And that created a lovely dataframe with all kinds of goodies for it. I guess I'll have to see what I want to do about the 29 accounts.

I really would have been happier if it was more graceful with users that weren't found.

Also, not for every single command you have to user the cainfo="cacert.pem" thingy... Every time, every command.

ALSO, I had figured out oauth, but the twitter address went from http:// to https:// and so that was broken, but I fixed it. I hope I don't have to reboot my computer soon! (Yeah, I saved my credentials to a file, but I don't know... )

Tags: ,

Comments are off for this post