Archive for the 'dissertation' category

The Dissertation is Now Available for Your Enjoyment

Jun 30 2016 Published by under dissertation, my doctoral program

Or, you know, for bedtime reading.

Christina K. Pikas, PhD, 2016
This dissertation presents a literature-based framework for communication in science (with the elements partners, purposes, message, and channel), which it then applies in and amends through an empirical study of how geoscientists use two social computing technologies (SCTs), blogging and Twitter (both general use and tweeting from conferences). How are these technologies used and what value do scientists derive from them?
Method: The empirical part used a two-pronged qualitative study, drawing on (1) purposive samples of ~400 blog posts and ~1000 tweets and (2) a purposive sample of 8 geoscientist interviews. Blog posts, tweets, and interviews were coded using the framework, adding new codes as needed. The results were aggregated into 8 geoscientist case studies, and general patterns were derived through cross-case analysis.

Results: A detailed picture of how geoscientists use blogs and Twitter emerged, including a number of new functions not served by traditional channels. Some highlights: geoscientists use SCTs for communication among themselves as well as with the public. Blogs serve persuasion and personal knowledge management; Twitter often amplifies the signal of traditional communications such as journal articles. Blogs include tutorials for peers, reviews of basic science concepts, and book reviews. Twitter includes links to readings, requests for assistance, and discussions of politics and religion. Twitter at conferences provides live coverage of sessions.

Conclusions: Both blogs and Twitter are routine parts of scientists' communication toolbox: blogs for in-depth, well-prepared essays, Twitter for faster and broader interactions. Both have important roles in supporting community building, mentoring, and learning and teaching. The Framework of Communication in Science was a useful tool for studying these two SCTs in this domain. The results should encourage science administrators to facilitate SCT use by scientists in their organizations, and information providers to index SCT documents as an important source of information.

One response so far

Defense slides

Took me a bit - I forgot to upload them to SlideShare until just now. I did pass with revisions to be approved by my advisor.

I have to tell you that it was really anticlimactic. I thought it would be a big weight off my shoulders and I would feel free and I would have minor quibbles but lots of pats on the back... but... well... I don't know.  This massive framework o' mine? The communications prof thought it was exactly the same as Shannon and Weaver (1948). Wow.

At least when I do these edits I can get on with writing up other work I've done and then prepping pieces of this for publication. So, really, no less work, but different.

I do fully intend to make this freely available with creative commons attribution and all that. The whole dissertation. I am going to do the revisions first, though, because some are pretty big.

2 responses so far

"Theory" for the immigrant to social sciences

Oct 11 2015 Published by under dissertation

This really deserves a detailed and thorough treatment it won't get here.

The entire point of my dissertation is basically this: to adequately understand how any new technologies might be used or be useful (valued/valuable, etc.), it's important to integrate across the various diverse literatures that have looked at how scientists communicate.

Sound impossible much? Kinda sorta, but hey, maybe that's why it's taking me 10 years!

My undergrad is in physics. My master's is in library science (MLS, of course). Neither field does theory the way, say, sociologists, linguists, communications researchers, or anyone else in the social sciences does. At least at the undergraduate level, you don't have to pick an epistemology and a particular theory when measuring gravity or the wavelength of a laser.

When I look at theories, I basically look at what evidence was used to develop them and what explanatory power they have.  Also, I'm very pragmatic and I don't especially adhere dogmatically to any one epistemology.

Working this way gets me into trouble when trying to communicate with someone who is in one of these fields. You're really supposed to pick a viewpoint and use theory as a lens. I do get testing theories. I know how to do that, but there is supposed to be more. I don't know how to own and live and practice a theory.

I'm not terribly convinced others in LIS do, either, despite books on our "theories" and numerous ASIST sessions.

Maybe this is a horrible admission? Maybe I have to pick one should I ever go on the academic market (which I don't expect to, if any of my colleagues are reading this, but really, I might, if my committee is reading this)? Maybe I won't be able to get my articles into communications journals?

It may be a completely different situation, but Paige Jarreau shared similar feedback on Twitter:


Interestingly, when I was looking for the exact tweets to cite, I found her request for an article on theorizing social media.* The author basically complains about just the opposite: we're trying to use everyone's old theories and just make them work for social media, even if we have to ignore things like interactivity.

Of course, the STS folks would call me cray-cray because incommensurability and what-not. So maybe it's up to us to have our own theory, then publish a lot, and then we'll be all set 🙂

*Kent, M. L. (2015). Social media circa 2035: Directions in social media theory. Atlantic Journal of Communication, 23, 1-4. doi:10.1080/15456870.2015.972407

5 responses so far

CMC stuff I'm reading

The computer-mediated communication field of research is of course very important to my dissertation, but it's so vast that it's been difficult to know where to look beyond the assigned references in my doctoral seminar (see - this is why they need the big high-powered seminars).

Many of these are self-archived online so no barriers!  If I get a chance, I may come back and summarize these like I did for my comps readings. I still find those helpful.
Kiesler, S. B., Siegel, J., & McGuire, T. W. (1984). Social psychological aspects of computer-mediated communication. American Psychologist, 39(10), 1123-1134.
Describes some of the issues raised by electronic communication, including time and information-processing pressures, absence of regulating feedback, dramaturgical weakness, paucity of status and position cues, social anonymity, and computing norms and immature etiquette. An empirical approach for investigating the social psychological effects of electronic communication is illustrated, and how social psychological research might contribute to a deeper understanding of computers and technological change in society and computer-mediated communication (CMC) is discussed. A series of studies that explored how people participate in CMC and how computerization affects group efforts to reach consensus is described; results indicate differences in participation, decisions, and interaction among groups meeting face to face and in simultaneous computer-linked discourse and communication by electronic mail. Findings are attributed to difficulties of coordination from lack of informational feedback, absence of social influence cues for controlling discussion, and depersonalization from lack of nonverbal involvement and absence of norms.

Litt, E. (2012). Knock, Knock. Who's There? The Imagined Audience. Journal of Broadcasting & Electronic Media, 56(3), 330-345. doi:10.1080/08838151.2012.705195
For more than a century, scholars have alluded to the notion of an "imagined audience": a person's mental conceptualization of the people with whom he or she is communicating. The imagined audience has long guided our thoughts and actions during everyday writing and speaking. However, in today's world of social media, where users must navigate through highly public spaces with potentially large and invisible audiences, scholars have begun to ask: Who do people envision as their public or audience as they perform in these spaces? This article contributes to the literature by providing a theoretical framework that broadly defines the construct; identifies its significance in contemporary society and the existing tensions between the imagined and actual audiences; and, drawing on Giddens's concept of structuration, theorizes what influences variations in people's imagined audience compositions. It concludes with a research agenda highlighting essential areas of inquiry.

Sproull, L., & Kiesler, S. (1986). Reducing social context cues: Electronic mail in organizational communication. Management Science, 32(11), 1492-1512.
This paper examines electronic mail in organizational communication. Based on ideas about how social context cues within a communication setting affect information exchange, it argues that electronic mail does not simply speed up the exchange of information but leads to the exchange of new information as well. Consistent with experimental studies, the authors found that decreasing social context cues has substantial deregulating effects on communication. And they also found that much of the information conveyed through electronic mail was information that would not have been conveyed through another medium.

[these two older ones are really to show the transformation from people who were new to computers entirely, trying to use all the techniques they had in face-to-face communication, to heavy users who have developed affordances that enable them to have rich communication in media that don't have the traditional social cues]

**Treem, J. W., & Leonardi, P. M. (2012). Social media use in organizations: Exploring the affordances of visibility, editability, persistence, and association. Communication Yearbook, 36, 143-189.

[very useful as a reference for what kinds of things collaboration software should have, too]

Walther, J. B. (1992). Interpersonal Effects in Computer-Mediated Interaction: A Relational Perspective. Communication Research, 19(1), 52-90. doi:10.1177/009365092019001003
Several theories and much experimental research on relational tone in computer-mediated communication (CMC) points to the lack of nonverbal cues in this channel as a cause of impersonal and task-oriented messages. Field research in CMC often reports more positive relational behavior. This article examines the assumptions, methods, and findings of such research and suggests that negative relational effects are confined to narrow situational boundary conditions. Alternatively, it is suggested that communicators develop individuating impressions of others through accumulated CMC messages. Based upon these impressions, users may develop relationships and express multidimensional relational messages through verbal or textual cues. Predictions regarding these processes are suggested, and future research incorporating these points is urged.

Walther, J. B. (1996). Computer-mediated communication impersonal, interpersonal, and hyperpersonal interaction. Communication Research, 23(1), 3-43. doi:10.1177/009365096023001001
While computer-mediated communication use and research are proliferating rapidly, findings offer contrasting images regarding the interpersonal character of this technology. Research trends over the history of these media are reviewed with observations across trends suggested so as to provide integrative principles with which to apply media to different circumstances. First, the notion that the media reduce personal influences—their impersonal effects—is reviewed. Newer theories and research are noted explaining normative “interpersonal” uses of the media. From this vantage point, recognizing that impersonal communication is sometimes advantageous, strategies for the intentional depersonalization of media use are inferred, with implications for Group Decision Support Systems effects. Additionally, recognizing that media sometimes facilitate communication that surpasses normal interpersonal levels, a new perspective on “hyperpersonal” communication is introduced. Subprocesses are discussed pertaining to receivers, senders, channels, and feedback elements in computer-mediated communication that may enhance impressions and interpersonal relations.

**Walther, J. B. (2011). Theories of Computer-Mediated Communication and Interpersonal Relations. In M. L. Knapp, & J. A. Daly (Eds.), The Sage handbook of interpersonal communication (4th ed., pp. 443-479). Thousand Oaks, CA: Sage. Retrieved from

This is a good overview of the missing social cues and other CMC research over the past 30 years. He does, however, like his own theories and research best and others may not agree 🙂

One response so far

Straddling the fan-girl critical thinker divide, while trying not to be not-even-wrong

Apr 12 2015 Published by under dissertation

Working on the tougher bits of my dissertation now (defense is really scheduled, finally), and trying to come to terms with my relationship with the science blogosphere and twitterverse (or whatever). Some other articles - and one was particularly cringeworthy - on the topic have been in the not-even-wrong category. It's like someone trying to explain your culture to you and just getting it wrong (like my old boss who kept insisting I was Orthodox even though I told her a million times I'm Catholic, just Eastern Rite/Ukrainian).

Am I in a privileged position here on Scientopia? To have attended Science Online for several years? To have met and chatted with many science bloggers? Thought deeply about science blogging since about 2004?

Am I just a fan girl who gushes about the wonders of blogging to anyone who will listen? Despite being told that it's dead? (at least people have finally stopped telling me wikis will take over. siiiiiigh). Am I uncritical in my support?

If I am in a privileged position as a long-time (peripheral?) participant observer, how do I convey that? These other articles - I can often see how they got to the results and interpretations they did, but meh. Maybe I'm fooling myself, too, but in a different way? They are published; I am not.

So I'm looking at definitions of prolonged engagement and persistent observation and, well, damn, I had better go to bed as a) Easter is tomorrow and b) twin 3-year-olds get up when they want to.

Comments are off for this post

So... um... what if I'm still enjoying it?

Feb 05 2015 Published by under dissertation

Am I supposed to kind of hate my dissertation topic by now? If I don't, does that mean I'm not working on it hard enough (maybe)? I'm doing it wrong? Maybe it's a phase and it will pass.

Making progress. Learning new stuff from my data. Feeling horribly inadequate when watching tweets fly by from another doctoral student dissertating on how scientists use blogs.... (holy moly how many scientists did she actually interview? hundreds? cray-cray... or am I a hater?)

Working on it every chance I get - taking a morning off every week. Staying up late... I will have to add more time off. If only we could afford more childcare!

2 responses so far

Trying another CAQDAS, MaxQDA

Jul 12 2014 Published by under dissertation, information analysis

CAQDAS: Computer Assisted Qualitative Data Analysis Software.

Previously, I'd used NVivo and I found it to be miserable and horribly expensive (I didn't pay then, but I would have to this time). I really did most of my work offline with colored pencils, note cards, etc. I'm fully willing to admit that maybe I didn't know how to use it or was doing it wrong, but instead of saving me time, it was costing me time.

I started trying to use it with my dissertation data and ew. It's fine with my interview transcripts, but tweets and blog posts? Just yuck. So then I started coding just using Excel, and meh.

So back to the drawing board. I read a bunch of reviews of different products on Surrey's pages, but it's really hard to tell. I also started looking at prices and oh la la! I was thinking maybe Dedoose, despite their earlier issues, but that's at least $10/month. Not bad until you think you might do this for a while.

After all that, MaxQDA - a German product - seemed like a slightly better choice than some. The student license doesn't expire and is for the full product (they have a semester license, but that doesn't make sense for me) - that's probably the biggest selling point.

So far so good. Importing about a thousand blog posts as individual documents was no big deal. Interview transcripts were even quicker. Adding in my codes was super quick and it was super quick to move one when I stuck it in the wrong hierarchy. I think I'll work with this data for a bit before doing the twitter archives - particularly since I don't know how I might sample.

I'm still on the 30-day trial. Not sure if I need to start paying for it with a week or so to spare so the verification can be completed. My university doesn't put an expiration date on our IDs. Not sure if my advisor has to send an e-mail or what.

Yeah, there's something for R (of course), but it doesn't have the features. I was still thinking I might do some more machine learning and other tricks with my data using R, which is easy now that it's all in spreadsheets.

One response so far

Quick note: Now on GitHub

Jul 12 2014 Published by under dissertation, information analysis

Scripts mentioned previously are now on GitHub with an MIT license which should hopefully be permissive enough. I can't say that anyone would want to use these, but this also backs up my code even better which is useful.

I'm using RStudio so probably if/when I do more analysis in R, I'll just use Git from there.

The URL is, unsurprisingly:

Comments are off for this post

Getting blog post content in a usable format for content analysis

Jul 04 2014 Published by under dissertation

So in the past I've just gone to the blogs and saved down posts in whatever format to code (as in analyze). One participant with lots of equations asked me to use screenshots if my method didn't show them adequately.

This time I have just a few blogs of interest and I want to go all the way back, and I'll probably do some quantitative stuff as well as just coding at the post level. For example, just indicating if the post discusses their own work, other scholarly work, a method (like this post!), a book review, career advice, whatever. Maybe I'll also select some to go deeper, but it isn't content analysis like linguists or others do at the word level.

Anyway, there are lots of ways to get the text of web pages. I wanted to do it completely in R, and I ended up getting the content there, but I found Python to work much better for parsing the actual text out of the yucky tags and scripts galore.

I had *a lot* of help with this. I lived on StackOverflow, got some suggestions at work and on FriendFeed (thanks Micah!), and got a question answered on StackOverflow (thanks alecxe!). I tried some books in Safari, but meh?

I've had success with this on Blogger and WordPress blogs. Last time, when I was customizing a Perl script to pull the commenter URLs out, every blog was so different from the others that I had to do all sorts of customization. These methods require very little change from one blog to the next. Plus, I'm working on local copies when I'm doing the parsing, so hopefully having as little impact as possible (now that I know what I'm doing - I actually got myself blocked from my own blog earlier because I sent so many requests with no user agent).

So I used R to get the content of the archive pages - the largest reasonable archive pages possible, instead of pulling each post individually, which was my original thought. One blog seemed to be doing an infinite scroll, but when you actually looked at the address bar it was still using the blogurl/page/number format. I made a csv file with the archive page URLs in one column and the file name in another. I just filled down for these when they were of the format I just mentioned.

Read them into R. Then had the function:

library(RCurl)

getFullContent <- function(link, fileName) {
  UserAgent <- "pick something"
  temp <- getURL(link, timeout = 8, ssl.verifypeer = FALSE, useragent = UserAgent)
  nameout <- paste(fileName, ".htm", sep = "")
  write(temp, file = nameout)
}

I ended up doing it in chunks. If you're calling the function for a single page, it's like:

getFullContent("","archivep1")

More often I did a few:


So I moved the things around to put them in a folder.

Then this is the big help I got from StackOverflow. Here's how I ended up with a spreadsheet.

from bs4 import BeautifulSoup
import os, os.path

# from
# this is the file to write out to
posts_file = open("haposts.txt", "w")

def pullcontent(filename):
    soup = BeautifulSoup(open(filename))
    posts = []
    for post in soup.find_all('div', class_='post'):
        title = post.find('h3', class_='post-title').text.strip()
        author = post.find('span', class_='post-author').text.replace('Posted by', '').strip()
        content = post.find('div', class_='post-body').p.text.strip()
        date = post.find_previous_sibling('h2', class_='date-header').text.strip()

        posts.append({'title': title,
                      'author': author,
                      'content': content,
                      'date': date})
    # print(posts)
    posts_file.write(str(posts))

# this is from

for filename in os.listdir("files"):
    pullcontent(os.path.join("files", filename))

posts_file.close()

print("All done!")


So then I pasted it into Word, put in some line breaks and tabs, and pasted it into Excel. I think I could probably go from that file or the data directly into Excel, but this works.
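In case the Word detour ever gets old: the posts list is already a list of dicts, so Python's csv module could write it straight to a file Excel opens directly. A minimal sketch, assuming rows shaped like the ones pullcontent builds (the sample rows and the posts.csv filename here are made up):

```python
import csv

# Made-up sample rows shaped like the dicts the parsing script collects
posts = [
    {'title': 'A post', 'author': 'C. Pikas',
     'content': 'Some text.', 'date': 'July 4, 2014'},
    {'title': 'Another post', 'author': 'C. Pikas',
     'content': 'More text.', 'date': 'July 5, 2014'},
]

# DictWriter maps each dict onto one CSV row, with a header row up top
with open('posts.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'author', 'content', 'date'])
    writer.writeheader()
    writer.writerows(posts)
```

Swapping the write-out-as-a-string step for something like this would skip the line-breaks-and-tabs massaging entirely.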

There's really very minor tweaking between blogs. For most of them I don't actually need an author, but I added in the URL using something like this:

url = post.find('h2').a.get('href')

The plan is to import this into something like NVivo or ATLAS.ti for the analysis. Of course, it would be very easy to load it into R as a corpus and then do various text-mining operations.
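For a taste of what those text-mining operations can look like, here's a tiny term-frequency sketch (in Python rather than as an R corpus, and the sample strings are invented placeholders for the scraped content column):

```python
import re
from collections import Counter

# Invented placeholder texts standing in for the scraped post content
contents = [
    "Scientists blog about methods and methods again.",
    "Twitter amplifies the signal of blog posts.",
]

# Crude tokenization: lowercase everything, keep alphabetic runs
tokens = [w for text in contents for w in re.findall(r"[a-z']+", text.lower())]
counts = Counter(tokens)
```

A real pass would layer stopword removal and stemming on top, but a Counter over tokens is the core of most simple frequency analyses.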

Comments are off for this post

Using R TwitteR to Get User Information

I'm gonna keep stating the obvious, because this took me a few hours to figure out. Maybe not working continuously, but still.

So, I have more than 6000 tweets from one year of AGU alone, so I'm gonna have to sample somehow. Talking this over with my advisor, he suggested that we have to find some reasonable way to stratify and then sample randomly within the strata. I haven't worked all the details out yet - or really any of them - but I started gathering user features I could base the decision on. Number of tweets with the hashtag was super quick in Excel. But I was wondering if they were new to Twitter, if they tweeted a lot, and if they had a lot of followers. That's all available through the API using the TwitteR package by Jeff Gentry. Cool.
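The stratify-then-sample-within idea can be sketched quickly; everything here (the cutoffs, the usernames, the tweet counts) is hypothetical, not the scheme we actually settled on:

```python
import random

# Hypothetical user table: username -> number of hashtag tweets
user_counts = {'ada': 42, 'ben': 3, 'cara': 17, 'dev': 1, 'eve': 8, 'finn': 25}

def stratify(counts, cutoffs=(5, 20)):
    """Bin users into low/medium/high tweeters; the cutoffs are arbitrary."""
    strata = {'low': [], 'medium': [], 'high': []}
    for user, n in counts.items():
        if n < cutoffs[0]:
            strata['low'].append(user)
        elif n < cutoffs[1]:
            strata['medium'].append(user)
        else:
            strata['high'].append(user)
    return strata

# Then draw a random sample of up to 2 users within each stratum
random.seed(1)
strata = stratify(user_counts)
sample = {name: random.sample(users, min(2, len(users)))
          for name, users in strata.items()}
```

The same shape works with any set of user features; the hard part is deciding on defensible strata, not the code.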

So getUser() is the function to use. I made up a list of the unique usernames in Excel and imported that in. Then I went to loop through.

library("twitteR", lib.loc="C:/Users/Christina/Documents/R/win-library/3.0")
# get the data - the list of unique usernames exported from Excel
data <- read.csv("usernames.csv", stringsAsFactors = FALSE)

userInfo <- function(USER) {
  USERdata <- vector()
  temp <- getUser(USER, cainfo = "cacert.pem")
  USERdata <- c(user = temp$screenName,
                created = as.character(temp$created),
                posts = temp$statusesCount,
                followers = temp$followersCount)
  return(USERdata)
}

# test for users 4-6 <- sapply(data$user[4:6], userInfo)

But that was sorta sideways... I had a column for each user... sorta weird. Bob O'H helped me figure out how to transpose that and I did, but it was sorta weird.

So then I tried this way:<-function(startno,stopno){
# set up the vectors first
for (i in startno:stopno) {
thing<-getUser(data$user[i], cainfo="cacert.pem")[i]<-data$user[i]

return(data.frame(,created=USER.created, posts=USER.posts,followers=USER.foll, stringsAsFactors=FALSE))

So that was cool, until it wasn't. I mean, it turns out that 2% of the users have deleted their accounts, blocked me, are private, or something. The function didn't recover from that error, and I tried to test for is.null() and but it failed...
So then I went back to the mailing list, and there was a suggestion to use try(), but eek.
So then I noticed that if you have a pile of users to look up, you're actually supposed to use

lookupUsers(users, includeNA=FALSE, ...)

And I did, and I wanted to keep the NAs so that I could align the results with my other data later... but once again, no way to get the NAs out. And it returns an object that's a pile of lists... which I was having trouble wrapping my little mind around (others have no issues).

So I went back and used that command again, and this time told it to skip the NAs (the not-found users). Then, I think from the mailing list or maybe from Stack Overflow, I had gotten the idea to use unlist. So here's what I did then:
easy.tweeters.noNA <- lookupUsers(data$user, cainfo = "cacert.pem")
# check how many fewer this was
length(easy.tweeters.noNA)
# 1247, so there were 29 accounts missing, hrm
holddf <- data.frame()
for (i in 1:1247) {
  holddf <- rbind(holddf, twListToDF(easy.tweeters.noNA[i]))
}

And that created a lovely dataframe with all kinds of goodies in it. I guess I'll have to see what I want to do about the 29 missing accounts.

I really would have been happier if it had been more graceful with users that weren't found.

Also, note that for every single command you have to use the cainfo="cacert.pem" thingy... Every time, every command.

ALSO, I had figured out OAuth, but the Twitter address went from http:// to https://, so that was broken, but I fixed it. I hope I don't have to reboot my computer soon! (Yeah, I saved my credentials to a file, but I don't know...)

Comments are off for this post

Older posts »