## Misunderstanding ArXiv

The physics, astronomy, CS, math, and related fields e-print server, ArXiv, is often misunderstood and misrepresented. Specifically, it's often portrayed as a place where anyone can send any article in any state.

Anyone? Users must be endorsed by another user, and endorsers must be active submitters in the same area. This could be a fairly low bar, but it is there.

Any article? Articles can be rejected or reclassified, and are expected to be journal quality. There are moderators to make these calls.

Any state? Articles are supposed to be finished. Read this interesting discussion on AstroBetter. Even if the rules don't say it, the norms in a given subject area might.

## Another tilt at a holy grail: identifying emerging research areas by mapping the literature

Technology surprise, disruptive technologies, or being caught unaware when managing a research portfolio or research funding: these are some of the fears that keep research managers and research funders up at night. Individual scientists might see some interesting things at conferences and keep a mental note, but unless they can see the connection to their own work, they will likely not bring those ideas back. Even if they do, they might not be able to get funding for their new idea if the folks with the bucks don’t see where it’s going. Consequently, there are lots of different ways to do technology forecasting and the like. One of the main ways has been to mine the literature. Of course, as with anything using the literature, you’re looking at some time delay. After all, a journal article may appear three years or more after the research was started.

I’ve been party to a bunch of conversations about this and I’ve also dabbled in the topic so I was intrigued when I saw this article in my feed from the journal. Plus, it uses a tool I’ve had a lot of success with recently in my work, Sci2.

Citation: Guo, H., Weingart, S., & Börner, K. (2011). Mixed-indicators model for identifying emerging research areas. Scientometrics. DOI: 10.1007/s11192-011-0433-7

The data set they are using is all the articles from PNAS and Scientometrics over a period of thirty years from 1980 to 2010. They’re using the information from Thomson Reuters Web of Science, not the full text of the articles.

Their indicators of emerging areas are:

• Addition of lots of new authors
• Increased interdisciplinarity of citations
• Bursts of new phrases/words

This differs from other work, like that of Chen and of Zitt and others, which clusters on citation or co-citation networks and also looks for nodes with high betweenness centrality as turning points.

The addition of new authors per year is pretty straightforward, but the other two methods deserve some description. For disciplinarity, each cited article is given a score based on its journal’s location on the UCSD map of science. Then a Rao-Stirling diversity score is calculated for each article to measure the interdisciplinarity of its citations. For each pair of citations in the reference list, the score uses the probability of the first being in its discipline, the probability of the second being in its discipline, and the great circle distance between the two disciplines (the map is on a sphere, hence not Euclidean distance). The limitations are pretty numerous. The map covers only journals, only about 16,000 of them, and only journals that were around in 2001-2005 (think of how many journals have launched in the last few years). Articles with more than 50% of their citations going to things not on the map were dropped. The authors mention areas with a lot of citations to monographs, but I would think the bigger problem would be conferences. Newer research areas might have trouble finding a home in established journals, or might be too new for journals and appear only at conferences.
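The Rao-Stirling calculation itself is easy to sketch. A minimal Python illustration, where the discipline names, citation shares, and the distance value are made up for the example (the real score uses great circle distances between journal positions on the UCSD map):

```python
import itertools

def rao_stirling(proportions, distance):
    """Rao-Stirling diversity: sum of p_i * p_j * d_ij over ordered
    pairs of disciplines cited by the article, where p_i is the share
    of the reference list in discipline i and d_ij is the distance
    between disciplines i and j on the science map."""
    score = 0.0
    for i, j in itertools.permutations(proportions, 2):
        score += proportions[i] * proportions[j] * distance[frozenset((i, j))]
    return score

# Hypothetical article: half its references in physics, half in biology,
# with an assumed map distance of 0.8 between the two disciplines.
p = {"physics": 0.5, "biology": 0.5}
d = {frozenset(("physics", "biology")): 0.8}
print(rao_stirling(p, d))  # 2 * (0.5 * 0.5 * 0.8) = 0.4
```

An article citing only one discipline scores zero, so the measure rewards both spreading citations across disciplines and citing disciplines that sit far apart on the map.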

For word bursts, I know from using Sci2 that they’re using Kleinberg’s (2002; paywalled, but a free PDF is on CiteSeer) algorithm, but I don’t believe they state that in the article. Their description of their implementation of the algorithm is here. I’ve been curious about it but haven’t had the time to read the original article.
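For what it's worth, the core of Kleinberg's batch algorithm is a two-state model fit by dynamic programming: a baseline state emits the word at its overall background rate, a burst state emits it at an elevated rate, and entering the burst state costs a penalty. A rough sketch of that idea, with my own simplified parameters `s` and `gamma` (this is not the Sci2 implementation):

```python
import math

def burst_states(r, d, s=2.0, gamma=1.0):
    """Two-state burst detection in the spirit of Kleinberg (2002).
    r[t] = occurrences of the word in time window t; d[t] = total
    documents in window t. State 0 emits at the baseline rate, state 1
    at s times that rate; entering state 1 costs gamma * ln(n).
    Returns the minimum-cost state sequence via dynamic programming."""
    n = len(r)
    p0 = sum(r) / sum(d)              # background rate of the word
    p1 = min(0.999, s * p0)           # elevated burst rate

    def cost(t, p):                   # negative log-likelihood of window t
        return -(r[t] * math.log(p) + (d[t] - r[t]) * math.log(1.0 - p))

    trans = gamma * math.log(n)       # penalty for entering the burst state
    best = [cost(0, p0), cost(0, p1) + trans]
    back = []                         # back[t-1][q] = predecessor of state q
    for t in range(1, n):
        pred0 = 0 if best[0] <= best[1] else 1          # leaving a burst is free
        pred1 = 1 if best[1] <= best[0] + trans else 0  # entering one is not
        back.append((pred0, pred1))
        best = [min(best[0], best[1]) + cost(t, p0),
                min(best[1], best[0] + trans) + cost(t, p1)]
    states = [0 if best[0] <= best[1] else 1]
    for ptrs in reversed(back):       # trace the optimal path backwards
        states.append(ptrs[states[-1]])
    return states[::-1]

# A word that spikes in windows 2-3 out of 5 gets flagged as bursting there.
print(burst_states([1, 1, 8, 9, 1], [10, 10, 10, 10, 10]))  # [0, 0, 1, 1, 0]
```

The transition penalty is what keeps single noisy windows from being flagged: a burst has to pay for itself with several elevated windows in a row.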

In general, this article is pretty cool because it’s completely open science. You can go use their tool, use their dataset, and recreate their entire analysis – they encourage you to do so. However, I’m not completely convinced that their indicators work for detecting emerging research areas, given the plots of one of their methods against another. Their datasets might be to blame. They look at two individual journals, so any given new research area might not catch on in a particular journal if there’s a conservative editor or something. PNAS is a general journal, so the articles probably first appeared in more specialized journals (an area would have to be big time before hitting Science, Nature, or PNAS). Also, the interesting thing with the h-index and the impact factor (the two emerging areas examined for Scientometrics) is not their coverage in the information science disciplines, but the h-index’s emergence from the physics literature and the coverage of both in biology, physics, and other areas. If you look across journals, your dataset quickly becomes huge, but you might get a better picture. The impact factor was first introduced decade(s) before the start of their window, but socially it has become controversial because of the funding, promotion, and tenure decisions tied to it – a more recent phenomenon, I believe. Coding journals by discipline has often been done using JCR’s subject categories (lots of disagreement there), but others have mapped science journals by looking at citation networks. It would be computationally extremely expensive, but very cool, to assign a subdisciplinary category to each article – not taken from the journal in which it was published. Also, someone really needs to do more of this for areas with important conference papers, like engineering and CS.

## scio11: blogs, bloggers, and boundaries

Blogs, Bloggers and Boundaries? - Marie-Claire Shanahan, Alice Bell, Ed Yong, Viv Raper

AB: arsenic life – NASA spokesperson said they would not engage without peer review. But this is really a spam filter issue – too tight and you miss things you want to hear, too loose and you’re overwhelmed by the junk. The scientists need some sort of filter.

Or think of science as a map (a la Gieryn). The world benefits from specialization – we can’t all be specialists in everything so we rely on trust and on shortcuts to navigate different spaces.

Boundaries keep people out but also create shared spaces inside. Jargon and in jokes. Jargon makes in groups, in jokes are an expression of friendship but also make people feel left out.

How do we calibrate our spam filter? Be clever about boundaries – who’s on the other side? do we want to speak to them? are the boundaries intentional?

EY: the strength of an author’s conclusions depends on the community’s ability to explain the data differently. Example: foxes and magnetic north – ways to test and ways to interpret data.

Blogged about half male/female chickens. Got contacted by a farmer who has such an animal – connected him to the scientist. They formed a research collaboration across two continents.

The farmer said he contacted EY because he found detailed but understandable information through a web search.

He does a “who are you?” thread on his blog every year, and this helps him determine what to write and how to write it.

M-CS: boundaries > boundary layers in fluid and what degree of mixing and places where different fluids come together. blogs give us many different kinds of boundaries: information, people,

people: using blogs as an information source in a one way communication. her elementary education students didn’t think of blogs as conversational. newspaper comments – speaking to each other, no mixing,

VR: some more practical tips – instead of defining terms repeatedly in text, use a mouseover plain language glossary. picking terms that are sensitive to the audience you’re looking for. Had her mother read her blog – learned a lot.

EY: you can alienate someone because you’re not in their very specific area of science – your language will be different from theirs (he gave an example from the oil spill with a geophysicist and an engineer – maybe on an AGU blog?). You may be reaching a much smaller audience than you think.

from the audience – blogs are intimidating to some people. blogging is a subculture in and of itself

even if you write for a general audience, commenters might be very sophisticated and use jargon and poke holes in things and intimidate other commenters.

how to keep and engage people who stumble across your blog when searching for answers to a specific question?

different crowd following DSN on the facebook page than at the blog itself. younger.

EY had the same experience – comments on Facebook when there are none on the blog, but commenters are often reacting to the title or at most the first paragraph, so you have to be more careful with titles and ledes.

and ran out of battery

## More on the impact of old folks in academe

My last post (I am so far from productive it's not funny... maybe I should be put out to pasture, lol) was a research blogging review of an article evaluating the claims that the graying of academia will impact productivity. DrugMonkey just brought an NSF report on a related subject to my attention.

Hoffer, T.B., Sederstrom, S., & Harper, D. (2010) The End of Mandatory Retirement for Doctoral Scientists and Engineers in Post Secondary Institutions:  Retirement Patterns 10 Years Later. InfoBrief (NSF11-302). http://www.nsf.gov/statistics/infbrief/nsf11302/nsf11302.pdf

• The age of retirement is edging up, but hasn't jumped dramatically as maybe some would have expected.
• There's also interest in the interaction with type of institution (by Carnegie classification). Research universities had lower retirement rates for each age group, but the difference wasn't statistically significant.
• Disciplinary area also didn't seem to make a difference with the exception of bio, ag, health sciences which had a drop in retirement rates from 39.8% to 33.7% from 1993 to 2003.
• Men do retire earlier than women and the spread is widening.

## Are the old folks holding us back?

We've been hearing a lot about how hard it is to get a tenure track job - arguably harder even than it was during other economic recessions. We've also been hearing about how the age of NIH PIs is going up. I guess the age at first award is going up as well as the average. At the same time, there are a lot of stories about how the most disruptive ideas have come from young scientists. Also that young scientists are more productive. So if academia is aging and older scientists are not leaving, does this mean there will be an accompanying loss in productivity? That’s what this article looked into. It’s a review of the research on many different aspects of this question: “whether there is empirical support for the belief that age is negatively related to scientific achievement”

I never realized that there had been mandatory retirement at universities in the US. This, of course, is discrimination (it violated the amended Age Discrimination in Employment Act), so it was abolished, but only in 1994. What with baby boomers and all not retiring, you can see how the average age would go up and how there would be less room to hire young faculty. Many colleges apparently try (tried?) to incentivize early retirement, but maybe only with limited success.

The first thing the author looked at is if there is an association between age and scientific achievement. He proposes four factors:

• changes in the cognitive abilities due to age
• changes in motivation
• availability of resources
• legal curtailing (compulsory retirement where and when exists/existed)

As for cognitive ability, it’s interesting the author goes back to research published in the 1950s which seems to set the stage. There were cohort issues and a lot depends on the way these abilities are tested. Large longitudinal studies seem to find that abilities do decline but fairly slowly until about age 80. There are also suggestions that, after being in a specific line of work for a while, you get in a rut and you don’t think of new or creative ways to solve problems. But this is something you could fix by changing approaches or adopting a new research area.

As for motivation, there are some economic models based on anticipated life income that would indicate a drop in motivation. On the other hand, if scientists are more motivated by prestige and the rewards and recognition that come from scientific achievement, they should be more motivated to continue publishing as they go along. Likewise, if they have intrinsic motivation and enjoy their work, there’s no reason that should drop off.

As for availability of resources, the work of the Coles and of Price found that the top researchers account for the majority of the publications – it’s a standard long-tailed distribution. Price’s law is: if there are $k$ researchers in a field, $\sqrt{k}$ of them will be responsible for 50% of the contributions. Merton’s Matthew Effect is that the rich get richer: if you’re successful, then you’ll get more rewards and resources, which facilitate more and better science. Success breeds success, but failure leads to more failure. Aging scientists who have not been successful will most likely drop in productivity.
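To put some numbers on Price's law, here is a quick back-of-the-envelope sketch (the field sizes are arbitrary examples, not data from the article):

```python
import math

# Price's law: of k researchers in a field, roughly sqrt(k)
# account for half of the field's contributions.
for k in (100, 2500, 40000):
    elite = round(math.sqrt(k))
    share = 100 * elite / k
    print(f"{k:>6} researchers: ~{elite:>4} ({share:.1f}%) produce half the papers")
```

So as a field grows, the productive elite becomes a vanishingly small fraction of it, which is part of why success-breeds-success dynamics matter so much.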

As for compulsory retirement – most of these studies date back to when that was the case. Also, in some countries, older scientists can’t start big new projects a few years before retirement so they actually start winding down early.

The author then goes on to review the evidence. Indicators of achievement include prizes and awards, citation count (no mention of the h-index, but that would make sense), number of publications, number of grants, etc. There are always problems with the research designs in these studies. They don’t account for the age distribution of the population of scientists when looking at the age of award winners. There are cohort effects – different groups of scientists use different methods and have different publication patterns. There are also period effects – there might be some historical situation that increased or decreased publication during a period in the author’s career.

A bunch of the older studies find a sort of inverted-U effect: awards, publications, and citations all seem to peak around the researcher’s 40s and then drop off. These studies do show a decline in productivity with age, but age seems to account for only a small part of the variation. Past performance is the best predictor, and the quality of the work doesn’t seem to decrease. For math, though, the story is quite different – productivity is flat with age.

In newer studies (there are only four here), the line is basically flat with a slight rise at 11-15 years into the career and another at 26-30 years into the career.

The author didn’t find evidence to support the idea that the graying of academia will lead to a loss in productivity. Additional longitudinal studies are needed, but they will be difficult to perform for the reasons already listed as well as the changes in publication norms. Most universities are pushing very hard for their faculty to publish so the number of publications will continue to rise.

So it seems crazy to force productive scientists to retire, but it’s complicated to get unproductive but highly paid scientists to retire while keeping the productive scientists. If no one retires, the older scientists have pretty high pay and there will be little room for the pipeline.

It’s an interesting article and very readable. Recommended.

Reference: Stroebe, W. (2010). The graying of academia: Will it reduce scientific productivity? American Psychologist, 65(7), 660-673. DOI: 10.1037/a0021086

## NASA can’t have it both ways

Not to anthropomorphize a government agency or anything, but NASA is really confused in their social media actions.

I’m the millionth person to point this out but it seems worthwhile for me to do so if for no other reason than to be able to find the information later by searching my blog.

NASA has had the policy and practice (and mandate?) to share their science with “the public.”  The public being US taxpayers, but also related scientists worldwide, children, and lots of other groups. They do this through websites and tv shows and more recently podcasts, blogs, and twitter. They publish scientific findings in scholarly journals, present them at meetings, and share scientific data freely through many different archives.  Organizations that receive funding from NASA are required to do the same.* NASA typically does a pretty good job of this – partly because their stuff is so very fascinating that it would be hard not to have a cool and interesting message about it but mostly because they have lots of professional communicators, outreach professionals, and experienced scientists who work hard at it.

With that said, what on earth (or in space, ha!) are they thinking in this reaction to Dr. Redfield’s evaluation of their recent microbiology/arsenic research? David Dobbs has a good blog post describing this. Dobbs quotes

From “NASA’s arsenic microbe science slammed,” at CBC News:

When NASA spokesman Dwayne Brown was asked about public criticisms of the paper in the blogosphere, he noted that the article was peer-reviewed and published in one of the most prestigious scientific journals. He added that Wolfe-Simon will not be responding to individual criticisms, as the agency doesn’t feel it is appropriate to debate the science using the media and bloggers. Instead, it believes that should be done in scientific publications.

My immediate concern isn’t whether the science is good or whether the criticisms are valid, but certainly if NASA intends to engage with the “public” and not just broadcast to us, they need to respond to these criticisms. Further, these responses should come in an appropriate form – a blog post, or a comment on Dr. Redfield’s blog. Dr. Redfield’s blog is well known and well respected, and she registered the post with ResearchBlogging. Her comments section is also very informative. I agree that NASA shouldn’t necessarily be expected to engage on all fronts with people linking to their work, but as Dobbs says, this blog is different.

Moreover, this paper is being reviewed on many blogs by scientists who are expert in this field and adjacent fields, and has been reviewed on F1000 (some links from Code for Life blog).  If you have a press release on a paper, then you should be prepared to continue the engagement after you have broadcast your message.

The paper’s author has also stated that replies should be in a “scientific venue.”**  My dear scientist, the web is a scientific venue! Haven’t you heard? This is the #altmetrics or post-publication peer review we’ve been talking about for quite a while.

Interestingly, some of the comments on the original post by Redfield basically indicate that responding on blogs is only for those who don’t have standing or who are not qualified. Grrr. That person needs to be educated! (I do hope that the technical comment or whatever that is eventually sent to Science attributes some credit to the commenters of that thread – a lot of good stuff there).

Update: Randy left a nice comment (thank you) which caused me to go and look at updates on the Guardian site. This caught my eye:

“Any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated,” wrote Felisa Wolfe-Simon of the NASA Astrobiology Institute. “The items you are presenting do not represent the proper way to engage in a scientific discourse and we will not respond in this manner.”

So the proper way to engage in scientific discourse is to hold press conferences (2 now)? Gosh, maybe I should toss my entire dissertation because I've been witnessing scientific discourse at conferences, in conference hallways, on twitter, in blogs, on wikis, on post-publication peer review sites... Hrumph.

* One more time for the record. My place of work of course (google me) gets money from NASA. This post is my opinion only and does not reflect that of my place of work or any of the employees there. This post is purely from the point of view of an observer of scholarly communication.

** Do note that I am American. I put my . in my “” Canadians, Australians, and Brits for some crazy reason put it outside.

## The role of trust in science

Egon Willighagen just posted that Trust Has No Place in Science.  His point is that Antony Williams asked if/how much people trust various chemical databases and Egon answered that he doesn’t trust any of them, he verifies. Ok.

So back to the old standard, Mertonian norms. It is a norm of science to practice organized skepticism. (Merton argued that this wasn’t skepticism about everything, but specific to scientific ideas and statements – this specifically isn’t about religion or patriotism). Scientists don’t believe things just on someone’s word, they need evidence.

Right. But what form does that evidence take? Egon says he verifies everything. So I guess there’s no need for the database then? I mean, if you’re going to re-run all of the experiments that provide the data. Or even read through the methods sections of the journal articles carefully. But wait. How can you believe the methods section of a journal article? Articles have been retracted for errors in their methods sections. Even if you read through the methods section, how do you interpret the results? How do you know it was the right method to use? Well, then you must do all of the experiments that led up to developing that method. And then you had better redesign the instruments. Oh, and make your own reagents or whatever.

At some point you have to evaluate things, but then move on. Philosophers will give you careful descriptions of it, but there’s the idea that you need some assumptions for every empirical test (Duhem-Quine).

Then there are whole areas you’re not expert in, for which you need co-authors to support you. Are you going to check all of their work and second-guess them?

There is some trust – skepticism – but trust.

(nah, I’m not out of my blogging funk and this isn’t up to my standards, but I need to at least try to blog to get back on the horse   )

update: spelled Antony's name wrong! sorry.

## ASIST2010: Structure and evolution of scientific collaboration networks in a modern research collaboratory

Alberto Pepe, UCLA (dissertation award winner)

hasn’t looked at his dissertation for 4 months so he might be rusty, ask his advisor

look at collaboration in a collaboratory where work is interdisciplinary and distributed. Physical and virtual spaces… At CENS, sensor network research, about 300 researchers, multidisciplinary (EE, CS, stats, bio, environmental sci, urban planning, sociology, media). Multi-sited within southern California. He was a participant observer.

Important to study relations instead of things bcs of connectedness of sci collabs. Looked at co-authorship, communication (mailing lists), acquaintanceship

What is the topology of the network, what is the structure of CENS and how has it evolved, how are the networks related to each other – what is the role of communication and acquaintanceship on co-authorship?

Authorship data – CENS annual reports are the official listing of all of the publications from the collaboration (lucky this exists – it doesn’t for many orgs). 600 publications over 7 annual reports, 400 of them conference papers. Yr 2 was most productive, a dip in yr 5, then a rise. Most papers have 2-3 authors. Hard to define what the boundaries of the system are, since there is co-authorship across institutions and countries – used co-authorship to draw the boundaries.

87 mailing lists, 30k e-mails, 1500 threads.

Social survey: who do you know?

(opt out – each has a name and picture from a public database…. 300 people from co-authors).

How do you know them? When did you meet, how often do you communicate?

10-30 acquaintances – 191 of 373 responded to the survey.

he didn’t look at betweenness (or, apparently, measures like Bonacich or eigenfactor)

Community structure:

Used the Newman-Girvan method. (It would have been nice if he had drawn a line around the communities or something – hard to see – as he did on a later slide.)
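For context, the Newman-Girvan method finds communities by repeatedly deleting the edge with the highest betweenness (the edge that sits on the most shortest paths) until the network falls apart into clusters. A rough standard-library sketch on a toy graph (my illustration, not Pepe's analysis; the CENS networks are far larger):

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected
    graph given as {node: set(neighbors)}."""
    eb = defaultdict(float)
    for s in adj:
        dist, order = {s: 0}, []
        sigma, preds = defaultdict(float), defaultdict(list)
        sigma[s] = 1.0
        q = deque([s])
        while q:                      # BFS counting shortest paths
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):     # accumulate dependencies onto edges
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    return eb

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a
    component; return the resulting communities as a list of sets."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    def components():
        seen, comps = set(), []
        for v in adj:
            if v not in seen:
                comp, stack = set(), [v]
                while stack:
                    u = stack.pop()
                    if u not in comp:
                        comp.add(u)
                        stack.extend(adj[u])
                seen |= comp
                comps.append(comp)
        return comps
    start = len(components())
    while len(components()) == start:
        eb = edge_betweenness(adj)
        u, w = max(eb, key=eb.get)    # delete the most "between" edge
        adj[u].discard(w)
        adj[w].discard(u)
    return components()

# Two triangles joined by a bridge: the bridge has the highest
# betweenness, so removing it yields the two obvious communities.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(sorted(c) for c in girvan_newman_split(g)))  # [[0, 1, 2], [3, 4, 5]]
```

Real implementations keep removing edges and pick the partition that maximizes modularity; this sketch stops at the first split to show the mechanism.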

some communities around country of origin, academic affiliation, position (staff,faculty,phd).

co-auth and acq overlap and are one institution and one discipline (i think he said) but are becoming more interdisciplinary and less inter-institutional

CENS collab communities are open, fluid, inclusive, and small-worldish (but not based on prestige).

hubs bring different communities together – not as interdisciplinary as we have heard

These questions occurred to me about the survey: ethical issues with opt-out? ethical issues with pictures? did they shift the order? was the order alphabetical? fatigue? sorting in groups?

a: they were ordered by institution and discipline, tried other things but this was what worked

In CS they co-author without knowing each other!?!? That’s cool. In bio they know each other if they co-author.

wrt ethical questions – director of CENS was on the dissertation committee so was all approved.

## ASIST2010: Virtual Organizations as Sociotechnical Systems (VOSS)

This is an NSF Program. These are people who are funded by this program but have very different projects and approaches.

Florida State – Gary Burnett – spoke to how happy they were to see this call, because it requires a multidisciplinary approach.

The program is not for infrastructure development but social sciences on how science works as a distributed sociotechnical system. Virtual organizations may be distributed geographically but are (connected?) by cyberinfrastructure…

Kevin Crowston – Syracuse

(with Andrea Wiggins) They’re looking at citizen science. Showing Science Cheerleader’s scienceforcitizens.net site. Examples of these are eBird, Galaxy Zoo, etc. Two-year project with the first year concentrated on classification. What types of citizen science are there? What are the work processes like, how is the work divided… The main theoretical frame is the model from open source software, which draws on the literature of small group interactions – mostly online, though, so some modifications. Also looking at how computers support the work. They have a workshop coming up with organizers to talk about what’s needed and what works. For doing an ethnography, there’s no “there” there, particularly in the case of the participants. They are doing interviews with organizers and will do some with participants, some participant observation (a team member is a member of these citizen science projects), and will do a survey later.

Besiki Stvilia – Florida State Life-cycle formation and long-term scientific collaboration

teams transition from discrete, experiment-based projects to long-term collaborations. [They had a poster last night that looked at whether diversity on teams makes them more productive – found some support for disciplinary diversity increasing productivity.] They have analyzed documents and did some bibliometrics on co-authorship as well as team membership.

The book by Burnett and Maryland’s Paul Jaeger on information worlds – used that as a theoretical lens.

MaryBeth …. (not the person on the program)

Around a high resolution CT machine that’s in the physical anthropology dept. People from there, a cog sci, a sociologist, a MIS (small groups, innovation)… Has been some ethnography, some semi-structured interviews, some video tapes. Looking at center for quantitative imaging – scanning, visualization, other services. Tightly-coupled research projects that come and go – these are also distributed. Using analytic induction – taking high level codes from the collaboratory literature and then emergent codes. Not just cmc type info, but who owns the data from skulls housed at the museum. What is the lifetime of these rights. They are also prototyping systems that will help management and coordination.

Rolf Wigand (not present)

using multiplayer games as a platform, looking at how trust and leadership develop

GB – information worlds – information values: people might come from different information worlds, and in dealing with the same objects they may value them differently.

KF – power totem – ethnographic lab studies.. difference in studying up… gate keepers are the informants – might require review…

they really aren’t having these issues because in at least two of the projects the head of the science lab is involved and supportive. For citizen science – not really. The public is collecting data – who gets to see it? How can you get the public to interact with the data? If the goal of the project is to make a change – it’s science-based but can be community-driven.

## Hey maybe scientists should do more than just wait for their journal to issue a press release on their new fabu article

The authors' thesis is that the only mandatory communication of results is in peer-reviewed journal articles. Scientists aren't required to do other communicating and often leave communication to the public to the media. They ask if this is adequate, given the very low percentage of scientific articles that ever make it into the press, particularly in areas outside of health and medicine, and given the fact that for everyone out of formal education, the media is their primary source of science education.

Recent studies do show that scientists often don't mind talking to reporters and do so more frequently than one might think [1-2]. They do get kind of frustrated when their work is misrepresented - even if that misrepresentation is failing to include qualifying statements. Newspapers in general covered a lot more science over time (as studied in the period 1951-1971, I know). Fancy journals that issue press releases for papers find that those papers are more likely to be reported in the news media. The authors cite another study finding that some 84% of newspaper stories originated from press releases.

This study was just about how much makes it to the media and whether that percentage is staying steady as the number of papers increases. When they actually did the work, they only looked at parts of two years, 1990 and 2001, and two media outlets, Time and NBC News. They didn't use the WaPo or NYT because better-educated people read them (???). Plus, they found that only 25-50% of news pieces actually mention the article's author and venue, so they probably missed a ton.

So this is quite disappointing, really. The study narrowed the coverage of the search so much that I don't think it's really representative of anything. Of course only a few articles get discussed in the media, but if you want numbers, this paper won't help. Studies like this also need to start looking at things like Nova, National Geographic, and the Discovery Channel. We watch that stuff all the time and so do a lot of people we know (of course I'm pretty well educated, I guess).

They mention journal press releases, but for big science there are also lab press releases and media officers. There are also scientists talking directly to the public on blogs.

One thing you can probably take away: if you work outside of biomed and/or are not publishing in Science or Nature and you have a really cool result, don't wait for the press to come a-knockin' - get it out there another way.

Here's the citation:

Suleski, J., & Ibaraki, M. (2010). Scientists are talking, but mostly to each other: a quantitative analysis of research represented in mass media. Public Understanding of Science, 19(1), 115-125. DOI: 10.1177/0963662508096776

[1] Peters, H. P., Brossard, D., de Cheveigne, S., Dunwoody, S., Kallfass, M., Miller, S., & Tsuchida, S. (2008). Science-Media Interface: It's Time to Reconsider. Science Communication, 30(2), 266-276. doi:10.1177/1075547008324809

[2] Dunwoody, S., Brossard, D., & Dudo, A. (2009). Socialization or rewards? Predicting U.S. scientist-media interactions. Journalism and Mass Communication Quarterly, 86(2), 299-314. Retrieved from http://aejmc.org/topics/wp-content/uploads/2009/09/3-Dunwoody-et-al.pdf



I'm a science and engineering librarian in a university-affiliated research center and a doctoral candidate at University of Maryland. Nothing here represents either place.


Research Blogging Awards 2010 Finalist