Archive for the 'Conferences' category

Notes from 1.5 days of: Collections as Data: Hack-to-Learn

Aug 10 2017 Published by under Conferences, information analysis

You guys - this post has been in draft since May 22, 2017! I'm just posting it...

Collections as Data: Hack-to-Learn was a fabulous workshop put on by the Library of Congress, George Washington University Libraries, and George Mason University Libraries. It was a neat gathering of interesting and talented people, nifty data, and very cool tools. It didn't hurt either that it was in a beautiful conference room with a view of the Capitol the first day and at the renovated Winston Churchill Center at GWU the second. A lot of it was geared toward metadata librarians and digital humanities librarians, but I felt welcomed. Readers of this blog will know that I really want to bring these tools to more public services, liaison, and similar librarians, so it was good.

Unfortunately, I had to leave mid-day on day 2 because of a family emergency 🙁 (everybody is ok) but here are some interesting tidbits to save and share.

Data Sets:

LoC MARC Records

Have you heard that LoC freed a ton of their cataloging data? FREE. Should have always been freely available. Actually this is only up to December 2013 and the remainder are still under paid subscription ... but ... still! People are already doing cool things with it (neat example). We had a part of this that the organizers had kindly already done some editing on.

Phyllis Diller Gag File

This was a sort of poorly formatted CSV of several drawers of the file. Hard not to just sit and chuckle instead of analyzing.

Eleanor Roosevelt's My Day Columns

Apparently Roosevelt wrote these from the 1930s to her death in 1962. Originally she wrote them 5 days a week but tapered to 3 when her health failed. They are a few paragraphs and more or less dryly list her activities.

End of Term Tumblr Archive (no link)

This was archived as part of the efforts to capture the outgoing administration's stuff before it disappeared. It was a very interesting collection of things from museums to astronauts.


Somewhere in here we covered TEI - I had no idea this existed. How cool. So like when you're doing transcripts of interviews you can, for example, keep the erm, uh, coughs... or ignore depending on the level of analysis?  TEI lets you annotate texts with all sorts of detail and make it linked data for entities, etc.


  • OpenRefine - more detailed use and examples of reconciliation
  • Voyant - very, very cool tool to at least do preliminary analysis of text. NB: installing on my work Windows machine was a bit rough. I ended up getting a Linux VM and it works well/easily. The visualizations are great. There is a limit on the number of texts you can import at a time.
  • MALLET - did you think this one was too hard and required Java or some such? Turns out there's a command line one anyone can use. We did topic models for some of the sets. I think I will probably stay with the way I've been doing them in R because it seems like they're easier to understand.
  • Gephi - yeah, again, and I still can't get along with it. I have to face that it's just me.
  • Carto - a cool mapping tool

Also, on day 2 someone suggested spaCy instead of NLTK for natural language processing in Python. This is another thing I couldn't get working for anything on my Windows box from work. I don't know if there is something being blocked or what. Installs and works beautifully on the Linux machine, though.




ACRL2017: When Tradition and Reality Collide: Metrics, Impact and Beyond

Mar 25 2017 Published by under bibliometrics, Conferences

Friday morning Abigail Goben, Meg Smith, and I presented at the Association of College and Research Libraries conference in Baltimore. I am not an academic librarian but I do serve researchers. I would say that SLA is probably more appropriate for librarians serving researchers in government, industry, and other settings. This was local, though!

The polls yielded some interesting feedback.

  • Our audience members were overall fairly experienced in metrics, with some experts. They knew most of the terms we threw out
  • Many of their libraries have informal support for metrics with a few libraries having formal support
  • Librarians sometimes have an uneasy role with metrics:
    • Frustrated with inappropriate use or use by uninformed people
    • Difficulty working with researchers and with administration: who should pay for the tools? who should do the work?
    • Librarian as neutral vs. metrics for competition
  • Many organizations do have RIM thingies, but they are mostly at the office of research or provost's office. There is a need for more help in how librarians can work with these offices.


Using bibliometrics to make sense of research proposals

Nov 01 2016 Published by under bibliometrics, Conferences

This was presented at the Bibliometrics & Research Assessment Symposium held at NIH on October 31, 2016.


Notes from International Symposium on Science of Science 2016 (#ICSS2016) - Day 2

This day's notes were taken on my laptop - I remembered to bring a power strip! But, I was also pretty tired, so it's a toss up.


Luis Amaral, Northwestern

What do we know now?

Stringer et al JASIST 2010 distribution of number of citations

25% of papers overall in WoS (1955-2006) haven’t been cited at all, yet for particular journals (ex. Circulation) there may be no papers that haven’t been cited.

Stringer et al PLoS ONE 2008 – set of papers from a single journal

Discrete log normal distribution – articles published in a journal in a year

Works well for all but large, multidisciplinary journals – Science, Nature, PNAS, but also PRL and JACS

For most journals takes 5-15 years to reach asymptotic state

Moreira et al PLOS ONE 2015 – set of papers from a department. Also discrete log normal.
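To make "discrete log normal" a bit more concrete, here is a minimal sketch. It assumes one simple discretization (a lognormal draw rounded down to a whole number of citations), which is not necessarily the exact form used in the papers above, and the parameter values are made up for illustration.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal variable whose logarithm has mean mu and std sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def discrete_lognormal_pmf(n, mu, sigma):
    """P(citations == n), assuming a lognormal draw rounded down to an integer."""
    return lognormal_cdf(n + 1, mu, sigma) - lognormal_cdf(n, mu, sigma)


# Hypothetical parameters for one journal-year; not taken from the talk.
mu, sigma = 2.0, 1.0
pmf = [discrete_lognormal_pmf(n, mu, sigma) for n in range(2000)]

print(f"share of papers with zero citations: {pmf[0]:.3f}")
print(f"share with 10 or more citations: {sum(pmf[10:]):.3f}")
```

Under this assumed form, the share of uncited papers falls straight out of mu and sigma, which is one way to read the contrast between WoS overall and a journal like Circulation.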

Also did work on significant movies - citations using IMDB connections section (crowd sourced annotation of remakes, reuse of techniques like framing, references/tributes, etc.)

Brian Uzzi, Northwestern

Age of Information and the Fitness of Scientific Ideas and Inventions

How do we forage for information – given a paper is published every 20 minutes – such that we find information related to tomorrow’s discoveries?

He’s going to show WoS, patents, law and how pattern works.

Foraging with respect to time (Evans 2008, Jones & Weinberg 201?)

Empirical strategies of information foraging: some papers' references are tightly packed by year, some have high mean age, some high age variance…

Average age of information (mean of PY - PY of cited articles)

Low mean age, high age variance is most likely to be tomorrow’s hits (top 5% cited in a field)
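A minimal sketch of the two numbers involved, assuming "age" is the citing paper's publication year minus each cited article's publication year, and that the variance is taken across one paper's reference list. The function name and the example years are hypothetical.

```python
from statistics import mean, pvariance


def reference_age_profile(pub_year, cited_years):
    """Return (mean age, age variance) of one paper's reference list."""
    ages = [pub_year - year for year in cited_years]
    return mean(ages), pvariance(ages)


# References packed tightly into the last few years.
tight = reference_age_profile(2016, [2014, 2015, 2015, 2013])

# Mostly recent references plus one much older one: mean stays fairly low,
# variance is high.
mixed = reference_age_profile(2016, [2015, 2015, 2015, 2014, 2014, 2014, 2013, 1985])

print("tight:", tight)
print("mixed:", mixed)
```

The second list is the kind of profile described as the hotspot: mostly recent work, tied to something much older.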

Tried this same method in patent office- inventors don’t pick all the citations. Examiner assigns citations. Patents have the same hotspot.


Audience q: immediacy index, other previous work similar…

A: they mostly indicate you want the bleeding edge. Turns out not really: you need to tie it to the past.

Cesar Hidalgo, MIT

Science in its Social Context

Randall Collins “the production of socially decontextualized knowledge” “knowledge whose veracity doesn’t depend on who produced it”

But science is produced in a social context

He is not necessarily interested in science for science's sake but rather, how people can do things together better than they can do individually.

What teams make work that is more cited

Several articles show that larger teams produce work that is more cited, but these papers were disputed. Primary criticism: other explanatory factors like larger things are more cited, more connected teams, self-promotion/self-citation with more authors, also cumulative advantage – after get one paper in high impact journal easier to get more in there

Various characteristics – number authors, field, JIF, diversity (fields, institution, geographic, age),

Author disambiguation (used Google Scholar – via scraping)

Connectivity – number of previous co-authorship relationships

Collaboration negative fields vs. collaboration positive fields

On average, the more connected the team, the more cited the paper. Interaction between JIF and connectivity. Weak but consistent evidence that larger and more connected teams get cited more. Effects of team composition negligible compared to area of publication and JIF.
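The connectivity measure above (number of previous co-authorship relationships) could be counted roughly like this. This is only a sketch under my own assumptions: a relationship is a pair of team members who share at least one earlier paper, and the author names are placeholders.

```python
from itertools import combinations


def team_connectivity(team, earlier_papers):
    """Count pairs of team members who co-authored at least one earlier paper."""
    prior_pairs = set()
    for authors in earlier_papers:
        # Sort so that each pair is stored in one canonical order.
        for pair in combinations(sorted(set(authors)), 2):
            prior_pairs.add(pair)
    team_pairs = combinations(sorted(set(team)), 2)
    return sum(1 for pair in team_pairs if pair in prior_pairs)


# Hypothetical publication history (one author list per earlier paper).
earlier = [
    ["ana", "bo"],
    ["bo", "cy", "dee"],
    ["ana", "eli"],
]

print(team_connectivity(["ana", "bo", "cy"], earlier))
```

A real version would need the author disambiguation step mentioned above before any of the pair counting means much.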


How do people change the work they do?

Using Scholar 99.8%, 97.6% authors publish in four or more fields… typically closely related fields

Policy makers need to assign money to research fields – what fields are you likely to succeed in?

Typically use citations but can’t always author in fields you can cite (think statistics)

Use career path? Fields that cite each other are not fields authors traverse in their career path.

Q: is data set from Google Scholar sharable?

A: He’s going to ask them, and when his paper is out, then he will.

Guevara et al (under review)

Data panel

Alex Wade, Microsoft Research – motivation: knowledge graph of scholarly content. Knowledge neighborhood within larger knowledge graph usable for Bing (context, and conversations, scaling up the knowledge acquisition process), Cortana, etc. Can we use approaches from this field (in the tail) for the web scale? Microsoft Academic Graph (MAG). MS academic search is mothballed. Now on Bing platform building this graph – institutions, publications, citations, events, venues, fields of study. >100M publications. Now at  - can see graph, institution box. Pushed back into Bing – link to knowledge box, links to venues, MOOCs, etc. Conversational search… Cortana will suggest papers for you, suggest events.

[aside: has always done better at computer science than any other subject. Remains to be seen if they can really extend it to other fields. Tried a couple of geoscientists with ok results.]

James Pringle, Thomson Reuters – more recent work using the entire corpus. Is the Web of Science up to it? 60 M records core collection. Partnered with regional citation databases (Chinese, SciELO, etc). "One person’s data is another person’s metadata." Article metadata for its own use. Also working with figshare and others. Building massive knowledge graph. As a company interested in mesolevel. Cycle of innovation. Datamining, tagging, visualization… drug discovery…connection to altmetrics… How do we put data in the hands of who needs it. What model to use? Which business model?

Mark Hahnel, Figshare

Figshare for institutions – non-traditional research outputs, data, video … How can we *not* mess this up? Everything you upload can be tracked with a DOI. Linked to GitHub. Tracked by Thomson Reuters data cite database. Work with institutions to help them hold data. Funder mandates for keeping data but where’s the best place?

Funders require data sharing but don’t provide infrastructure.

Findable, interoperable, usable, need an api … want to be able to ask on the web: give me all the information on x in csv and get it. Can’t ask the question if data aren’t available.

Need persistent identifiers. Share beta search.

Daniel Calto, Research Intelligence, Elsevier

Data to share – big publisher, Scopus, also Patent data and patent history,

Sample work: comparing cities, looking at brain circulation (vs. brain drain) – Britain has a higher proportion of publications by researchers only there for 2 years  - much higher than Japan, for example

Mash their data with open public information.

Example: mapping gender in Germany. Women were more productive in physics and astronomy than men. Elsevier Research Intelligence web page full global report coming

Panel question: about other data besides journal citations

Hahnel: all sorts of things including altmetrics

Pringle: usage data  - human interactions, click stream data, to see what’s going on in an anonymous way. What’s being downloaded to a reference manager; also acknowledgements

Calto: usage data also important. Downloading an abstract vs. downloading a full text – interpreting still difficult. How are academic papers cited in patents.


Reza Ghanadan, DARPA

Simplifying Complexity in Scientific Discovery (aka Simplex)


Datafication > knowledge representation > discovery tools

Examples: neuroscience, novel materials, anthropology, precision genomics, autonomy

Knowledge representation

Riq Parra – Air Force Office of Scientific Research

(like Army RO and ONR) their budget is ~60M all basic research (6-1)

All Air Force 6-1 money goes to AFOSR

40 portfolios – 40 program officers (he’s 1 of 40). They don't rotate like NSF. They are career.

Air Space, Outer Space, Cyber Space.

Some autonomy within agency. Not panel based. Can set direction, get two external reviews (they pick reviewers), talk a lot with the community

Telecons > white papers > submissions > review > funding

How to talk about impact of funding? Mostly anecdotal – narratives like transitions. Over their 65 years they’ve funded 78 Nobel Prize Winners on average 17 years prior to selection

Why he’s here – they do not use these methods to show their impact.  He would like to in spirit of transparency show why they fund what they fund, what impact it has, how does it help the Air Force and its missions.

Ryan Zelnio, ONR

horizon scan to see where ONR Global should look, where to spend attention and money, assess portfolio

global technology awareness quarterly meetings

20-30 years out forecasting

Bibliometrics is one of a number of things they look at. Have qualitative aspects, too.

Need more in detecting emerging technologies

Dewey Murdick, DHS S&T

All the R&D (or most) for the former 22 agencies. More nearer term than an ARPA. Ready within months to a couple years. R&D budget 450M … but divide it over all the mission areas and buy everyone a Snickers.

Decision Support Analytics Mission – for big/important/impactful decisions. Analytics of R&D portfolio.

Establishing robust technical horizon scanning capability. Prototype anticipatory analytics capability.

Brian Pate, DTRA

Awareness and forecasting for C-WMD Technologies

Combat support agency – 24x7 reachback capability. Liaison offices at all US Commands.

6.1-6.3 R&D investments.

Examples: Ebola response, destruction of chem weps in Syria, response to Fukushima.

Low probability event with high consequences. No human studies. Work with DoD agencies, DHS, NIH, others.

Move from sensing happening with state actors to anticipatory, predicting, non-state actors.

Deterrence/treaty verification, force protection, global situational awareness, counter-WMD

BSVE – biosurveillance architecture, cloud based social self-sustaining, pre-loaded apps

Transitioned to JPEO-CBD – wearable CB exposure monitor

FY17 starting DTRA tech forecasting

Recent DTRA RFI – on identifying emerging technologies.

Audience q: Do you have any money for me?

Panel a: we will use your stuff once someone else pays for it

Ignite talks - random notes


Ethnea - instance based ethnicity, Genni (JCDL 2013), Author-ity (names disambiguated)

Predict ethnicity gender age

MapAffil - affiliation geocoder

Ethnicity specific gender over time using 10M+ pubmed papers


Larremore: Modeling faculty hiring networks


Bruce Weinberg, Ohio State

Toward a Valuation of Research

IRIS (Michigan) – people based approach to valuing research. People are the vectors by which ideas are transmitted, not disembodied publications

- CIC/AAU/Census

Innovation in an aging society – aging biomedical research workforce

Data architecture

  • bibliometric
  • dissertations
  • web searches
  • patents
  • funding
  • star metrics (other people in labs), equipment, vendors
  • tax records
  • business census

Metrics for transformative work

  •  text analytics
  • citation patterns from WoS

Impact distinct from transformative. Mid-career researchers moving more into transformative work.

Some findings not captured in my notes: how women PhD graduates are doing (same positions, paid slightly more, held back by family otherwise). PhD graduates in industry staying in the same state, making decent money (some non-negligible proportion in companies with median salaries >200k ... median.)

John Ioannidis, Stanford

Defining Meta-research: an evolving discipline

- how to perform, communicate, verify, evaluate, reward science

- paper in PLOS Biology, JAMA




Notes from International Symposium on Science of Science 2016 (#ICSS2016) - Day 1

This conference was held at the Library of Congress March 22 and 23, 2016. The conference program is at:

I had the hardest time remembering the hashtag so you may want to search for ones with more C or fewer or more S.

This conference was only one track but it was jam-packed and the days were pretty long. On the first day, my notes were by hand and my tweets were by phone (which was having issues). The second day I brought a power strip along and then took notes and tweeted by laptop.

One thing I want to do here is to gather the links to the demo tools and data sets that were mentioned with some short commentary where appropriate. I do wish I could have gotten myself together enough to submit something, but what with the dissertation and all. (and then I'm only a year late on a draft of a paper and then I need to write up a few articles from the dissertation and and and and...)

Maryann Feldman SciSIP Program Director

As you would expect, she talked about funding in general and the program. There are dear colleague letters. She really wants to hear from researchers in writing - send her a one-pager to start a conversation. She funded the meeting.

Katy Börner Indiana University

She talked about her Mapping Exhibit - they're working on the next iteration and are also looking for venues for the current. She is interested in information analysis/visualization literacy (hence her MOOC and all her efforts with SCI2 and all). One thing she's trying now is a weather report format. She showed an example.

She did something with the descriptive models of the global scientific food web. Where are sources and where are sinks of citations?

Something more controversial was her idea of collective allocation of funding. Give each qualified PI a pot of money that they *must* allocate to other projects. So instead of a small body of reviewers, everyone in the field would be a reviewer. If the top PI got more than a certain amount, they would have to re-allocate to other projects.

I'm not sure I got this quote exactly but it was something like:

Upcoming conference at National Academy of Sciences on Modeling Sci Tech Innovations May 16-18.

They have a data enclave at Indiana with research data they and their affiliates can use. (I guess Larivière also has and has inherited a big pile o'data? This has been a thought of mine... getting data in format so I could have it lying around if I wanted to play with it).

Filippo Radicchi Indiana University

He spoke about sleeping beauties in science. These are the articles that receive few citations for many years and then are re-discovered and start anew. This is based on this article. Turns out the phenomenon occurs fairly regularly and across disciplines. In some cases it's a model that then is more useful when computing catches up. In other cases it's when something gets picked up by a different discipline. One case is something used to make graphene. He's skeptical one of the top articles in this category is actually being read by people who cite it because it's only available in print in German from just a few libraries! (However, a librarian in the session *had* gotten a copy for a staff member who could read German).
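For a feel of how "few citations for many years, then re-discovered" could be turned into a number, here is a minimal sketch. It is an assumption on my part, not a transcription of the talk: it draws a straight line from the citation count in the publication year to the count in the peak year and adds up how far each year falls below that line.

```python
def beauty_coefficient(yearly_citations):
    """Score how long and how deeply a paper 'slept' before its citation peak.

    yearly_citations[t] is the number of citations received t years after
    publication. A larger score means a longer, deeper sleep before the peak.
    """
    if not yearly_citations:
        return 0.0
    c0 = yearly_citations[0]
    t_peak = max(range(len(yearly_citations)), key=yearly_citations.__getitem__)
    if t_peak == 0:
        return 0.0  # peaked in the publication year, so no sleep at all
    slope = (yearly_citations[t_peak] - c0) / t_peak
    score = 0.0
    for t in range(t_peak + 1):
        on_the_line = slope * t + c0
        score += (on_the_line - yearly_citations[t]) / max(1, yearly_citations[t])
    return score


# Hypothetical citation histories.
sleeper = [0, 0, 0, 0, 0, 0, 0, 0, 0, 50]  # ignored for nine years, then found
steady = [0, 5, 10, 15, 20]                # grows evenly from the start

print(beauty_coefficient(sleeper))
print(beauty_coefficient(steady))
```

The steady paper scores about zero because it never drops below the line, while the sleeper accumulates a large score over its quiet years.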

I would love to take his 22M article data set and try the k-means longitudinal. If sleeping beauty is found often, what are the other typical shapes beyond the standard one?

He also touched on his work with movies - apparently using an oft-overlooked section of IMDB that provides information on references (uses same framing as x, adopt cinematography style of y, remakes z... I don't know, but relationships).

Carl Bergstrom University of Washington

The first part of his talk reviewed Eigenfactor work which should be very familiar to this audience (well except a speaker on the second day had no idea it was a new-ish measure that had since been adopted by JCR - he should update his screenshot - anyhoo)

Then he went on to discuss a number of new projects they're working on. Slides are here.

Where ranking journals has a certain level of controversy, they did continue on to rank authors (ew?), and most recently articles which required some special steps.

Cooler, I think, was the next work discussed. A mapping technique for reducing a busy graph to find patterns. "Good maps simplify and highlight relevant structures." Their method did well when compared to other methods and made it possible to compare over years. Nice graphic showing the emergence of neuroscience. They then did a hierarchical version. Also pretty cool. I'd have to see this in more detail, but looks like a better option than the pruning and path methods I've seen to do similar things. So this hierarchical map thing is now being used as a recommendation engine. See babe' . I'll have to test it out to see.

Then (it was a very full talk) women vs. men. Men self-cite more. Means they have higher h-index.
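To see why more self-citation can mean a higher h-index, here is a small sketch. The h-index function follows the usual definition (the largest h such that h papers have at least h citations each); the paper counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_index_without_self_citations(papers):
    """papers is a list of (total_citations, self_citations) pairs."""
    return h_index([total - own for total, own in papers])


# Hypothetical author: (total citations, of which self-citations) per paper.
papers = [(10, 4), (8, 3), (5, 1), (4, 2), (3, 0)]

print(h_index([total for total, _ in papers]))
print(h_index_without_self_citations(papers))
```

With these made-up numbers the h-index drops by one once self-citations are removed.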

Jacob Foster UCLA (Sociology)

If the last talk seemed packed, this was like whoa. He talked really, really fast and did not slow down. The content was pretty heavy duty, too. It could be that the remainder of the room basically knew it all so it was all review. I have read all the standard STS stuff, but it was fast.

He defines science as "the social production of collective intelligence."

Rumsfeld unknown unknowns... he's more interested in unknown knowns. (what do you know but do not know you know... you know? 🙂 )

Ecological rationality - rationality of choices depends on context vs rational choice theory which is just based on rules, not context.

Think of scientists as ants. Complex sociotechnical system. Information processing problem, using Marr's Levels.

  • computational level: what does the system do (e.g.: what problems does it solve or overcome) and similarly, why does it do these things
  • algorithmic/representational level: how does the system do what it does, specifically, what representations does it use and what processes does it employ to build and manipulate the representations
  • implementational/physical level: how is the system physically realised (in the case of biological vision, what neural structures and neuronal activities implement the visual system)

Apparently less studied in humans is the representational to hardware. ... ? (I have really, really bad handwriting.)

science leverages and tunes basic information processing (?).. cluster attention.

(incidentally totally weird Google Scholar doesn't know about "american sociological review" ? or ASR? ended up browsing)
Foster, J.G., Rzhetsky, A., Evans, J.A. (2015) Tradition and Innovation in Scientists’ Research Strategies. ASR 80, 875-908. doi: 10.1177/0003122415601618

Scientists try various strategies to optimize between tradition (more likely to be accepted) and innovation (bigger pay offs). More innovative papers get more citations but conservative efforts are rewarded with more cumulative citations.

Rzhetsky, A., Foster, I.T., Foster, J.G., Evans, J.A. (2015) Choosing experiments to accelerate collective discovery. PNAS 112, 14569–14574. doi: 10.1073/pnas.1509757112

This article looked at chemicals in pubmed. Innovative was new ones. Traditional was in the neighborhood of old ones. They found that scientists spend a lot of time in the neighborhood of established important ones where they could advance science better by looking elsewhere. (hmmm, but... hm.)

The next bit of work I didn't get a citation for - not even enough to search - but they looked at JSTOR and word overlap. Probabilistic distribution of terms. Joint probability. (maybe this article? pdf). It looked at linguistic similarity (maybe?) and then export/import of citations. So ecology kept to itself while social sciences were integrated. I asked about how different social sciences fields use the same word with vastly different meanings - mentioned Fleck. He responded that it was true but often there is productive ambiguity of new field misusing or misinterpreting another field's concept (e.g., capital). I'm probably less convinced about this one, but would need to read further.

Panel 1: Scientific Establishment

  • George Santangelo - NIH portfolio management. Meh.
  • Maryann Feldman - geography and Research Triangle Park
  • Iulia Georgescu, Veronique Kiermer, Valda Vinson - publishers who, apparently, want what might already be available? Who are unwilling (except PLOS) or unable to quid pro quo share data/information in return for things. Who are skeptical (except for PLOS) that anything could be done differently? That's my take. Maybe others in the room found it more useful.

Nitesh Chawla University of Notre Dame

(scanty notes here - not feedback on the talk)

Worked with ArnetMiner data to predict h-indices.


It turns out that, according to them, venue is key. So for all of the articles that found poor correlation between JIF and an individual paper's likelihood of being cited... they say it is actually a pretty good predictor when combined with researcher's authority. Yuck!

Janet Vertesi Princeton University

Perked up when I realized who she is - she's the one who studied the Rover teams! Her book is Seeing Like a Rover.  Her dissertation is also available online, but everyone should probably go buy the book.  She looked at more a meso level of knowledge, really interested in teams. She found that different teams - even teams with overlapping membership - managed knowledge differently. The way instrument time (or really spacecraft maneuvering so you can use your instrument time) was handled was very different. A lot had to do with the move in the '90s for faster...better... cheaper (example MESSENGER). She used co-authoring networks in ADS and did community detection. Co-authorship shows team membership as same casts of characters writing. This field is very different from others as publications are in mind while the instruments are being designed.

She compared Discovery class missions - Mars Exploration Rover - collectivist, integrated; everyone must give a go ahead for decisions; Messenger - design system working groups (oh my handwriting!)

vs. Flagship - Cassini - hierarchical, separated. Divided up sections of spacecraft. Conflict and competition. Used WWII as a metaphor (?!!). No sharing even among subteams before release.  Clusters are related to team/instrument.

New PI working to merge across - this did show in evolution of network to a certain extent.

Galileo is another flagship example. breaks down into separate clusters. not coordinated.

Organization of teams matters.

I admitted my fan girl situation and asked about the engineers. She only worked with scientists because she's a foreign national (may not mean anything to my readers who aren't in this world but others will be nodding their heads). She is on a team for an upcoming mission so will see more then. She also has a doctoral student who is a citizen who may branch off and study some of these things.

Ying Ding Indiana University

She really ran out of time in the end. I was interested in her presentation but she flew past the meaty parts.

Ignite Talks (15s per slide 2min overall or similar)

  • Filippo Menczer - - tool to view more information about authors and their networks. Browser extension.
  • Caleb Smith,
  • Orion Penner - many of us were absolutely transfixed that he dropped his note pages on the floor as he finished. It was late in the day!  He has a few articles on predicting future impact (example). On the floor.
  • Charles Ayoubi,
  • Michael Rose,
  • Jevin West,
  • Jeff Alstott - awesome timing, left 15 for a question and 15 for its answer. Audience didn't play along.

Lee Giles Penn State University

It was good to save his talk for last. A lot going on besides keeping CiteSeer up and running. They do make their data and their algorithms freely available (see: ) . This includes extracting references. They also are happy to add in new algorithms that make improvements and work in their system. They accept any kind of document that works in their parsers so typically journal articles and conference papers.

RefSeer - recommends cites you should add

TableSeer - extracts tables (didn't mention and there wasn't time to ask... he talked a lot about this for chemistry... I hope he's working with the British team doing the same?)

Also has things to extract formulas, plots, and equations. Acknowledgements. Recommend collaborators (0 for me, sniff.) See his site for links.




Slides from Leveraging Data to Lead

Nov 20 2015 Published by under bibliometrics, Conferences, libraries

This was a great conference put on by Maryland SLA. I tweeted a bit using the hashtag: #datamdsla

Here are my slides. Not awesome but I did find some nice pictures 🙂



Post I wish I had time to write: Scientific meetings and motherhood

Feb 24 2015 Published by under Conferences, scholarly communication

I was reading Potnia's new post on meetings - why to go to them - and nodding my head vigorously (ouch) and connecting that to the part of the dissertation I'm writing now on tweeting meetings and the research over the years on how scientific meetings work and contribute...

and I got very sad. I'm a real extrovert and a magpie of all sorts of different kinds of research, but I can't justify spending my limited time reading articles that aren't pretty directly relevant to my job or my dissertation. When I went to bunches of meetings, I could soak a million little tidbits up, meet the people doing the work, browse lots of posters and talk to their authors. It's really a very efficient way to see what's up with a field.

and now... I haven't been to a conference since I was in my first trimester with my twins 🙁   Sure, I've listened in to some webinars and followed some tweets. It's not enough.

Would childcare at a venue help?  I don't know... I'd still have to get them there, I'd have to trust the childcare (what if I got there and checked them out and didn't like what I saw?), and I'm paying for childcare at home even when I go and money is super tight now with my income being the only one in our household for more than a year.  I thought about bringing my sister along and then we could see the sights together outside of hours. My work would pay my travel and my room and so I'd just have to pay her travel and everyone's food. But I can't really even swing that right now....


So yeah... at least there's twitter. The post I'd like to write actually cites references and what not.

And I'm only the 10 millionth person to have this issue this year so I  know I'm not a special snowflake but that doesn't mean I can't still bitch about it.


ASIST2012: Metrics 2012

Oct 31 2012 Published by under bibliometrics, Conferences

I attended two days of the ASIST annual meeting. I'm actually quite bummed because I was sure I wasn't going to get to any conferences for a long time because of the twins, but this one was local so I thought I could go. Unfortunately, the superstorm Sandy shut down daycare and also caused Baltimore to shut the streets down 🙁   I did make it in for a workshop on Friday and most of the day on Sunday. Lesson learned - ASIST is more than happy to arrange a room for any mothers who need to pump, but you do need to ask in advance.

Metrics 2012: Symposium on Informetric and Scientometric Research

This is the second time this was held. It's a nice small group that discusses works in progress and recently completed work.

Kate McCain presented on "Assessing obliteration by incorporation" - I've been trying to remember the name for this phenomenon. This is when some concept becomes so ingrained in the field that the original article is no longer cited when the concept is mentioned. A similar idea is "palimpsestic syndrome" - that's when a newer article or review article is cited for a concept instead of the original source because that's where the person read about it and they're unaware of the originator of the idea. The way to find bibliometric evidence is to look for noun phrases with references. In the past this has been done 3 ways:

  • using only the metadata and citations from WoS (record level searching)
  • using the metadata and citations but then pulling the full text and reviewing that (record level searching and text analysis)
  • now using the full text

The problem with the first two ways is that you miss a lot of things that use the concept but not in the metadata, only somewhere down in the full text.  She looked for "bounded rationality" using JSTOR's collection - Drexel's subscription. This is somewhat limiting because they only have some JSTOR collections and the coverage of psychology is not good.
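To make the full-text idea concrete for myself, here is a minimal sketch and not her actual method: split the text into sentences, keep the ones that mention the phrase, and sort them by whether a citation marker sits in the same sentence. The citation regex (author-year in parentheses or a bracketed number) and the function name are my own assumptions, and real articles will need much messier handling.

```python
import re

# Assumed citation styles: "(Simon, 1957)" or "[12]" / "[3, 4]". Hypothetical, not exhaustive.
CITATION = re.compile(r"\([A-Z][^()]*\d{4}[a-z]?\)|\[\d+(?:\s*[,-]\s*\d+)*\]")

def split_mentions(text, phrase):
    """Return (cited, uncited) lists of sentences that mention the phrase."""
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    cited, uncited = [], []
    for sentence in sentences:
        if phrase.lower() in sentence.lower():
            if CITATION.search(sentence):
                cited.append(sentence)
            else:
                uncited.append(sentence)
    return cited, uncited

sample = (
    "Managers exhibit bounded rationality (Simon, 1957). "
    "Bounded rationality shapes search. "
    "Firms adapt over time [3]."
)
cited, uncited = split_mentions(sample, "bounded rationality")
print(len(cited), "cited mention(s),", len(uncited), "uncited mention(s)")
```

A high share of uncited mentions over time would be the kind of signal you'd then go read in context.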

Dietmar Wolfram talked about journal similarity by incoming citing journal topicality. He did this for LIS because it annoys us all that the MIS and health informatics journals top the category in JCR - when they really maybe should be in another category. This seemed to work except for small journals or orphan journals (ones that are not cited).
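A rough sketch of the general idea, assuming the simplest possible version: describe each journal by how often each other journal cites it, then compare two journals with cosine similarity. I don't know that this is the measure he used (his version was about the topicality of the citing journals), and the journal names and counts below are made up for illustration.

```python
from math import sqrt

def cosine(profile_a, profile_b):
    """Cosine similarity between two {citing_journal: citation_count} profiles."""
    keys = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(k, 0) * profile_b.get(k, 0) for k in keys)
    norm_a = sqrt(sum(v * v for v in profile_a.values()))
    norm_b = sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An orphan journal has no incoming citations, so there is nothing to compare.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical incoming-citation profiles.
lis_journal = {"Citing LIS A": 40, "Citing LIS B": 25, "Citing MIS A": 2}
mis_journal = {"Citing MIS A": 50, "Citing MIS B": 30, "Citing LIS A": 3}
orphan = {}

print(round(cosine(lis_journal, mis_journal), 3))
print(cosine(lis_journal, orphan))
```

The empty profile shows why orphan journals are a problem for this kind of approach: with nothing citing them, the similarity has no data to work from.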

Other things of interest:

  • lots of use of gephi
  • lots of student interest
  • open position at Drexel if anyone is interested


ASIST2011: Post-Conference Symposium on Informetric and Scientometric Research

Oct 14 2011 Published by under Conferences

I attended almost all of this symposium – unfortunately, I had to leave at 2:45 to get to my flight. I guess it probably ended early anyway, because certainly two of the speakers in the last part didn’t show.

I wish I had a copy of the slides – maybe one will be provided later. The talks were mostly early summaries of work in progress, with little methodological detail.

Kate McCain provided additional detail on her location of the core journals in health informatics. Her analysis included picking out themes within health informatics.

Stasa Milojevic looked at the whole field of LIS from 1955 onward to look at citation and recitation practices.

Bei Wen talked about triangulating journal, paper, individual bibliometrics to better understand the field of water research… I found this incredibly confusing.

Kun Lu compared two methods of looking at author relatedness. He brought in information retrieval methods like vector space modeling and latent Dirichlet allocation. The problem with using author co-citation analysis (ACA) for author relatedness is when there aren’t a ton of citations to use. They found that the topic model worked fairly well – once again, difficult to get enough details from the presentation so hopefully an article will be forthcoming.

Dangzhi Zhao extended her earlier work looking at all author co-citation analysis to look at author bibliographic coupling. Author selection is very important but once you do that, first/last/all author bib coupling is great for an overview.
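For anyone who hasn't seen bibliographic coupling done at the author level, here is a minimal sketch under my own assumptions, not her implementation: pool the references of each author's papers, then count the references two authors share. The `mode` switch mimics first/last/all author counting, and the paper records are hypothetical. Real studies often weight by how many times a reference is shared rather than using plain set overlap.

```python
from itertools import combinations

def author_reference_sets(papers, mode="all"):
    """Pool cited references per author under first, last, or all author counting."""
    refs = {}
    for paper in papers:
        authors = paper["authors"]
        if mode == "first":
            credited = authors[:1]
        elif mode == "last":
            credited = authors[-1:]
        else:
            credited = authors
        for author in credited:
            refs.setdefault(author, set()).update(paper["refs"])
    return refs

def coupling_strengths(papers, mode="all"):
    """Coupling strength per author pair = number of shared cited references."""
    refs = author_reference_sets(papers, mode)
    return {
        (a, b): len(refs[a] & refs[b])
        for a, b in combinations(sorted(refs), 2)
    }

# Hypothetical records: each paper lists its authors in order and its cited references.
papers = [
    {"authors": ["Author A", "Author B"], "refs": ["ref1", "ref2"]},
    {"authors": ["Author C"], "refs": ["ref2", "ref3"]},
]
print(coupling_strengths(papers, mode="all"))
print(coupling_strengths(papers, mode="first"))
```

Switching `mode` changes which authors even show up in the map, which is one way to see why author selection matters so much.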

Chaoqun Ni spoke very quickly about research diversity and intensity using LIS research.

Judit Bar-Ilan did a study of the tag bibliometrics in CiteULike and Mendeley. Seems like there are really some problems with getting good data from both of these services. She didn’t use the fairly new Mendeley API, but she found that some of the searches mentioned in the help didn’t work (I think the main one was searching for tag: ). The other thing is that she didn’t search on a journal or on free text nor did she expand the query to other related terms.

Jason Priem talked about his most recent work with Heather Piwowar and Brad Hemminger. The abstract has a lot more detail and is online here:


As for posters, Jason and Kaitlin Costello’s poster was already shared on read/write web so it probably had more mileage than anything else from this conference. It’s at


ASIST2011: Miscellaneous sessions

Oct 14 2011 Published by under Conferences

I’m reconstructing these a couple of days later as I just wasn’t able to really live blog this conference.

Tenure and Promotion in the Age of Online Social Media

Anatoliy Gruzd presented.

This is certainly a question we all ask: to what extent and how does social media impact promotion and tenure? They surveyed and interviewed researchers at ASIST and AOIR about this. I tweeted some notes. Seems like a lot of people agree with me that it all depends and should be on a case by case basis. Some scholars are using their social media to talk about their work or popularize it whereas others are using it for personal reasons and would not want that information to count toward their tenure.

Analytic Potential of Scientific Data

Carole Palmer presented.

She talked about cataloging books for the potential uses; that is, asking what searches should this come up under? what information needs could this satisfy? For this she cited Hjørland in 1997, but clearly it wasn’t new when Soergel wrote about it in his 1985 book! Anyhoo, her point was that data should be cataloged this same way. Librarians can work with data producers and data consumers to get an idea of what other groups might find data useful.

Using Information Obtained through Informetrics to Address Practical Problems and Aid Decision-Making

I have my own answers to the above, clearly, because in my day job I do apply informetrics to real world issues (and not promotion and tenure!). The speakers generally gave an overview of their recent work and some of that was for government or industry. Besides such things as evaluating institutions, groups, and individuals, they mentioned understanding the sub-areas of a field to design an academic program, evaluating journals for selection in a library, and looking for collaboration partners.


Personal Information Management (session)

The speaker everyone wanted to see – the one about duplication – wasn’t there 🙁  The second speaker just gave a tutorial on survey design. I have no idea how this made it through review when so few papers were selected. The third group of speakers had an interesting piece on the PIM of teachers. That should be useful for helping to design systems for them.
