Archive for the 'Information Science' category

Bots, Mixed Initiative, and Virtual Personal Assistants

I've been trying to write this post for a while, but I'm finally giving up on producing a well-done oeuvre and just getting the thing done.

When I saw Daniel Tunkelang's brief post on virtual assistants, I was like, oh, that again. But there were some links, and while doing my usual syntopic reading, I fell a bit down the rabbit hole.

Used to be that computer science was like "automate all the things." More automated, more better. Bates (1990) was all like, wait a minute here: there are some things it makes sense to hand off and others it makes sense for the human to do. People do some things faster. People learn and explore and think by doing. People need to control certain things in their environment. But other things are a hassle or can be easily done by a computer. What you don't want is to make supervising the automation so arduous that you're trading one hassle for another.

For quite a few years, there has been an area of research called "mixed initiative" that looks specifically at things like virtual assistants and at automating where it makes sense without overburdening the user. When I was dabbling in this area a couple of years ago, I read some articles. They seemed weird to me, though, because I think most knowledge workers my age or younger probably don't know how to work with a living human assistant. I have never worked anywhere with a secretary who offloaded work from me. Never worked somewhere with someone to help me schedule meetings, type out correspondence, format articles, do my travel stuff, etc. I have been on teams whose deliverables were sent through an editor - but that was more like a specialized technical writer. I suppose I would have to negotiate with an assistant about what I would want him or her to do and then accept (within boundaries) that they might do things differently than I do. I would have to train them. Should I expect more of a virtual assistant?

All of this was in the back of my head when I started following the links.

So what do they mean by virtual assistants - they're hot, but what are they doing and do they work?

Scheduling meetings

  • Meekan is, apparently, a bot that takes an informal request within Slack and negotiates with other calendars to make an appointment.
  • Another service is similar, but you cc Amy (a bot, but I like that she has a name), and she takes on the negotiation for you.

Project/Team Management (loosely construed)

  • Howdy will get feedback from team members and also take lunch orders. Seems sort of like some things I saw baked into Basecamp when I saw a demo. It's in Slack, too.
  • Awesome helps manage teams on Slack.


Travel, Shopping, ...

  • Assist does a few different things like travel and shopping.

General but often operating a device

  • Siri
  • Cortana
  • Amazon Alexa
  • Google Now (sorta)
  • Facebook M

A lot of us don't want to talk to our assistant, but to text them. One of the articles pointed to this.


When I talked to engineers back in the day about their personal information management, there were a lot of things they were doing themselves that it seemed like they should be able to offload to someone who is paid less (Pikas, 2007). Likewise, I was talking to a very senior scientist who was spending hours trying to get his publications right on the external site. Even though statements are routinely made to the contrary, it seems like work is pushed off from overhead/enterprise/admin to the actual mission people (the scientists and engineers) in an attempt to lower overhead. It pushes money around, sure, but it doesn't achieve the goal. So here's an idea: if we really, really, really aren't going to bring back more overhead/enterprise/admin folks, are there bots we can build into our systems to ease the load?

If Slackbot watches you and asks you personal questions: isn't that cute. If Microsoft does: evil, die, kill with fire. If your employer does: yuck?



Bates, M. J. (1990). Where should the person stop and the information search interface start? Information Processing & Management, 26(5), 575-591. doi:10.1016/0306-4573(90)90103-9

Pikas, C. K. (2007). Personal information management strategies and tactics used by senior engineers. Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 44, paper 14. Milwaukee, WI.


Ebook Explosion

Dec 17 2014 Published by under Information Science, libraries, Uncategorized

Seems like all the publishers and all the societies are trying to get into the eBook game. The newest announcement is from AAS (using IOP as a publisher). Considering that a lot of these domains are known not for monographs but for conference proceedings and journal articles (computer science, with ACM's new ebook line, for example), it seems kinda weird.

Someone suggested that maybe it was due to the ebook aggregators' demand-driven acquisition plans, but I think it's just the opposite. Many major publishers have jacked up prices (pdf) on EBL and Ebrary recently, all to push libraries into licensing "big deal" bundles of the entire front list or entire subject categories. And buying from the publishers is super attractive: the books are often DRM-free PDFs (one big publisher even offers a whole book in a single PDF; most offer one PDF per chapter), they can be viewed online, they're easy to find using Google, and they come with nice MARC records for adding to the catalog.

The ebook aggregators have nasty DRM. They have concurrent user rules. They have special rules for things that are considered textbooks. We have to log in with our enterprise login (which isn't my lab's day-to-day login), and the data about which books we view is tied to our identities. The new prices end up being as much as 30-40% of the cover price for a one-day loan. That's right. The customer can look and maybe print a couple of pages for 24 hours, and the library is charged a third of the cover price of the book.

But with the societies' and publishers' own platforms, what seems like a one-time purchase has now become yet another subscription. If you buy the 2014 front list, won't you feel the pressure to buy the 2015 and 2016 publications?

Aggregators had seemed like part of the answer, but not so much with these prices. We've already mugged all the other budgets for our journal habit, so where does the money for these new things come from? The print budget was gone ages ago. The reference budget was also raided. The ebooks we've licensed do get used a lot at MPOW.


Continuing value and viability of specialized research databases

Nov 26 2014 Published by under finding information, Information Science

There was an interesting thread yesterday on the PAMnet listserv regarding "core" databases in Mathematics and which could be cut to save money.

One response was that it's better to search full text anyway (I couldn't disagree more).

Ben Wagner expressed concern that Google Scholar was going to drive all of the databases out of business and then Google would abandon the project.

Joe Hourclé posted about ADS - a core database in astro. Fred Stoss posted about PubMed - needs no intro here, surely!

Here's my response.

I think Scopus and WoS are the biggest immediate threats to the smaller domain-specific indexes, particularly since most academic users are looking for a few reasonable things and aren't running complex queries or needing both very high precision and very high recall. In my world, I'm like the goalie: by the time they ask me, they've tried Google, they've asked their friends, they've asked their mother*... it's gotten past 10 people without an adequate answer. For these hard questions, I need the power of a good database (like Inspec). But... if you look at quantities and numbers of users... does that justify the huge cost? Maybe? But do our auditors agree? Infrequent big wins vs. day-to-day common usage?

As Ben has often chronicled, we've shifted money out of every other budget to support our sci/tech journal habit. We've starved the humanities. We've dropped databases. All for more and more expensive journals. Seems like if the content does get paid for out of other budgets via page charges or institutional support for open access publishing, that might make it even more important that libraries have better ways to find the distributed content. But, like Ben, I worry that we'll put these finding tools out of business.

Another observation: two of the "core" databases mentioned, ADS and PubMed, are government supported as a service to the community. The solar physics bibliography is a very specialized resource but is also super important to those researchers. Maybe if building specialty research databases is no longer profitable but there remains a need, the community-built tools will improve/grow/gain support? Maybe they'll be backwards and using technology from 1995, though 🙂

I'm working with some projects that are taking big piles of full-text documents and using computational methods to classify them against an ontology built by subject matter experts (with some advice from a professional taxonomist in my group). The volume/velocity/yadda yadda of the data precludes the careful indexing done by our fancy databases... but I think this and other projects like it show a swing back toward the importance of good indexing and of having domain experts review the classification system.


* My mom is a statistician so I might ask her first



Government cost recovery gone awry: PACER and NTIS

Aug 27 2014 Published by under information policy, Information Science

(Reiterating: these are just my personal opinions and do not reflect anything from my place of work - if you know what that is - or anything else.)

For many years, the US federal government has tried to cut costs by outsourcing anything that isn't inherently governmental, making sure that government doesn't compete with industry, and requiring cost recovery for government agencies that provide services to other agencies (see OMB Circular A-76).

Old examples that might have changed: GPO had to do all printing of history books for military historians, but while the quality was ok, the distribution was crap, and the DoD history organizations and readers had to pay a lot of money. So what they did when I worked there was to give the book to a university press that would do a decent job with it. The books were not copyrightable anyway because they were work for hire by a government employee. Everyone was happy. Another old example: the Navy was required to send all records to NARA, but then the Navy all of a sudden had to pay NARA to keep the documents (I think this has changed; my example is from the late 1990s). These were things like deck logs. Hugely important documents.

NTIS has long been caught up in this. Agencies producing technical reports are required by law to send them to NTIS (if they are unlimited distribution). NTIS is required to recover the cost of its administration and archiving by selling the documents. This is hard because, first, agencies are not thorough in sending stuff to NTIS (often because their own central repository isn't even getting copies, even though regulations, instructions, etc. require it) and, second, agencies make these documents available for free from their own sites. NTIS has also picked up a few bucks here and there doing web and database consulting and licensing its abstracting and indexing database to vendors who resell it to libraries. Why pay for it from a third-party vendor? You can cross-search it with your favorite engineering database, and the search tools are better.

PACER is also caught up in this. There's actually a law that says US Courts has to recover the cost of running the system by charging for access or for documents. They don't want to, but it's the law and they must obey it. This is information that really should be freely available and easily accessible. A famous activist tried to download the whole thing and make it available, but he was stopped.

The consequences of forcing these agencies - GPO, NTIS, US Courts - to recover their costs are serious, and they directly work against the open government we need and deserve. Cost recovery causes the agencies to cut corners and go without the systems they need. It causes customer agencies and citizens alike to distrust and dislike them.

Now, US Courts has removed large collections of historical documents from PACER because of an IT upgrade. Read the Washington Post article. Various people in Congress are trying to shut NTIS down, again. GPO seems to be ok, for now - lots of cool neat things from them.

Libraries - like mine - have been burdened by cost recovery, too, and it often signals the beginning of the end. Superficially, it makes sense as a way to show how much something is valued and by whom. In practice, you need a lot more accounting systems and controls over the professional workers, and those prevent them from doing their jobs. These services directly support strategic requirements (open government and accountability), but they are infrastructure. People are blind to infrastructure until it's no longer there. NTIS, PACER, GPO and others need to stop with this cost recovery business (meaning Congress has to pass a law that removes that requirement) and be funded as infrastructure. Outsource to get needed skills you can't hire in government, but be smart about it.


Fragile knowledge of coding and software craftsmanship

Aug 11 2014 Published by under Information Science

To continue the ongoing discussion, I think my concerns and experiences with informal education in coding (and perhaps formal education offered to those not being groomed into being software engineers or developers) fall into two piles: fragile knowledge and craftsmanship.

Fragile knowledge.

Feynman (yes, I know) described fragile knowledge of physics as learning by rote and being able to work problems directly from a textbook, but not having the deeper understanding of the science that enables application in conditions that vary even slightly from the taught conditions. I know my knowledge of physics was fragile - I was the poster child of being able to pass tests without fully understanding what was going on. I didn't know how to learn. I didn't know how to go about it any other way. I had always just shown up for class, done what I was asked, and been successful. In calculus I was in a class that had discussion sections in which we worked problems in small groups - is this why my knowledge isn't fragile in that, or is it that I did have to apply math to physics problems? Who knows.

Looking back now, it seems like a lot of the informal education I've seen for how to code is almost intentionally aimed at developing fragile knowledge. It isn't about how to solve problems with code, building a toolkit that has wide application, or showing lots of examples from different programs. It's more like: list the n types of data.

Craftsmanship.

There is actually a software craftsmanship movement, and I didn't take the time to read enough about it to know if it matches my thoughts. Here I'm talking about coding environment, code quality, reproducibility, sharing.... Not only solving the problem, but doing it in a way that is efficient, clean, and doesn't open up any large issues (memory leaks, openings for hackers, whatever else). Then taking that solution and making it clear enough that you can figure out what it did a week later, or that you can share it with someone else who can see what it did. Solving the problem so that you can solve the same problem with new data the same way. My code is an embarrassment - but I'm still sharing, because it's the best I know how to do and at least there's something to start with.

A couple of people suggested the Software Carpentry classes, and they sound great. Maybe SLA or ASIST or another professional association could host one of these as a pre-conference workshop? Maybe local (US: Maryland/DC) librarian groups could host one? We could probably get enough people.


My current to-read list

Jun 27 2014 Published by under Information Science

I've been keeping a million tabs open at work and at home, because I haven't even had the time to add things to my citation manager... I also have some things in print that I've been carrying back and forth to work every day (boy is my bag heavy!). Most of these things probably rate a post of their own, but sigh... Add to that my obsession du jour with screenscraping and text mining using R, Python, and Perl... and the fact that I'm not good at it, so everything takes longer (it would also take less time if I actually RTFM'd instead of just hopping into the code and trying things).

So here are some things on my radar (I'm giving no credit to whoever pointed me to these because I honestly don't remember! Sorry):

  • Hadas Shema, Judit Bar-Ilan, Mike Thelwall (in press) How is research blogged? A content analysis approach. JASIST. DOI: 10.1002/asi.23239
    She tweeted a link to the pre-print if you don't have access. I got about 2/3 of the way through this as soon as I saw it announced, then realized I was in the middle of a very important work thing and dropped it. Very interesting so far.
  • Lisa Federer (2014) Exploring New Roles for Librarians: The Research Informationist. Synthesis Lectures on Emerging Trends in Librarianship. New York: Morgan and Claypool. doi:10.2200/S00571ED1V01Y201403ETL001
    At first I was like, meh (another name?), but then I took another look and I'm interested in their version of embedding.
  • Vanessa P. Dennen (2014) Becoming a blogger: Trajectories, norms, and activities in a community of practice. Computers in Human Behavior 36, 350-358. doi:10.1016/j.chb.2014.03.028
  • Paige Brown (11 June 2014) How Academics are Using Social Media. From the Lab Bench.
    This and all the linked reports look very interesting.
  • Pablo Moriano, Emilio Ferrara, Alessandro Flammini, Filippo Menczer (2014) Dissemination of scholarly literature in social media.
  • Jeff Seaman and Hester Tinti-Kane (2013) Social Media for Teaching and Learning. Boston: Pearson Learning.
    This was probably cited in the blog post above.
  • Liu, Y., Kliman-Silver, C., Mislove, A. (2014) The tweets they are a-changin': Evolution of Twitter Users and Behavior. ICWSM. (google for it - I have the printout)
    This was mentioned by some folks from MPOW who went to the conference. Provides a nice overview.
  • Tenopir, C., Volentine, R., King, D. W. (2013) Social Media and scholarly reading. Online Information Review 37, 193-216. doi:10.1108/oir-04-2012-0062
    I might have actually read this, but it's still riding around in my bag.
  • Nentwich, M., König, R. (2012) Cyberscience 2.0: Research in the age of digital social networks. Frankfurt: Campus Verlag.
    This one is time sensitive as I borrowed it from Columbia.
  • Holmberg, K., Thelwall, M. (2013) Disciplinary differences in twitter scholarly communication. ISSI Proceedings 2013. (Typed from my handwriting and not checked, so google for it.) I think I may have read this, but I have it in the stack to read again.
  • Thelwall et al. (in press) Tweeting links to academic articles. Cybermetrics J (google for preprint)
  • Haustein et al. Tweeting biomedicine: an analysis of tweets and citations in the biomedical literature. arXiv:1308.1838
  • Sayes, E. (2014) Actor–Network Theory and methodology: Just what does it mean to say that nonhumans have agency? Social Studies of Science 44, 134-149. doi:10.1177/0306312713511867

And this is just what's on my screen or in my bag. I think the babies tore up 3 articles I had waiting to be read by my couch 🙁 So far behind!



Auditors, really?

May 13 2014 Published by under Information Science

I'm all for responsible use of our money and we are very, very careful with our limited funds... so really I would have said, "bring it on" to auditors. Of course, I'm not the financial person, so I don't have to deal with them directly.

So here's the crazy bit: they want us to justify each and every one of our subscriptions for 2006 and 2007. What did we pay? What was the going rate? How many downloads were there for ONLY our lab?  Were there cheaper substitutes? Problem is, my place of work is a research lab that's a division of a major university. Only a few of our licenses are only for our lab. The vast majority are managed by the larger institution - as is proper.

In case you didn't know, usage is hardly ever reported by IP range. COUNTER-compliant statistics, I'm told, are actually less informative than what was available previously.

So what happens if they disallow our licenses from 2007 in 2014? No clue whatsoever.

The only supporting info I could think to offer beyond what everyone else has done (the tech services folks at the parent institution have been awesome) is which sources we cited in those years. To me, that demonstrates that they were useful/used - even if the year is somewhat delayed.

I was pondering approaches, but I was able to do it in a few clicks in Scopus - not super clean, but useful:

  1. Search by affiliation and select yours from the results.
  2. Limit by publication year (2007 in this case)
  3. Select all (we only had ~450/year during this time, ymmv if you have a much larger institution)
  4. Under More > view References
  5. In the refine sidebar, at the bottom, in small print > export refine

So I did see things written years after they were cited, but they were mostly things like standards, technical reports, and unpublished things. The publication titles also needed a little cleaning up: AJ and ApJ appeared alongside the same journals listed under their full names.
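That title cleanup can be scripted once the export is in hand. Below is a minimal Python sketch, assuming the exported source titles have already been read into (source title, citation count) pairs; I'm not certain of the exact column layout of the Scopus refine export, so the parsing step is left out. The alias table (`ALIASES`) and the sample counts are hypothetical illustrations, not real MPOW data.

```python
from collections import Counter

# Hypothetical alias table: lowercase abbreviations seen in the export,
# mapped to the full journal titles we want to report under.
ALIASES = {
    "aj": "The Astronomical Journal",
    "apj": "The Astrophysical Journal",
}


def _key(title):
    """Comparison key: trim, lowercase, drop a trailing period and a leading 'the '."""
    k = title.strip().lower().rstrip(".")
    if k.startswith("the "):
        k = k[4:]
    return k


# Full titles indexed by comparison key, so "Astrophysical Journal"
# and "The Astrophysical Journal" collapse into one entry.
FULL_BY_KEY = {_key(full): full for full in ALIASES.values()}


def normalize(source):
    """Map an abbreviation or title variant to one canonical title."""
    k = _key(source)
    if k in ALIASES:
        return ALIASES[k]
    if k in FULL_BY_KEY:
        return FULL_BY_KEY[k]
    return source.strip()  # unknown titles pass through unchanged


def merge_counts(rows):
    """Sum citation counts over (source title, count) pairs after normalizing titles."""
    totals = Counter()
    for source, count in rows:
        totals[normalize(source)] += count
    return totals


if __name__ == "__main__":
    rows = [  # hypothetical sample rows
        ("ApJ", 12),
        ("Astrophysical Journal", 30),
        ("AJ", 5),
        ("The Astronomical Journal", 7),
        ("Icarus", 3),
    ]
    for title, n in merge_counts(rows).most_common():
        print(f"{n:4d}  {title}")
```

Keeping the aliases as plain data means adding another abbreviation is a one-line change, and anything not in the table passes through untouched, so unfamiliar titles still show up in the totals for a manual look.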

Dunno. I guess we'll see if the auditors get this - I'm getting the feeling that they have no concept of how science works.


Random observation: Expanding database scope can sometimes be annoying

Apr 28 2014 Published by under Information Science

Observation, rant, whatever.

I sorta look like an idiot - more than usual, anyway - because I can't pinpoint the number of articles MPOW wrote each year from 2009-2013. I mean, I give a number and then a couple of weeks later they ask me to check the research databases again (I've probably mentioned that we didn't have a mandatory comprehensive tracking system internally until this year) and the number has changed.

In the last 4 weeks, ~77 articles with our name in any author's address field were added to this one set of databases we have. Of those, 45 were published in 2013 or 2014.

They were all from conferences that should be covered by the database... so I guess I don't really have a solution. I'm comparing our research output with other organizations', though, so if a new organization is added to the comparison, it's hardly fair to count its articles now and use last month's numbers for MPOW. SIGH.
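One way to keep such a comparison fair is to pull every organization's counts from the same dated snapshot and to measure what each later re-pull adds. A minimal Python sketch, assuming each export can be reduced to a dict mapping a record ID (for example, a database accession number) to its publication year; the snapshot format, the function names, and the sample records are all hypothetical.

```python
from collections import Counter


def new_records_by_year(old_snapshot, new_snapshot):
    """Count records present in new_snapshot but not in old_snapshot, by publication year.

    Each snapshot is a dict mapping a record ID to that record's publication year
    (a hypothetical reduction of a database export).
    """
    added = new_snapshot.keys() - old_snapshot.keys()  # set difference of record IDs
    return Counter(new_snapshot[rid] for rid in added)


def counts_by_year(snapshot, years):
    """Per-year totals from a single snapshot, reported for each year of interest."""
    c = Counter(snapshot.values())
    return {y: c.get(y, 0) for y in years}


if __name__ == "__main__":
    # Hypothetical pulls taken some weeks apart.
    old = {"r1": 2012, "r2": 2013}
    new = {"r1": 2012, "r2": 2013, "r3": 2013, "r4": 2014, "r5": 2010}
    added = new_records_by_year(old, new)
    recent = sum(n for year, n in added.items() if year >= 2013)
    print(f"{sum(added.values())} records added; {recent} published 2013 or later")
    print(counts_by_year(new, [2012, 2013, 2014]))
```

Running the per-organization counts from one snapshot date, and reporting the re-pull difference separately, keeps late-indexed records from making one organization look more productive than another.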


C&RL Vote for articles - write in campaign for Taylor!

Apr 22 2014 Published by under Information Science

Ok, so the only thing I'm doing for this campaign is posting here, but anyway.

College & Research Libraries is having a 75th anniversary special issue, and they're asking readers to vote for the best articles.

I don't know about a lot of the choices. I mean, F. W. Lancaster (framed pdf) FTW, of course! And Kilgour, probably. Plus the library anxiety one by Mellon (framed pdf) has definitely had an impact.

BUT they forgot the best one ever:

Taylor, R. S. (1968). Question-negotiation and information seeking in libraries. College & Research Libraries, 29(3), 178-194. (pdf)

Luckily there's a write in block. So you know what to do... write it in!



Post I would like to write: New trend for linking external and internal information

I first noticed this in chemistry, but now I'm seeing it in engineering, too. Major publishers (?), content vendors, and indexes (?) are starting to offer services whereby your company can use their search tool to index your local content and display it side by side with their content, OR a way to use their API to pull their content into your local tool.

This addresses a common complaint in research labs and in companies. Knowledge management of internal information gets funded and defunded, and is cool and then not cool... External information is familiar to people coming out of school... so how can you search across both?

We have the artist formerly known as Vivisimo (now IBM something spherical, I think) as our intranet search, and I would love to see it include our discovery-layer search as a tab. I don't see why it couldn't.

This deserves analysis and thought - no time. sorry!

