Archive for the 'Information Science' category

Ebook Explosion

Dec 17 2014 Published by under Information Science, libraries, Uncategorized

Seems like all the publishers and all the societies are trying to get into the ebook game. The newest announcement is from AAS (using IOP as a publisher). Considering that many of these fields - like computer science, with ACM's new ebook line - are known not for monographs but for conference proceedings and journal articles, it seems kind of weird.

Someone mentioned that maybe it was due to the ebook aggregators' demand-driven acquisition plans - but I think it's just the opposite. Many major publishers have jacked up prices (pdf) on EBL and Ebrary recently - all to push libraries into licensing "big deal" bundles of the entire front list or entire subject categories. And buying from the publishers is super attractive: the books are often DRM-free PDFs (one big publisher even offers a whole book as a single pdf; most are one pdf per chapter), with ways to view online, easily findable using Google, and with nice MARC records for adding to the catalog.

The ebook aggregators have nasty DRM. They have concurrent-user rules. They have special rules for things that are considered textbooks. We have to log in with our enterprise login (which isn't my lab's day-to-day login), and the data about what books we view is tied to our identities. The new prices end up being as much as 30-40% of the cover price for a one-day loan. That's right: the customer can look, and maybe print a couple of pages, for 24 hours, and the library is charged a third of the cover price of the book.

But from the society and publisher pages, what seems like a one-time purchase has now become yet another subscription. If you buy the 2014 front list, will you not feel the pressure to buy the 2015 and 2016 publications?

Aggregators had seemed like part of the answer, but not so much at these prices. We've already mugged all the other budgets for our journal habit, so where does the money for these new things come from? The print budget was gone ages ago, and the reference budget was also raided. Still, the ebooks we've licensed do get used a lot at MPOW.

Comments are off for this post

Continuing value and viability of specialized research databases

Nov 26 2014 Published by under finding information, Information Science

There was an interesting thread yesterday on the PAMnet listserv regarding "core" databases in Mathematics and which could be cut to save money.

One response was that it's better to search full text anyway (I couldn't disagree more).

Ben Wagner expressed concern that Google Scholar was going to drive all of the databases out of business and then Google would abandon the project.

Joe Hourclé posted about ADS - a core database in astro. Fred Stoss posted about PubMed - needs no intro here, surely!

Here's my response.

I think Scopus and WoS are the biggest immediate threats to the smaller domain-specific indexes, particularly when the largest number of academic users are looking for a few reasonable things and aren't doing complex queries that need high precision and very high recall. In my world, I'm like the goalie: by the time they ask me, they've tried Google, they've asked their friends, they've asked their mother*... it's gotten past 10 people without an adequate answer. For these hard questions, I need the power of a good database (like Inspec). But... if you look at the quantities and numbers of users... does that justify the huge cost? Maybe? But do our auditors agree? Infrequent big wins vs. day-to-day common usage?

As Ben has often chronicled, we've shifted money out of every other budget to support our sci/tech journal habit. We've starved the humanities. We've dropped databases. All for more and more expensive journals. Seems like if the content does get paid for out of other budgets via page charges or institutional support for open access publishing, that might make it even more important that libraries have better ways to find the distributed content. But, like Ben, I worry that we'll put these finding tools out of business.

Another observation: two of the "core" databases mentioned, ADS and PubMed, are government supported as a service to the community. The solar physics bibliography is a very specialized resource but is also super important to those researchers. Maybe if building specialty research databases is no longer profitable but there remains a need, the community-built tools will improve/grow/gain support? Maybe they'll be backwards and using technology from 1995, though :)

I'm working with some projects that are taking big piles of full-text documents and using computational methods to classify them with an ontology built by subject matter experts (with some advice from a professional taxonomist in my group). The volume/velocity/yadda yadda of the data precludes the careful indexing done by our fancy databases... but I think this and other projects like it show a swing back toward the importance of good indexing and of having domain experts review the classification system.
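(A toy sketch of the basic idea, in Python - the category names and terms here are all made up, and the real projects use much fancier methods than simple term matching, but it shows how an expert-built ontology can drive automatic classification:)

```python
import re
from collections import Counter

# Hypothetical ontology: category -> terms chosen by subject matter experts.
# In a real project this would be much larger and curated by a taxonomist.
ONTOLOGY = {
    "propulsion": {"thruster", "propellant", "nozzle", "combustion"},
    "avionics": {"sensor", "telemetry", "guidance", "firmware"},
}

def classify(text, ontology=ONTOLOGY):
    """Score a document against each ontology category by counting
    how many of that category's terms appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = Counter({cat: len(words & terms) for cat, terms in ontology.items()})
    best, hits = scores.most_common(1)[0]
    return best if hits else None  # None means no category matched at all

print(classify("Telemetry from the guidance sensor package"))  # avionics
```

The point isn't the matching trick - it's that the categories and terms come from domain experts, while the computation handles the volume.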


* My mom is a statistician so I might ask her first


Comments are off for this post

Government cost recovery gone awry: PACER and NTIS

Aug 27 2014 Published by under information policy, Information Science

(reiterating that these are just my personal opinions and do not reflect anything from my place of work - if you know what that is - or anyone else)

For many years, the US federal government has tried to cut costs by outsourcing anything that isn't inherently governmental, making sure that government doesn't compete with industry, and requiring cost recovery for government agencies that provide services to other agencies (see A-76 ).

Old examples that might have changed: GPO had to do all the printing of history books for military historians, but the quality was only ok, the distribution was crap, and the DoD history organizations and readers had to pay a lot of money. So what they did when I worked there was give the book to a university press that would do a decent job with it. The books were not copyrightable anyway, because they were works for hire by government employees. Everyone was happy. Another old example: the Navy was required to send all records to NARA, but then all of a sudden the Navy had to pay NARA to keep the documents (I think this has changed - my example is from the late 1990s). These were things like deck logs - hugely important documents.

NTIS has long been caught up in this. Agencies producing technical reports are required by law to send them to NTIS (if they are unlimited distribution). NTIS is required to recover the cost of its administration and archiving by selling the documents. This is hard because, first, agencies are not thorough in sending stuff to NTIS (often because their central repository isn't even getting copies, even though that's required by regulations, instructions, etc.), and second, agencies make these documents available for free from their own sites. NTIS has also picked up a few bucks here and there doing web and database consulting and licensing its abstracting and indexing database to vendors who resell it to libraries. Why pay for it from a third-party vendor? Cross-searching with your favorite engineering database, and better search tools.

PACER is also caught up in this. There's actually a law that says the US Courts have to recover the cost of running the system by charging for access or for documents. They do not want to, but they must obey the law. This is information that really should be freely available and easily accessible. A famous activist tried to download the whole thing and make it available, but he was stopped.

The results of forcing these agencies - GPO, NTIS, US Courts - to recover their costs are serious, and they work directly against the open government we need and deserve. It causes the agencies to cut corners and not have the systems they need. It causes customer agencies and citizens alike to distrust and dislike them.

Now, the US Courts have removed large collections of historical documents from PACER because of an IT upgrade - read the Washington Post article. Various people in Congress are trying to shut NTIS down, again. GPO seems to be ok, for now - lots of cool, neat things from them.

Libraries - like mine - have been burdened by cost recovery, too, and it often signals the beginning of the end. Superficially, it makes sense as a way to show how much something is valued, and by whom. In practice, you need a lot more accounting systems and controls over the professional workers, which prevent them from doing their jobs. These services directly support strategic requirements (open government and accountability), but they are infrastructure - and people are blind to infrastructure until it's no longer there. NTIS, PACER, GPO, and others need to stop with this cost-recovery business (meaning Congress has to pass a law that removes the requirement) and be funded as infrastructure. Outsource to get needed skills you can't hire in government, but be smart about it.

Comments are off for this post

Fragile knowledge of coding and software craftsmanship

Aug 11 2014 Published by under Information Science

To continue the ongoing discussion, I think my concerns and experiences with informal education in coding (and perhaps formal education offered to those not being groomed into being software engineers or developers) fall into two piles: fragile knowledge and craftsmanship.

Fragile knowledge.

Feynman (yes, I know) described fragile knowledge of physics as learning by rote and being able to work problems directly from a textbook, but not having the deeper understanding of the science that enables application in conditions that vary even slightly from the taught conditions. I know my knowledge of physics was fragile - I was the poster child for being able to pass tests without fully understanding what was going on. I didn't know how to learn. I didn't know how to go about it any other way. I had always just shown up for class, done what I was asked, and been successful. In calculus I was in a class with discussion sections in which we worked problems in small groups - is that why my knowledge isn't fragile there, or is it that I had to apply math to physics problems? Who knows.

Looking back now, it seems like a lot of the informal education I've seen for how to code is almost intentionally aimed at developing fragile knowledge. It's not about how to solve problems with code and building a toolkit with wide application, showing lots of examples from different programs. It's more like: list the n types of data.

Craftsmanship.

There is actually a movement with this name, and I didn't take the time to read enough about it to know if it matches my thoughts. Here I'm talking about the coding environment, code quality, reproducibility, sharing... Not only solving the problem, but doing it in a way that is efficient and clean and doesn't open up any large issues (memory leaks, openings for hackers, whatever else). Then taking that solution and making it so that you can figure out what it did in a week, or share it with someone else who can see what it does. Solving the problem so that you can solve the same problem with new data the same way. My code is an embarrassment - but I'm still sharing, because it's the best I know how to do and at least there's something to start with.

A couple of people suggested the Software Carpentry classes - they sound great. Maybe SLA or ASIST or another professional association could host one as a pre-conference workshop? Maybe local (US: Maryland/DC) librarian groups could host one? We could probably get enough people.

One response so far

My current to-read list

Jun 27 2014 Published by under Information Science

I've been keeping a million tabs open at work and at home, because I haven't even had the time to add things to my citation manager... I also have some things in print that I've been carrying back and forth to work every day (boy, is my bag heavy!). Most of these things probably rate a post of their own, but sigh... Add to that my obsession du jour with screenscraping and text mining using R, Python, and Perl... and the fact that I'm not good at it, so everything takes longer (it would also take less time if I actually RTFM instead of just hopping into code and trying it).

So here are some things on my radar (I'm giving no credit to whoever pointed me to these because I honestly don't remember! Sorry):

  • Hadas Shema,  Judit Bar-Ilan,  Mike Thelwall (in press) How is research blogged? A content analysis approach. JASIST. DOI: 10.1002/asi.23239
    She tweeted a link to the pre-print if you don't have access. I got about 2/3 of the way through this as soon as I saw it announced, and then realized I had been working on a very important work thing and dropped it. Very interesting so far.
  • Lisa Federer (2014) Exploring New Roles for Librarians: The Research Informationist. Synthesis Lectures on Emerging Trends in Librarianship. New York: Morgan and Claypool. doi:10.2200/S00571ED1V01Y201403ETL001
    At first I was like, meh (another new name), but then I took another look and I'm interested in their version of embedding.
  • Vanessa P. Dennen. (2014) Becoming a blogger: Trajectories, norms, and activities in a community of practice. Computers in Human Behavior 36, 350-358, doi: 10.1016/j.chb.2014.03.028
  • Paige Brown (11 June 2014) How Academics are Using Social Media. From the Lab Bench.
    This and all the linked reports look very interesting.
  • Pablo Moriano, Emilio Ferrara, Alessandro Flammini, Filippo Menczer (2014). Dissemination of scholarly literature in social media.
  • Jeff Seaman and Hester Tinti-Kane (2013) Social Media for Teaching and Learning. Boston: Pearson Learning.
    This was probably cited in the blog post above.
  • Liu, Y., Kliman-Silver, C., Mislove, A. (2014) The tweets they are a-changin': Evolution of Twitter users and behavior. ICWSM. (google for it - I have the printout)
    This was mentioned by some folks from MPOW who went to the conference. Provides a nice overview.
  • Tenopir, C., Volentine, R., King, D.W. (2013) Social media and scholarly reading. Online Information Review 37, 193-216. doi: 10.1108/oir-04-2012-0062
    I might have actually read this, but it's still riding around in my bag.
  • Nentwich, M., König, R.. (2012). Cyberscience 2.0: Research in the age of digital social networks. Frankfurt: Campus Verlag.
    This one is time sensitive as I borrowed it from Columbia.
  • Holmberg, K., Thelwall, M. (2013) Disciplinary differences in Twitter scholarly communication. ISSI Proceedings 2013. <- that was typed from my handwriting and not checked; google for it. I think I may have read this, but I have it in the stack to read again.
  • Thelwall et al (in press) Tweeting links to academic articles. Cybermetrics J (google for preprint)
  • Haustein, et al. Tweeting biomedicine: an analysis of tweets and citations in the biomedical literature. ArXiv 1308.1838
  • Sayes, E. (2014) Actor–Network Theory and methodology: Just what does it mean to say that nonhumans have agency? Social Studies of Science 44, 134-149. doi:10.1177/0306312713511867

And this is just what's on my screen or in my bag. I think the babies tore up three articles I had waiting to be read by my couch :( So far behind!


Comments are off for this post

Auditors, really?

May 13 2014 Published by under Information Science

I'm all for responsible use of our money and we are very, very careful with our limited funds... so really I would have said, "bring it on" to auditors. Of course, I'm not the financial person, so I don't have to deal with them directly.

So here's the crazy bit: they want us to justify each and every one of our subscriptions for 2006 and 2007. What did we pay? What was the going rate? How many downloads were there for ONLY our lab? Were there cheaper substitutes? The problem is, my place of work is a research lab that's a division of a major university. Only a few of our licenses are just for our lab; the vast majority are managed by the larger institution - as is proper.

In case you didn't know, usage is hardly ever available by IP. COUNTER-compliant statistics, I'm told, are actually less informative than what was available previously.

So what happens if they disallow our licenses from 2007 in 2014? No clue whatsoever.

The only supporting info I could think to offer beyond what everyone else has done (the tech services folks at the parent institution have been awesome) is what sources we cited in those years. To me, that demonstrates that they were used and useful - even if the citing year is somewhat delayed.

I was pondering approaches, but I was able to do it in a few clicks in Scopus - not super clean, but useful:

  1. Search for your affiliation and pick yours from the list.
  2. Limit by publication year (2007 in this case)
  3. Select all (we only had ~450/year during this time, ymmv if you have a much larger institution)
  4. Under More > view References
  5. In the refine sidebar, at the bottom, in small print > export refine

I did see things dated years after they were cited, but those were mostly things like standards, technical reports, and unpublished items. The publication names also needed a little cleaning up - abbreviations like AJ and ApJ were there alongside the same journals listed under their full names.
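(That cleanup is easy to script - a toy sketch in Python; the alias table and sample list are made up, not the actual export:)

```python
from collections import Counter

# Normalize journal-name variants in an exported reference list so that
# abbreviations and full names count as the same title. The alias table
# and sample data are hypothetical, just to show the shape of the cleanup.
ALIASES = {
    "aj": "The Astronomical Journal",
    "apj": "The Astrophysical Journal",
}

def normalize(journal):
    # Case-insensitive lookup, ignoring stray whitespace and trailing dots
    key = journal.strip().lower().rstrip(".")
    return ALIASES.get(key, journal.strip())

cited = ["AJ", "ApJ", "The Astronomical Journal", "Phys. Rev. Lett."]
counts = Counter(normalize(j) for j in cited)
print(counts["The Astronomical Journal"])  # 2
```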

Dunno. I guess we'll see if the auditors get this - I'm getting the feeling that they have no concept of how science works.

Comments are off for this post

Random observation: Expanding database scope can sometimes be annoying

Apr 28 2014 Published by under Information Science

Observation, rant, whatever.

I sorta look like an idiot - more than usual, anyway - because I can't pin down the number of articles MPOW wrote each year from 2009-2013. I mean, I give a number, and then a couple of weeks later they ask me to check the research databases again (I've probably mentioned that we didn't have a mandatory comprehensive internal tracking system until this year) and the number has changed.

In the last 4 weeks, ~77 articles were added to this one set of databases we have with our name in the address field for any author. Of those, 45 were published in 2013 or 2014.

They were all from conferences that should be covered by the database... so I guess I don't really have a solution. I'm comparing our research output with other organizations', though, so if a new organization is added to the comparison, it's hardly fair to count them now while using last month's numbers for MPOW. SIGH.
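(At minimum you can see exactly what changed between checks by diffing two snapshots of the export on a stable identifier - a toy sketch in Python, with made-up DOIs:)

```python
# Diff two snapshots of a database export by DOI (or another stable ID)
# to see which records appeared or disappeared between checks.
# These snapshot sets are hypothetical, just to show the approach.
last_month = {"10.1000/a1", "10.1000/a2", "10.1000/a3"}
this_month = {"10.1000/a1", "10.1000/a2", "10.1000/a3", "10.1000/b9"}

added = this_month - last_month      # records new since the last check
removed = last_month - this_month    # records that vanished

print(sorted(added))   # ['10.1000/b9']
print(len(removed))    # 0
```

It doesn't make the moving target stand still, but at least you can report *which* records moved it.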

2 responses so far

C&RL Vote for articles - write in campaign for Taylor!

Apr 22 2014 Published by under Information Science

Ok, so the only thing I'm doing for this campaign is posting here, but anyway.

College & Research Libraries is having a 75th-anniversary special issue, and they're asking readers to vote for the best articles.

I don't know about a lot of the choices. I mean FW Lancaster (framed pdf) FTW, of course! And Kilgour, probably. Plus the anxiety one by Mellon (framed pdf) has definitely had an impact.

BUT they forgot the best one ever:

Taylor, R. S. (1968). Question-negotiation and information seeking in libraries. College & Research Libraries, 29(3), 178-194. (pdf)

Luckily there's a write in block. So you know what to do... write it in!


Comments are off for this post

Post I would like to write: New trend for linking external and internal information

I first noticed this in chemistry, but now I'm seeing it in engineering, too. Major publishers (?), content vendors, and indexes (?) are starting to offer services whereby your company can use their search tool to index your local content and display it side by side with their content, OR a way to use their API to pull their content into your local tool.

That's a common complaint in research labs and in companies. Knowledge management of internal information gets funded and defunded, and is cool and then not cool... External information is familiar to people coming out of school... how can you search across both?

We have the artist formerly known as Vivisimo (now IBM something spherical, I think) as an intranet search, and I would love to see it offer our discovery-layer-type search as a tab. I don't see why it couldn't.

This deserves analysis and thought - no time, sorry!

2 responses so far

A word on ebook iPhone app usability

Aug 29 2013 Published by under Information Science

I'm not a usability expert, although I certainly have read a bunch and seen a bunch of presentations (and know a few experts personally) - but there are some basic ideas about understanding your users and the tasks they have to perform with your app or device or site that should be somewhat obvious.

I often read books and articles on my iPhone while nursing/rocking my babies. Maybe it makes me a bad mother but it sure has helped with patience over the past almost 18 months! If they're awake and up to shenanigans, I put the phone away and give them my full attention... but anyway. People are shocked and amazed that I can put up with reading a book on my iPhone. I'm not sure why - it's not a tiny font, I can make the font whatever size I need. I have the phone with me anyway. I don't need a separate light source. I can get new books right there instead of having to connect it to my laptop.

One of the things that is super, super important for an immersive reading experience is the ability to quickly turn pages - without even thinking about it and without losing your train of thought. When you're reading on a small screen, you might have like four page turns for every one you would have with a print book, so it's something you do a lot (particularly if you're reading <ahem> trashy bodice-ripper romances <ahem> that read very quickly!).

OverDrive is the only app you're supposed to be able to use with the OverDrive license my local publib has. They have made two major mistakes with page turning - it's like they don't really get it. First, a while ago they added an animation when you turn a page, so it would look like the corner turning up and going over - what a colossally bad idea! No one turns pages because it's cool - you turn pages to see what happens next. They quickly reversed that and made it an option. In the most recent update they've added a bunch of cool things like syncing across platforms (good), but they've now made page turning a swipe instead of a tap... and you can't even swipe from the side, because that opens a menu; you have to swipe in the middle... which is hard to do one-handed while holding the device. And it's slow... it has to think about it before turning. So then you have to go back and check what was happening and then go forward again... I had a book on there that I'd had on hold for a while, and I just gave up on it. I'm going back to reading about Web Corpus Construction in a pdf reader like GoodReader.

Update: This afternoon OverDrive released a new version that fixes the page-turning issue. I can only hope they learned from it this time, since they didn't learn from it last time.

One response so far
