Ebook Explosion

(by Christina Pikas) Dec 17 2014

Seems like all the publishers and all the societies are trying to get into the eBook game. The newest announcement is from AAS (using IOP as a publisher). Considering the fact that a lot of these domains are not particularly known for monographs - like Computer Science and ACM's new ebook line - but instead for conference proceedings and journal articles, seems kinda weird.

Someone mentioned that maybe it was due to the ebook aggregators' demand-driven acquisition plans - but I think it's just the opposite. Many major publishers have jacked up prices (pdf) on EBL and Ebrary recently - all to push libraries into licensing "big deal" bundles of the entire front list or entire subject categories. And it is super attractive to buy from the publishers because their ebooks often come without DRM, as PDFs (one big publisher even offers a whole book in a single pdf, most are one pdf per chapter), with ways to view online, easily findable using Google, and with nice MARC records for adding to the catalog.

The ebook aggregators have nasty DRM. They have concurrent user rules. They have special rules for things that are considered textbooks. We have to log in with our enterprise login (which isn't my lab's day-to-day login) and the data about what books we view is tied to our identities. The new prices end up being as much as 30-40% of the cover price for a 1 day loan. That's right. The customer can look and maybe print a couple of pages for 24 hours and the library is charged a third of the cover price of the book.
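To put numbers on that, here is a rough back-of-the-envelope sketch. The $150 cover price is a made-up figure for illustration, and the function names are mine, not anything from a vendor; only the 30-40% range comes from the prices above.

```python
def loan_fee(cover_price, percent_of_cover):
    """Fee charged to the library for one short-term loan."""
    return cover_price * percent_of_cover / 100


def loans_to_reach_cover(percent_of_cover):
    """Smallest number of loans whose combined fees are at least the cover price."""
    # Ceiling division on whole-number percentages avoids floating point surprises.
    return -(-100 // percent_of_cover)


cover_price = 150  # hypothetical cover price in dollars
for pct in (30, 40):
    print(pct, loan_fee(cover_price, pct), loans_to_reach_cover(pct))
```

In other words, under these assumptions three or four one-day loans cost the library as much as buying the book outright.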

But for the societies' and publishers' own pages, what seems like a one-time purchase has now become yet another subscription. If you buy the 2014 front list, will you not feel the pressure to buy the 2015 and 2016 publications?

Aggregators had seemed like some of the answer, but not so much with these prices. We've already mugged all the other budgets for our journal habit so where does the money for these new things come from? The print budget was gone ages ago. Reference budget was also raided. The ones we've licensed do get used a lot at MPOW.


Continuing value and viability of specialized research databases

(by Christina Pikas) Nov 26 2014

There was an interesting thread yesterday on the PAMnet listserv regarding "core" databases in Mathematics and which could be cut to save money.

One response was that it's better to search full text anyway (I couldn't disagree more).

Ben Wagner expressed concern that Google Scholar was going to drive all of the databases out of business and then Google would abandon the project.

Joe Hourclé posted about ADS - a core database in astro. Fred Stoss posted about PubMed - needs no intro here, surely!

Here's my response.

I think Scopus and WoS are the biggest immediate threats to the smaller domain specific indexes particularly when the largest number of academic users are looking for a few reasonable things and aren't doing the complex queries or needing to be very precise and have very high recall. In my world, I'm like the goalie: by the time they ask me, they've tried Google, they've asked their friends, they've asked their mother*... it's gotten past 10 people without an adequate answer. For these hard questions, I need the power of a good database (like Inspec). But... if you look at quantities and numbers of users... does that justify the huge cost? Maybe? But do our auditors agree? Infrequent big wins vs. day to day common usage?

As Ben has often chronicled, we've shifted money out of every other budget to support our sci/tech journal habit. We've starved the humanities. We've dropped databases. All for more and more expensive journals. Seems like if the content does get paid for out of other budgets via page charges or institutional support for open access publishing, that might make it even more important that libraries have better ways to find the distributed content. But, like Ben, I worry that we'll put these finding tools out of business.

Another observation: two of the "core" databases mentioned, ADS and PubMed, are government supported as a service to the community. The solar physics bibliography is a very specialized resource but is also super important to those researchers. Maybe if building specialty research databases is no longer profitable but there remains a need, the community-built tools will improve/grow/gain support? Maybe they'll be backwards and using technology from 1995, though :)

I'm working with some projects that are actually taking big piles of full text documents and using computational methods to classify using an ontology that's built by subject matter experts (with some advice from a professional taxonomist in my group). The volume/velocity/yadda yadda of the data precludes the careful indexing done by our fancy databases... but this and other projects like it I think show a swing back toward the importance of good indexing and the importance of having domain experts reviewing the classification system.

 

* My mom is a statistician so I might ask her first


PyCharm FTW

(by Christina Pikas) Oct 12 2014

Another random Python note. I asked at work again in the Python group of our internal social networking thingy and consensus was that I should try PyCharm as a development environment.

All the stinking tutorials are like use a text editor and command line - and that's what I'd been doing - but with R, RStudio is so fantastic that I thought surely there must be something workable for Python. I had tried the Eclipse plugin and I couldn't even get it to run a program and I couldn't figure out what it was doing and ugh.

PyCharm now has a community edition so you don't even have to prove you're a student or pay for it. It's lovely, really. I don't see why I should have to use VI like it's 1991 or beat on something with rocks to see where I'm missing a quote or have the wrong indents. Why not have help? I'm trying to accomplish a task not create art.

I really do have to continue coding and stop playing with Python. Particularly since when I do I end up losing hours of my life when I'm supposed to be sleeping!


Post migration settling in

(by Christina Pikas) Oct 06 2014

We migrated to a new server and updated our software. See here: http://scientopical.scientopia.org/2014/10/02/server-migration/

There might be some settling in issues. Let me know if you see anything here. I suspect my code on the sidebar needs attention - which I do not have time to give it :(  Also, comments might get caught in pending or spam filter. I hope not but let me know.

In case you missed me: dissertation progressing, committee ok with progress, toddler twins will likely kill me, family issues (health and otherwise), work is busy in a good way.


WRT NCTA ads: No.

(by Christina Pikas) Sep 16 2014

Net Neutrality FUD all over the place. NCTA has these ads in the WaPo about how 1) you can find everything on the internet and 2) you can't find any reason the internet should be regulated like a utility.

No.

You can, indeed, find many well-reasoned essays on why the internet should be regulated like a utility - see, for example, books and essays and blog posts by Lessig, Wu, Cerf, and others. All shared freely. For now.

You can't find everything on the internet, either, and even many  "kids today" know this. See.


Government cost recovery gone awry: PACER and NTIS

(by Christina Pikas) Aug 27 2014

(reiterating these are just my personal opinion and do not reflect anything from my place of work - if you know what that is - or anything else)

For many years, the US federal government has tried to cut costs by outsourcing anything that isn't inherently governmental, making sure that government doesn't compete with industry, and requiring cost recovery for government agencies that provide services to other agencies (see OMB Circular A-76).

Old examples that might have changed: GPO had to do all printing of history books for military historians, but the quality was ok, the distribution was crap, and the DoD history organizations and readers had to pay a lot of money. So what they did when I worked there was to give the book to a university press that would do a decent job with it. The books were not copyrightable anyway because they were work for hire by a government employee. Everyone was happy. Another old example was that Navy was required to send all records to NARA. But then Navy all of a sudden had to pay NARA to keep the documents (I think this has changed - my example is from the late 1990s). These were things like deck logs. Hugely important documents.

NTIS has long been caught up in this. Agencies producing technical reports are required by law to send them to NTIS (if they are unlimited distribution). NTIS is required to recover the cost of their administration and archiving by selling the documents. This is hard because first, agencies are not thorough in sending stuff to NTIS (often because their central repository isn't even getting copies - even though required by regulations, instructions, etc.) and second, agencies make these documents available for free from their own sites.  NTIS also has picked up a few bucks here and there doing web and database consulting and licensing their abstracting and indexing database to vendors who resell to libraries. Why pay for it from a third-party vendor? Cross search with your favorite engineering database. Better search tools.

PACER is also caught up in this. There's actually a law that says US Courts has to recover the cost of running the system by charging for access or for documents. They do not want to but there is a law that they must obey. This is information that really should be freely available and easily accessible. A famous activist tried to download the whole thing and make it available, but he was stopped.

The results of forcing these agencies - GPO, NTIS, US Courts - to recover their costs are serious, and they directly work against the open government we need and deserve. Cost recovery causes the agencies to cut corners and not have the systems they need. It causes customer agencies and citizens alike to distrust and dislike them.

Now, US Courts has removed large collections of historical documents from PACER because of an IT upgrade. Read the Washington Post article. Various people in Congress are trying to shut NTIS down, again. GPO seems to be ok, for now - lots of cool neat things from them.

Libraries - like mine - have been burdened by cost recovery, too, and it often signals the beginning of the end. Superficially, it makes sense as a way to show how much something is valued and by whom. In practice, you need a lot more accounting systems and controls over the professional workers, and those prevent them from doing their job. These services are directly in support of strategic requirements (open government and accountability) but are infrastructure. People are blind to infrastructure until it's no longer there. NTIS, PACER, GPO and others need to stop with this cost recovery business (meaning Congress has to pass a law that removes that requirement) and be funded as infrastructure. Outsource to get needed skills you can't hire in government, but be smart about it.


Fragile knowledge of coding and software craftsmanship

(by Christina Pikas) Aug 11 2014

To continue the ongoing discussion, I think my concerns and experiences with informal education in coding (and perhaps formal education offered to those not being groomed into being software engineers or developers) fall into two piles: fragile knowledge and craftsmanship.

Fragile knowledge.

Feynman (yes, I know) described fragile knowledge of physics as learning by rote and by being able to work problems directly from a textbook but not having a deeper understanding of the science that enables application in conditions that vary even slightly from the taught conditions. I know my knowledge of physics was fragile - I was the poster child of being able to pass tests without fully understanding what was going on. I didn't know how to learn. I didn't know how to go about it any other way. I had always just shown up for class, done what I was asked, and been successful. In calculus I was in a class that had discussion sections in which we worked problems in small groups - is this why my knowledge isn't fragile in that or is it that I did have to apply math to physics problems? Who knows.

Looking back, now, it seems like a lot of the informal education I've seen for how to code is almost intentionally aimed at developing fragile knowledge. It's not about how to solve problems with code, building a toolkit that has wide application, or showing lots of examples from different programs. It's more like: list the n types of data.

Craftsmanship.

There is actually a movement with this name and I didn't take the time to read enough about it to know if it matches my thoughts. Here I'm talking coding environment, code quality, reproducibility, sharing.... Not only solving the problem, but doing it in a way that is efficient, clean, and doesn't open up any large issues (memory leaks, openings for hackers, whatever else). Then taking that solution and making it so that you can figure out what it did in a week or so or so that you could share with someone else who could see what it did. Solving the problem so that you can solve the same problem with new data the same way. My code is an embarrassment - but I'm still sharing, because it's the best I know how to do and at least there's something to start with.

A couple of people suggested the Software Carpentry classes - they sound great. Maybe SLA or ASIST or another professional association could host one of these as a pre-conference workshop? Maybe local (US - Maryland - DC) librarian groups could host one? We could probably get enough people.


What I want/need in a programming class

(by Christina Pikas) Aug 08 2014

Abigail Goben (Hedgehog Librarian) has a recent blog post discussing some of the shortcomings she's identified in the various coding courses she's taken online and the self-study she has done.

I think my view overlaps hers but is not the same. Instead of try to compare and contrast, I'll say what I've seen and what I need.

I'm probably pretty typical of my age: I had BASIC programming in elementary and high school. This was literally BASIC and was like

10 print "hello"
20 goto 10

I think we did something with graphics in high school, but it was more BASIC. In college, they felt very strongly that physics majors should learn to code, so I took Pascal for non-CS majors in my freshman year. That was almost like the BASIC programming: no functions, no objects... kinda do this, do this, do this... turn it in. I never did see any connection whatsoever with my coursework in physics. I never understood why I would use that instead of the Mathematica we had to use in diffeq.

In the workforce, I did some self-study JavaScript (before it was cool), HTML, CSS - not programming, obviously. And then I needed to get data for an independent study I was doing and my mentor for that study wrote a little Perl script to get web pages and pull out links. The script she wrote broke with any modifications to the website template, so after waiting for her to fix it for me, I ended up fixing it myself... which I should have done to start with. ... In the second stats class another student and I asked if we could use R instead of Stata. He was going back to a country with less research funding and I was going to work independently. But then, we just used the regression functions already written out and followed from a book. Elsewhere in the workforce I've read a lot about R and some co-workers and I worked through a book... I did the Codecademy class on Python.
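For what it's worth, the "pull the links out of a page" job looks something like the sketch below in Python. This is not the Perl script from back then, just an illustration using the standard library; the class and function names are made up for this example. Because it walks every anchor tag instead of depending on where the links sit in a particular page layout, it is less likely to break when a site's template changes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

To use it on a live page you would first fetch the HTML (for example with urllib.request.urlopen) and pass the decoded text to extract_links along with the page's URL.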

All of these classes - if they weren't in interactive mode, they could have been. What are the various data types? How do you get data in there and back out again? How do you do a for loop? Nobody really goes into any depth about lists in R and they pop up all over the place. I couldn't even get Python installed on my computer at first by myself because everyone teaching me was on a Mac. (btw, use ActivePython and ActivePerl if you're on Windows - not affiliated, but they just work).

The R class on Coursera (same one she complains about) and the data science class by JH there were the first that even really made me do functions. What a difference. I really appreciated them for that.

So here's what I think:

People new to programming - truly new - need to understand the basics of how any program works including data types, getting data in and out, for loops. But also architectural things like functions and objects. They probably need to spend some time with pseudocode just getting through the practice.

Then if you're not new to programming, but you're new to a language - different course. In that course you say this is how this language varies, this is what it does well with, here's where it fails.

Then there needs to be an all about software design or engineering or process course that talks about version control and how to use it. How to adequately document your code. How to write programs in a computationally efficient way. The difference between doing things in memory or not.  What are integrated development environments and when would you use one. This is what I need right now.

If it's something basic, I can follow along with a recipe I can read off of Stack Overflow, but I know nothing about efficiency. Like why use sapply vs. a for loop? Is there a better way to load the data in? Why is it slow? Is it slower than I should expect? I love RStudio - love, love, love! But I tried something like that for Python and could never get it to work. I'm still learning git, but I don't really understand the process of it even though I can go through the steps.

Anyhow, more about me, but I think I'm probably pretty typical. I think there's a huge gap in the middle in what's being taught and I also think that a lot of people need the very basics of programming almost minus the specific language.


Mom always said: Clean as you go

(by Christina Pikas) Jul 31 2014

But do we listen? Not so much.

Turns out MPOW (a division of a larger institution) has not deleted *any* borrower records since we merged into the larger institution's catalog maybe 10 years ago. We used to have our own catalog back in the day. No idea how much we maintained that stuff either - last time I did a dump from it to create an electronic badge-scanning sign-in system for an open house - we had users with the names "brontosaurus", "washington, george", "gibson, r.e." (which you won't get unless you know more about where I work, well and he's been dead for a couple of decades).  I think the professionals who ran the larger system helped us clean up a bit on migration.

So here we are, ready to integrate further with automatic registration and maintenance of records and we figured we should probably clean up prior. Oy.

Turns out in SirsiDynix Horizon, you have to identify the borrower and *edit* their record to have the option to delete. It was always grayed out for us because we didn't think about having it open for editing first. All this time we've been getting notices of employee actions, but have done nothing. We used to be a required stop on the employee checkout list but they took us off when we got rid of our print collection.

Now, how to match current employees? I can get a list but the export from Horizon shows how poorly we did data entry when creating the accounts. Some have the whole name in various orders in the last name field. Some have that with periods in it. There's a name of a university there (why?). E-mails missing, employee numbers missing, obsolete borrower types. People who have joint appointments with other divisions of the larger institution who have these weird hybrid records.

At first pass with a short R script, I identified 500 records of the 3500 that need to be checked. And that was only using last names so if there are 10 bad Smiths for the one good Smith, then they get a pass. I'm sure we'll get exception reports or something at the load, but we're trying to get ahead of the game.
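My script was in R, but the first pass amounts to something like this Python sketch. The field names (id, last_name) and the sample records are made up for illustration; the real Horizon export looks different. It flags any borrower whose last-name field, after light cleanup, doesn't match a current employee's last name.

```python
def normalize(name):
    """Lowercase, turn periods into spaces, and collapse runs of whitespace."""
    cleaned = name.replace(".", " ").lower()
    return " ".join(cleaned.split())


def flag_for_review(borrowers, employee_last_names):
    """Return borrower records whose last name matches no current employee."""
    known = {normalize(n) for n in employee_last_names}
    return [b for b in borrowers if normalize(b["last_name"]) not in known]


# Hypothetical sample data, not real records.
borrowers = [
    {"id": 1, "last_name": "Smith"},
    {"id": 2, "last_name": "gibson, r.e."},
    {"id": 3, "last_name": "brontosaurus"},
    {"id": 4, "last_name": " SMITH "},
]
flagged = flag_for_review(borrowers, ["Smith", "Jones"])
print([b["id"] for b in flagged])
```

Note that both Smith records pass here, which is exactly the weakness described above: matching on last name alone can't tell the one good Smith from the bad ones.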

So kids: do as your mom told you and CLEAN AS YOU GO!

No doubt I will continue not to heed this advice, either :)


Kindle Unlimited. Destruction? No.

(by Christina Pikas) Jul 20 2014

Amazon announced a service where for a monthly fee (currently $9.99) you can read unlimited ebooks (from a stock of 600k). John Dupuis has been linking to various stories about it on Twitter. One from Vox suggests the new service might lead to libraries' destruction, what with funding issues being what they were a few years ago.

Many others have pointed out the issues with that. The first being that most libraries offer a service free (because of your tax dollars) that lets you check out unlimited books a month for just about any device. Mine limits me to 6 books at a time from one of the 3 or so services they offer and a certain number of "units" or "credits" from another of the services.

If you have the $10/month, though, this would mean no waiting where you often have to wait for popular titles at the public library because of idiotic requirements from the publishers that make these services treat ebooks like print books - one user/copy at a time.

Today, John pointed to this piece by Kelly Jensen on Book Riot. Jensen is all offended at people saying libraries are the Netflix for books because libraries do so much more. Well, yes, but I just can't get offended. I've found, and a local survey has shown, that people often don't know what ebooks are, how they can get them, and what is available at their library. They do know about Amazon, but they don't know about OverDrive. Amazon is easy, OverDrive can be hard (once you get going it's very simple, though). I actually think it's helpful and useful and not too reductionist and problematic to refer to libraries as the Netflix for books. Emphasizing and publicizing this one small service won't cause people with small children to forget about story time, college students and professors to forget about doing research, avid readers to forget about print books. It may, however, bring in new users or bring back users/patrons who have been too busy to come in person or who are now home bound for some reason.

What I do wonder about is licensing. There has been lots of discussion about the big 6 and how they really, really don't want libraries to lend ebooks. Some have done stupid things like say libraries have to rebuy the book after it has been checked out 26 times or so. Others have delays or flat-out won't license ebooks to libraries. (not talking research libraries and STEM books - we can get them if we want to spend the $ and they aren't textbooks). One big publisher gave in recently - but sort of slowly.

Amazon's in a big fight with one publisher with all sorts of shenanigans like slowing down shipping for their books. The authors are all up in arms. It's a mess. With that uneasy relationship, I really am curious about publishers participating in this program. Do they see it differently than the library products? Is it just the same ones that do license for library use? 600k books - but which 600k? Presumably the entire Project Gutenberg library is on there (see "catching up on classics") .... and some other books are featured on the home page. I don't have time to do the analysis, but I'm curious.

$10/month could add up... particularly if the catalog isn't that large. I think it sounds like a good service, but the devil is in the details. What does the catalog look like? Are people out of money to spend on entertainment with all the video downloading services and internet and data and what not? Seems like an obvious move for Amazon (they aren't the first - Oyster and Scribd have similar services). We'll see, I guess.

