One of the reasons Google Scholar is so attractive is that it covers every area of research. Another is that it’s quick. Another is that it’s free. But it doesn’t necessarily go that far back, it’s unclear exactly what it does cover, it lags publication (and we don’t know by how long), and it doesn’t support very sophisticated searches. Plus there’s no controlled vocabulary behind it, so you don’t get all the results that are relevant. And of course there’s no substructure searching 🙂
Library tools such as catalogs and research databases have other sets of problems. The research databases that have powerful search tools and are really well indexed with a good controlled vocabulary tend to cover a fairly narrow research area: physics, medicine, chemistry. Other tools that cover a broad group of topics probably don’t have as good indexing as Google Scholar, but offer full text. Catalogs are typically miserable to search. It’s very hard to find what you’re looking for in most catalogs, particularly if you’re doing a subject search and not looking for a known item.
The narrow bit is a particular problem in the newer disciplinary areas or interdisciplinary areas. That’s one of the reasons libraries started licensing federated search products maybe like 10 years ago? I should probably explain again, although I’m fairly certain I must have before. A federated search takes your query, translates it for each of a set of pre-selected databases, sends it out, and compiles a list of results. Compare this to something like Google, which has already gone around to all the websites, crawled them, stored the results, and then created optimized indexes and whatnot. So you begin to see why federated searches seem really slow: all of that work happens while you wait. Plus the search language ends up being lowest common denominator, with only a limited number of fields. What’s worse is that out of the box the results pages weren’t very well done (there’s an add-on that we have that improves this a lot).
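To make the federated search steps a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two "databases" are just in-memory lists, and the names (SOURCES, translate, search_one, federated_search) and the query templates are my assumptions, not any vendor's actual connector or syntax. A real product would be making network calls at the point where this one filters a list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for licensed databases. Each has its own query
# syntax and its own records; a real connector would make a network call.
SOURCES = {
    "physics_db": {
        "template": "TI=({terms})",
        "records": ["Graphene transport measurements", "Dark matter halos"],
    },
    "medicine_db": {
        "template": "{terms}[Title]",
        "records": ["Graphene biosensors in diagnostics"],
    },
}


def translate(query, source_name):
    """Rewrite the user's query in the target database's own syntax."""
    return SOURCES[source_name]["template"].format(terms=query)


def search_one(query, source_name):
    """Pretend to send the translated query to one database and collect hits."""
    native_query = translate(query, source_name)  # what would go over the wire
    hits = [
        title
        for title in SOURCES[source_name]["records"]
        if query.lower() in title.lower()
    ]
    return [{"source": source_name, "title": t, "sent": native_query} for t in hits]


def federated_search(query):
    """Fan the query out at search time, then compile one result list."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda name: search_one(query, name), names))
    merged = [hit for hits in per_source for hit in hits]
    return sorted(merged, key=lambda hit: hit["title"])


results = federated_search("graphene")
for hit in results:
    print(hit["source"], "|", hit["sent"], "|", hit["title"])
```

Note that the only field the sketch can search is the title, because that is the one thing both pretend databases have in common. That is the lowest common denominator problem in miniature, and the user still waits on the slowest source before seeing the compiled list.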
So, a few vendors managed to negotiate a deal with the databases to – yes – index them in advance, and show the results in a slick interface. These things are being called discovery layers. You would only get the results of the databases you pay for from the original vendor. (well, that brings up a question – for something like, say, Inspec, we pay the database producer fee plus a markup for the interface provider… I wonder if you just pay the first part? dunno). Anyhow, you get the speed of something that is indexed in advance and the benefits of having the underlying databases. Typically they’ll suck up your catalog and institutional repository, too.
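For contrast, here is an equally rough Python sketch of the index-in-advance idea. Again, this is only my guess at the general shape, not how any particular discovery layer works: the records, the source names other than Inspec, and the functions build_index and search are all hypothetical, and the index is a plain word-to-record lookup with nothing clever about ranking.

```python
from collections import defaultdict

# Hypothetical records harvested ahead of time from several sources,
# including the local catalog.
RECORDS = [
    {"id": 1, "source": "inspec", "title": "Graphene transport measurements"},
    {"id": 2, "source": "catalog", "title": "Introduction to graphene"},
    {"id": 3, "source": "other_db", "title": "Graphene biosensors"},
]


def build_index(records):
    """Build an inverted index (word -> record ids) before anyone searches."""
    index = defaultdict(set)
    for record in records:
        for word in record["title"].lower().split():
            index[word].add(record["id"])
    return index


def search(index, records, query, licensed_sources):
    """Look the query up in the prebuilt index, keeping only licensed sources."""
    by_id = {record["id"]: record for record in records}
    words = query.lower().split()
    if not words:
        return []
    # A record matches only if it contains every word in the query.
    ids = set.intersection(*(index.get(word, set()) for word in words))
    return [by_id[i] for i in sorted(ids) if by_id[i]["source"] in licensed_sources]


INDEX = build_index(RECORDS)  # done once, long before the user shows up
hits = search(INDEX, RECORDS, "graphene", {"inspec", "catalog"})
for hit in hits:
    print(hit["source"], "|", hit["title"])
```

The slow part (harvesting and indexing) happens once, up front, so the search itself is just a lookup. The licensed_sources filter stands in for only seeing results from the databases you actually pay for.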
Your reaction is probably like mine was: how do you get all of the underlying databases to sell to you? Without them, it falls apart.
So that brings us up to the current part of the story. I’ve mentioned how Ebsco is really on a power grabbing mission. They own a bunch of databases. They are also developing one of these discovery layer deals. Well, one of the other discovery layer vendors wrote a letter to all of its customers saying Ebsco had pulled information out. Iris has all the information in her post – so I won’t repeat it, but apparently none of the discovery layers are providing information to any of the others. That leaves us with crawl and cache for one big vendor’s databases and federating the competition’s, no matter which vendor we pick.
Other things called discovery layers are actually just overlays for the catalog. That’s what ours is going to be. That nibbles away at one part of the problem but really doesn’t approach the elephant in the room.
Sigh, a bit depressing, but now that I ponder the whole thing, I’m not sure how much of a loss it is. There are lots of research efforts on handling multiple ontologies for scientific data coming from multiple sources. Maybe we can figure out something better, like something that really uses the controlled vocabulary.