Archive for the 'finding information' category

Continuing value and viability of specialized research databases

Nov 26 2014 Published by under finding information, Information Science

There was an interesting thread yesterday on the PAMnet listserv regarding "core" databases in mathematics and which of them could be cut to save money.

One response was that it's better to search full text anyway (I couldn't disagree more).

Ben Wagner expressed concern that Google Scholar was going to drive all of the databases out of business and then Google would abandon the project.

Joe Hourclé posted about ADS, a core database in astro. Fred Stoss posted about PubMed, which needs no intro here, surely!

Here's my response.

I think Scopus and WoS are the biggest immediate threats to the smaller domain-specific indexes, particularly because most academic users are looking for a few reasonable things and aren't running complex queries or needing very high precision and recall. In my world, I'm like the goalie: by the time they ask me, they've tried Google, they've asked their friends, they've asked their mother*... it's gotten past 10 people without an adequate answer. For these hard questions, I need the power of a good database (like Inspec). But... if you look at usage and numbers of users, does that justify the huge cost? Maybe? But do our auditors agree? Infrequent big wins vs. day-to-day common usage?

As Ben has often chronicled, we've shifted money out of every other budget to support our sci/tech journal habit. We've starved the humanities. We've dropped databases. All for more and more expensive journals. If the content does end up being paid for out of other budgets, via page charges or institutional support for open access publishing, that might make it even more important that libraries have better ways to find the distributed content. But, like Ben, I worry that we'll put these finding tools out of business.

Another observation: two of the "core" databases mentioned, ADS and PubMed, are government supported as a service to the community. The solar physics bibliography is a very specialized resource but is also super important to those researchers. Maybe if building specialty research databases is no longer profitable but there remains a need, the community-built tools will improve/grow/gain support? Maybe they'll be backwards and using technology from 1995, though :)

I'm working with some projects that are taking big piles of full-text documents and using computational methods to classify them against an ontology built by subject matter experts (with some advice from a professional taxonomist in my group). The volume/velocity/yadda yadda of the data precludes the careful indexing done by our fancy databases... but I think this project and others like it show a swing back toward the importance of good indexing and of having domain experts review the classification system.
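The post doesn't say which computational methods these projects use. As a minimal sketch of the general idea only, here is a naive dictionary-matching classifier: each concept in a hypothetical SME-built ontology carries a list of labels, and a document is tagged with every concept whose labels appear in its text. The `ONTOLOGY` contents and the `classify` function are illustrative assumptions, not the projects' actual approach.

```python
import re

# Hypothetical ontology: concept -> labels chosen by subject matter experts.
# These concepts and labels are illustrative only.
ONTOLOGY = {
    "Remote sensing/Lidar": ["lidar", "ladar", "optical radar"],
    "Structural dynamics/Flutter": ["flutter", "aeroelastic"],
}


def classify(text: str, ontology: dict[str, list[str]] = ONTOLOGY) -> list[str]:
    """Return every concept whose labels occur as whole words or phrases in text."""
    lowered = text.lower()
    hits = []
    for concept, labels in ontology.items():
        for label in labels:
            # \b word boundaries stop "lidar" from matching inside "solidarity".
            if re.search(r"\b" + re.escape(label) + r"\b", lowered):
                hits.append(concept)
                break  # one matching label is enough to tag this concept
    return hits


if __name__ == "__main__":
    print(classify("Lidar returns during panel flutter tests"))
```

Simple label matching like this is exactly where the SME-built vocabulary earns its keep: the classifier can only be as good as the list of labels the experts supply.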

 

* My mom is a statistician so I might ask her first

 


Searching Scopus by Date Added to the Database

In my previous post, I complained that my metrics weren't comparable over the course of a few months, even for articles published in 2009.

I looked in the instructions, and I couldn't find anything that discussed searching by date added to the database. I looked at all the fields on the detailed view and there wasn't anything to help. No accession number. No date added. Hmph.

So I started to think about the alerts I had set up. When you click through "view all new results in Scopus", you get a search like this:

(AFFIL((my place of work))) AND ORIG-LOAD-DATE AFT 1390059048 AND ORIG-LOAD-DATE BEF 1390674349
Huh. So I wondered... can you just find the right AFT value and use it in Advanced Search? Yup. Sure can!
What are these crazy numbers, though? (Most people will know right away. I didn't, and I should have.) So I looked around, but no, I didn't have any alerts from the time period I needed. I chatted with Scopus help, and they insisted that 1) you can't search on that field (I told them I had already proved you could), 2) it was part of the alert system and not part of the database (????), and 3) they couldn't give me the numbers for the time period I wanted, because you can't search for them anyway.
So then I asked LSW, and the brilliant Deborah and the equally brilliant but time-delayed Meg told me it was Unix time: seconds since 1/1/1970 (UTC). Stephanie also provided me with a search string from early January (thank you!). I read up on doing the conversion in R, but Deborah even linked to an online converter, and boom, Bob's your uncle.
So, if you want to find articles added to the database before or after a certain time, convert the time to Unix time and put the resulting number after
ORIG-LOAD-DATE AFT
or
ORIG-LOAD-DATE BEF
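The conversion step is small enough to script instead of using an online converter. A minimal sketch, assuming (per the Unix-time explanation above) that Scopus reads ORIG-LOAD-DATE values as whole seconds since 1970-01-01 UTC; the helper names `to_unix`, `from_unix`, and `load_date_clause` are hypothetical, not part of any Scopus tool.

```python
from datetime import datetime, timezone


def to_unix(dt: datetime) -> int:
    """Seconds since 1970-01-01 00:00:00 UTC; naive datetimes are treated as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: interpret as UTC
    return int(dt.timestamp())


def from_unix(seconds: int) -> datetime:
    """Turn a Unix timestamp (e.g. from an alert URL) back into a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def load_date_clause(after: datetime, before: datetime) -> str:
    """Build an ORIG-LOAD-DATE window to AND onto an advanced search."""
    return (f"ORIG-LOAD-DATE AFT {to_unix(after)} "
            f"AND ORIG-LOAD-DATE BEF {to_unix(before)}")


if __name__ == "__main__":
    # Decode the two timestamps from the alert search above.
    print(from_unix(1390059048))  # 2014-01-18 15:30:48+00:00
    print(from_unix(1390674349))  # 2014-01-25 18:25:49+00:00
    print(load_date_clause(datetime(2014, 1, 1), datetime(2014, 2, 1)))
```

Decoded this way, the two numbers in the alert search are almost exactly a week apart, which would fit a weekly alert. The last line prints `ORIG-LOAD-DATE AFT 1388534400 AND ORIG-LOAD-DATE BEF 1391212800`, a window covering January 2014.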

Added 5/7: I was contacted by Scopus. I would like to post detailed information from the e-mail but haven't gotten permission. My contact did verify that this search works, but only so far back, because that information isn't kept indefinitely. Also, you can use RECENT(n), where n is the number of days, and AND that onto any advanced search.


Searching for textbooks in a library catalog

Feb 14 2014 Published by under finding information

So this should be easy/obvious, but it's not so much... at least to me. People very commonly want to locate textbooks, for diverse reasons including:

  • they want to get it from a library because they don't want to pay for it (grrr...)
  • they are thinking about taking the class and they want to see what's involved
  • they want to get background/overview information on a field to learn a new field or brush up on old knowledge
  • they need to reference a basic idea from their field

For the first bullet, they should have the exact title, author, publisher, year, etc., so it's a standard known-item search (which might also be complicated*). For the second, we or they can get the book information from the class syllabus or even from the bookstore's page for the class.

For the last two, however, it's more of a subject search followed by narrowing for format. It's not format in a straightforward sense like CD or DVD; it's the format of the content inside the container.** Librarians have some tricks for searching for textbooks, including combinations of the following:

  • One word titles like Microbiology or Physics
  • Foundations of... or Introduction to... in the title
  • Using [topic] -- textbooks (not as effective as you'd like)
  • Looking up publishers that do a lot of textbook publishing (e.g., Pearson)

We at my larger institution were asked whether we could add a facet for textbooks, or at least try, and how hard that would be. So we're looking at these tools and how we could write an algorithm to tag records so they could be brought up in search. How accurate do we need to be? Should we require a record to match one or more of the above categories? Ideally, we could also get several semesters' adoptions from the bookstore or wherever, but we also want to know what other institutions use as textbooks. Some vendors (like Elsevier, and maybe Springer and EBL) put textbook-type ebooks on a different platform, or on the same platform but with a different license. Could we somehow get those lists and use them to tag items? How do we maintain the tags as new items are purchased or licensed? Maybe YBP or whoever has a mark for that? Could we get that information for items we've already purchased over the last few years?
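One way to start on the tagging question is to count how many of the librarian tricks listed above a catalog record matches. A minimal sketch, assuming a hypothetical `Record` with just a title, publisher, and subject headings; the publisher list holds only the Pearson example from the post, and `min_signals` stands in for the "one or more of the above categories" question. This is a heuristic, not a tested classifier, and it will produce false positives (plenty of one-word titles aren't textbooks).

```python
from dataclasses import dataclass, field

# Title phrases from the librarian tricks listed above.
TITLE_PHRASES = ("introduction to", "foundations of")
# Hypothetical starter list; only Pearson comes from the post.
TEXTBOOK_PUBLISHERS = ("pearson",)


@dataclass
class Record:
    """Hypothetical, minimal catalog record."""
    title: str
    publisher: str = ""
    subjects: list[str] = field(default_factory=list)


def textbook_signals(rec: Record) -> list[str]:
    """Return the names of the textbook heuristics this record matches."""
    signals = []
    title = rec.title.strip().rstrip(" ./:;").lower()
    if title and len(title.split()) == 1:
        signals.append("one-word title")
    if any(phrase in title for phrase in TITLE_PHRASES):
        signals.append("intro phrase in title")
    # LCSH-style form subdivision, e.g. "Physics -- Textbooks".
    if any("textbooks" in s.lower() for s in rec.subjects):
        signals.append("textbooks subject subdivision")
    publisher = rec.publisher.lower()
    if any(p in publisher for p in TEXTBOOK_PUBLISHERS):
        signals.append("textbook publisher")
    return signals


def is_probable_textbook(rec: Record, min_signals: int = 1) -> bool:
    """Tag a record when it matches at least min_signals heuristics."""
    return len(textbook_signals(rec)) >= min_signals
```

For example, `Record("Microbiology", "Pearson Education")` matches two signals, while a research monograph with a multi-word title and no textbook subdivision matches none. Raising `min_signals` trades recall for precision.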

Ugh. Searching for this is difficult because all I keep finding are messages of varying crankiness about how students need to buy their own damn textbooks and not count on getting them from any library! Ideas?

* Lee, J. H., Renear, A., & Smith, L. C. (2006). Known-Item Search: Variations on a Concept. Proceedings 69th Annual Meeting of the American Society for Information Science and Technology (ASIST), Austin, TX.

and the Kilgour series from JASIST c2001-2004

but: Buckland, M. K. (1979). On Types of Search and the Allocation of Library Resources. Journal of the American Society for Information Science, 30, 143-147.

** FWIW, we don't even do this uniformly with conferences. In our catalog, some are "books," some are conference proceedings, and some LNCS volumes are treated as a monographic series, so they come up as periodicals. And there are conference proceedings printed in journals...


InfoDocket a new place for updates in all that's happening in the info* world

Feb 19 2011 Published by under finding information

In case you missed the announcement, the fabulous Gary Price (a national treasure, as my boss says) and Shirl Kennedy have left ResourceShelf and started a new site: InfoDocket. Like ResourceShelf, it provides updates on everything happening in the information world: with our vendors, what libraries are doing, what governments are doing related to information and libraries, what search engines are doing, etc. There's no newsletter yet, but they plan to develop one later.

They also have a new site that lists newly available full-text reports (http://fulltextreports.com/); compare it to DocuTicker. It has reports from government and non-governmental organizations on all sorts of topics.

I highly recommend both of these sites to anyone working in this world.

(not affiliated, yadda yadda, although I consider Gary a friend)


Well, sometimes you just have to Google it

So there I was, trying all kinds of librarian ninja tricks on the fanciest, most expensive research databases money can buy (SciFinder, Reaxys, Inspec...) and no joy. Couldn't find what I needed. I'm perfectly willing to admit that I don't know all that much chemistry, but I usually do OK since I work with one chemist quite a bit. Finally I gave up and googled it. After a few tries, I found, way down in the results, an article about something else (say I needed a chemical in an aqueous solution and the article had the chemical in alcohol), but the snippet drew my eye. Sure enough, it had a table with my data in it. An ACS journal from 1945.

The data I needed were not the focus of the paper; they were there as a sort of calibration or reference thingy, to show what the setup would do with no alcohol. So it's absolutely right that the document wouldn't have come up in my database searches, because technically the article didn't match. That's why only the full-text search worked.

I might have been able to locate the info using SpringerImages (but it's an ACS article) or CSA's deep indexing (is Illustrata still around? I did try Aerospace & High Tech). Lesson learned.


scio2010: What can librarians do for scientists

This is a session by Stephanie Willen Brown and Dorothea Salo.

They started with a bunch of questions. About half the room were librarians; the rest were split between people affiliated with an institution and people who weren't. Where do you go for full text? Google, Google Scholar. Does that work? Sometimes; if it's not quick or not free to me, I move on.

See if your state library has research databases - like NClive, iConn. Contact one of us and we'll put you in contact with someone local.

Come ask your librarian if you need help with anything. Even if they don't already provide that service, you give them ammunition to take to their bosses to start it, and/or you can be a guinea pig to test a beta service.

A last thought: it needs to be easier to add things to repositories.


Free IS and CS books online

Nov 03 2009 Published by under finding information, Information Science

One of the great things about my interests overlapping with computer science is that computer scientists believe in self-archiving and making their work freely available on the web. The scientometric parts of IS are that way, too, but the L of LIS... well, that's just sad (except for Dorothea; her stuff is available). I still hope to write a review of one of these books because I'm really enjoying it. Here are a few:

  • Hearst, Marti (2009). Search User Interfaces. Cambridge University Press. Available from: http://searchuserinterfaces.com/book/.
    Sure, there are lots of books on information retrieval, search engines, interface design, and information architecture, but there is more to search than that: this book is about designing the interaction required for good searching. I'm about a third of the way through it and it's excellent so far. She cites references for each point she makes, and that makes me happy. I actually plan to buy a print copy at some point, although it's really cool how in the online version you can mouse over a citation and see the whole reference, without having to jump to the bottom of the page or click through.
  • Easley, David and Kleinberg, Jon (in press). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. Available from: http://www.cs.cornell.edu/home/kleinber/networks-book/ (PDF per chapter or entire book).
    You might say, "Oh, another book on networks, sigh," but Kleinberg is a leader in that area and this book grew out of a course he's taught at Cornell. I'm not as familiar with the markets part, so I plan to browse those sections.
  • Manning, Christopher D., Raghavan, Prabhakar, and Schütze, Hinrich (2008). Introduction to Information Retrieval. Cambridge University Press. Available from: http://nlp.stanford.edu/IR-book/information-retrieval-book.html
    One cool thing about this site is that the authors have continued to update the book as they go; in that way, it might even be better than the print book. This is sort of a standard book on information retrieval. I've read maybe 6 chapters of it, and some are easier to understand than others.
  • Allen, Robert B. (in press). Information: A Fundamental Construct. Available from: http://www.cis.drexel.edu/faculty/ballen/ISS/index.html
    This book is new to me, but I enjoyed my class with Dr. Allen and I think there's a need for a general intro to LIS book.

Note: I had this post 90% done a few weeks ago - but my computer died.


New Google language features for off-the-cuff cross-language information retrieval

Cross-language information retrieval is an important research area with lots of activity. There are all kinds of elaborate algorithms and ways of doing it, and the handling of domain specificity and connotation has really improved in the past decade.
Most people searching won't have the support of the fancy specialized tools. I've approximated some of this searching for years using basic search engine language tools. Luckily, they've recently added a lot more Chinese, Japanese, Persian, and Russian translation options in addition to the Western or Romanized languages. They've also become a lot more sophisticated.
Typically these sites offer to translate a word, a passage, or the page at a URL. Google now offers alternate meanings for a word (cool!) and also will translate a search.
For alternate meanings, here's an example: [screenshot not preserved]

Translating the search is even cooler: [screenshot not preserved]

Update: you can find this stuff on the main Google page next to the search box. Here's a direct link: http://translate.google.com/


Free (to you) research databases

Some of these are better than others. Some don't have nice controlled vocabularies and are a bit wonky in the free version. Nearly all of them are also available through another interface for a fee if you need more precision in searching or want to export your results. (Oh, as an aside: you've got the database producer who puts the whole thing together, and then you have options for interfaces. For example, for Inspec, you can pay for access via STN, DIALOG, Web of Knowledge, EbscoHost, or Engineering Village; it used to be on FirstSearch and Ovid, too, but I don't remember if they still offer it. Librarians, when selecting these, have to pick databases and then interfaces based on functionality, cost, etc.) Some research databases we pay for don't actually have controlled vocabularies (like Web of Science and Scopus) but have other features that are useful.

  • ADS - the Astrophysics Data System - This is a great database that does a lot of cool stuff with linking to data and stuff, but it doesn't have a controlled vocabulary. It does have some full text, though. Actually a lot of full text.
  • Ageline - This is from AARP and it's on all sorts of topics about aging, from economic and social to health issues. It's got a cool thesaurus and a new interface.
  • Agricola - Talk about a mess of an interface. Even the folks at NAL pay a vendor to provide an interface. Funding for this has been up and down, too. It covers all areas related to agriculture. It has a controlled vocabulary.
  • Citeseer & CSB- (I suppose you could put these in here, if you really want to)
  • ERIC - This is produced by the Department of Education. Funding's been a little uneven (or a lot uneven).
  • National Criminal Justice Reference Service - From the DOJ Office of Justice Programs.
  • INIS - This is from the IAEA and it covers all aspects of the peaceful uses of nuclear science and technology. Nice controlled vocabulary. Goes back to the 70s. Pretty cool. Make that very cool.
  • PubMed - Everyone knows about this. Don't forget it includes veterinary stuff, too.
  • TRIS - From the National Transportation Library and the TRB (part of the National Academies). Covers all sorts of stuff related to transportation.

 

Oh, and there are technical report digital libraries from the government that have full text and also have controlled vocabularies. And places to get data and maps and stuff. We'll leave those for future posts.

There are probably some other non-profits or governmental organizations with freely available research databases. If I've forgotten any big ones, please let me know. Don't forget, too, that lots of research databases are free to you through your local public library, your workplace, or your school.

Update 2/8/2010: Ageline was purchased from AARP by Ebsco, which has now closed access and put it behind a paywall.


Finding information on a topic

Previously, I had a post about finding information in books using things like Google Book Search. This post is about finding information on a topic, or more specifically, why you should start your search with a research database, and more about what research databases (the real ones) are. In an upcoming post, I'll give some information on some free-to-you research databases (the real ones).

You should start your search with a research database to be more comprehensive, to cover multiple sources and publishers, to have real searching power/precision, and because of the vocabulary problem.

We know what research databases cover: they tell us the list of conferences and journals and the years covered, unlike the mystery meat that is Google Scholar. The biggest of the databases, which are also incredibly expensive (well into six figures for an R1 in the US, for a single database with a limited number of concurrent users), are pretty much comprehensive in coverage. Take Chemical Abstracts, for example: pretty much anything you'd want in chemistry. Same with PubMed, which is free to you. Even a much smaller database like Aerospace & High Technology covers multiple publishers' stuff, government stuff, and stuff in a bunch of different languages. Oh, and the different languages: they're still indexed in English. If you decide you need a foreign-language item, you'll need to find a translation or just deal with the pictures and equations.

If you go directly to a publisher's digital library, IEEE Xplore or ScienceDirect, for example, instead of Inspec or Compendex, you'll miss things from the other publisher and from Springer, SPIE, OSA, ACM, or any other publisher. And you won't have much power in searching.

Let's talk about power. Analytical Abstracts lets you say whether the chemical you're looking for is an analyte or the matrix (very handy!). Inspec lets you look up frequency ranges (numerical indexing). BIOSIS lets you look up taxonomic terms (like kingdom, phylum, class, ...). A bunch of them let you look for a treatment type: application, theoretical, etc. You can be very precise.
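To see why numerical indexing beats text matching, consider that one paper may write "2.4 GHz" and another "2400 MHz". Below is a toy sketch, not Inspec's actual data model or query syntax: each hypothetical record stores its frequency range normalized to hertz, and a search returns the records whose range overlaps the query range, however the units were written in the text.

```python
from dataclasses import dataclass


@dataclass
class IndexedRecord:
    """Hypothetical record carrying a numerically indexed frequency range in Hz."""
    title: str
    freq_low_hz: float
    freq_high_hz: float


def overlaps(rec: IndexedRecord, low_hz: float, high_hz: float) -> bool:
    """True if the record's range shares any frequency with [low_hz, high_hz]."""
    return rec.freq_low_hz <= high_hz and low_hz <= rec.freq_high_hz


def search_frequency(records: list[IndexedRecord], low_hz: float, high_hz: float) -> list[str]:
    """Return titles of records whose indexed range overlaps the query range."""
    return [r.title for r in records if overlaps(r, low_hz, high_hz)]


# Hypothetical records; the titles show how the authors phrased the units.
records = [
    IndexedRecord("Paper that says 2.4 GHz", 2.4e9, 2.5e9),
    IndexedRecord("Paper that says 2400 MHz", 2400e6, 2483.5e6),
    IndexedRecord("Paper about visible light", 4.0e14, 7.5e14),
]

if __name__ == "__main__":
    print(search_frequency(records, 2.0e9, 3.0e9))
```

A query for 2 to 3 GHz finds both of the first two papers, something a keyword search for "GHz" alone would not do.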

The vocabulary problem is basically that different people use different terms to talk about the same thing. Sure, if you work in the area you'll know the difference in connotation between lidar, ladar, and optical radar, for example. But really, articles about any of these might be useful. If you use Google or do natural language searching in another digital library, you'll need to OR all of these variations. Can you think of all of the variations? What about British versus American spelling? Real databases pick a preferred term: not because it is better, but so that one term stands for the concept. So you can find out what this term is and pull up all the articles on that concept, even if your word doesn't appear in the title or abstract.
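The two strategies in that paragraph can be contrasted in a short sketch. `or_query` builds the hand-made OR group you need for natural language searching, and `preferred_term` mimics a controlled vocabulary's entry-term lookup. The `ENTRY_TERMS` table is hypothetical; which variant serves as the preferred term here is illustrative, not taken from any real thesaurus.

```python
# Hypothetical entry-term table (variant -> preferred term). Choosing "lidar"
# as the preferred term is illustrative only.
ENTRY_TERMS = {
    "lidar": "lidar",
    "ladar": "lidar",
    "optical radar": "lidar",
}


def or_query(variants: list[str]) -> str:
    """Natural-language approach: OR together every variant you can think of."""
    quoted = [f'"{v}"' if " " in v else v for v in variants]  # phrase-quote multi-word terms
    return "(" + " OR ".join(quoted) + ")"


def preferred_term(term: str) -> str:
    """Controlled-vocabulary approach: map any known variant to one preferred term."""
    key = term.strip().lower()
    return ENTRY_TERMS.get(key, key)  # unknown terms pass through unchanged
```

For example, `or_query(["lidar", "ladar", "optical radar"])` returns `(lidar OR ladar OR "optical radar")`, and you still have to remember spelling variants such as colour/color yourself. With the lookup table, `preferred_term("LADAR")` returns `lidar` no matter which variant the author used.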

When I say pick one, I mean specialists spend time building a controlled vocabulary and deciding which terms are preferred (because they're most common or whatever). The terms are also arranged hierarchically, so you can explode a search, which means searching for everything under that higher-level topic. You can also find related terms... Oh, when I say pick one, I also mean that most of the research databases have human indexers. It's probably machine-aided (the machine suggests terms), but there are humans writing the rules and doing quality control. Humans who ask themselves: What questions might this article answer? What queries does it satisfy? How can I best describe this content so that it can be found by people who need it?
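Exploding a term amounts to collecting it plus everything beneath it in the hierarchy. A minimal sketch, assuming a hypothetical toy hierarchy (`NARROWER`) built around the flutter example; real thesauri are far larger and may place a term under more than one broader term, which is why the traversal tracks terms it has already seen.

```python
# Hypothetical toy hierarchy: broader term -> narrower terms.
NARROWER = {
    "aeroelasticity": ["flutter", "divergence"],
    "flutter": ["panel flutter"],
}


def explode(term: str, narrower: dict[str, list[str]] = NARROWER) -> list[str]:
    """Return term followed by all narrower terms, depth-first, each only once."""
    result: list[str] = []
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles and repeated children
            continue
        seen.add(current)
        result.append(current)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return result
```

Here `explode("aeroelasticity")` returns `["aeroelasticity", "flutter", "panel flutter", "divergence"]`. Conceptually, an exploded search then ORs that whole list together for you.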

OK, now you're sold. How do you find one of these bad boys? If you're in biomed (which it seems like everyone online is sometimes), then just use PubMed and, if you're at a research institution, supplement it with Embase, BIOSIS, or Chem Abstracts, or, if you're in biomedical engineering, with Inspec and Compendex. For the rest of the world, check out the recommended databases for the subject area on a library's research guides. Pick a library! Or you can look at the descriptions on DIALOG or STN; they have transactional pricing for access to databases.

I'll tell you about some free ones in an upcoming post.

PS - I almost forgot my example: flutter. Look it up in an aerospace database if you want it for missiles, in a biomed database if you want hearts, and who knows what you'll get in Google!

