Free IS and CS books online

Nov 03 2009

One of the great things about my interests overlapping computer science is that computer scientists believe in self archiving and making their work freely available on the web. The scientometric parts of IS are that way, too, but the L of the LIS... well, that's just sad (except for Dorothea, her stuff is available). I still hope to write a review of one of these books because I'm really enjoying it. Here are a few:

  • Hearst, Marti (2009). Search User Interfaces. Cambridge University Press. Available from:
    Sure, there are lots of books on information retrieval, search engines, interface design, and information architecture. This book is about designing the interaction required for good searching. There is more to it. I'm about a third of the way through reading this book and it's excellent so far. She cites references for each point she makes and that makes me happy. I actually plan to buy a print copy at some point, although it's really cool how you can mouse over the citations in the online version and it shows you the whole citation - you don't have to click to the bottom of the page or click through.
  • Easley, David and Kleinberg, Jon (in press)  Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. Available from: (in pdf per chapter or entire book).
    You might say, oh another book on networks, sigh, but Kleinberg is a leader in that area and this book grew out of a course he's taught up there. I'm not as familiar with the markets part so I plan to browse those sections.
  • Manning, Christopher D., Raghavan, Prabhakar, and Schütze, Hinrich (2008)  Introduction to Information Retrieval. Cambridge University Press. Available from: 
    One kind of cool thing about this site is that the authors have continued to update the book as they go. In that way, it might even be better than the print book. This is sort of a standard book on information retrieval. I've read maybe 6 chapters from it. Some are easier to understand than others.
  • Allen, Robert B. (in press) Information: A Fundamental Construct. Available from
    This book is new to me, but I enjoyed my class with Dr. Allen and I think there's a need for a general intro to LIS book.

Note: I had this post 90% done a few weeks ago - but my computer died.

New Google language features for off-the-cuff cross-language information retrieval

Cross-language information retrieval is an important research area with lots of activity. There are all kinds of elaborate algorithms and ways of doing it. The handling of domain-specific vocabulary and connotation has really improved in the past decade.
Most people searching won't really have the support of the fancy specialized tools. I've approximated some of this searching for years using various basic search engine language tools. Luckily, recently they've added a lot more Chinese, Japanese, Persian, and Russian translation options in addition to the Western or Romanized languages. They've also become a lot more sophisticated.
Typically these sites offer to translate a word, a passage, or the page at a URL. Google now offers alternate meanings for a word (cool!) and also will translate a search.
For alternate meanings here's an example:

Translating the search is even cooler:

Update: you can find this stuff on the main page of google next to the search box. Here's a direct link:

Free (to you) research databases

Some of these are better than others. Some don't have nice controlled vocabularies and are a bit wonky in the free version.  Nearly all of them you can get through another interface for a fee if you need more precision in searching or to export your results. (Oh, as an aside - you've got the database producer who puts the whole thing together, and then you have options for interfaces. For example, for Inspec, you can pay for access via STN, DIALOG, Web of Knowledge, EbscoHost, Engineering Village - used to be FirstSearch and Ovid, too, but I don't remember if they're still offering it. Librarians, when selecting these, have to pick databases and then interfaces based on functionality, cost, etc.) Some research databases we pay for don't actually have controlled vocabularies (like Web of Science and Scopus) but have other features that are useful.

  • ADS - the Astrophysics Data System - This is a great database that does a lot of cool stuff with linking to data and stuff, but it doesn't have a controlled vocabulary. It does have some full text, though. Actually a lot of full text.
  • Ageline - This is from AARP and it's on all sorts of topics about aging from economic and social to health issues. It's got a cool thesaurus and a new interface.
  • Agricola - Talk about a mess of an interface. Even the folks at NAL pay a vendor to provide an interface. Funding for this has been up and down, too. It covers all areas related to agriculture. It has a controlled vocabulary.
  • Citeseer & CSB - (I suppose you could put these in here, if you really want to)
  • ERIC - This is done by the Department of Education. Funding's been a little uneven (or a lot uneven).
  • National Criminal Justice Reference Service - From DOJ Office of Justice Program.
  • INIS - This is from the IAEA and it covers all aspects of science related to the peaceful use of nuclear technology. Nice controlled vocabulary. Goes back to the 70s. Pretty cool. Make that very cool.
  • PubMed - Everyone knows about this. Don't forget it also includes veterinary stuff, too.
  • TRIS - From the National Transportation Library and the TRB (part of the National Academies). Covers all sorts of stuff related to transportation.


Oh, and there are technical report digital libraries from the government that have full text and also have controlled vocabularies. And places to get data and maps and stuff. We'll leave those for future posts.

I'm sure there are probably some other non-profits or governmental organizations with freely available research databases. If I've forgotten any big ones, please let me know.  Don't forget, too, that lots of research databases are free to you from your local public library, your workplace, or your school.

update: 2/8/2010 Ageline was purchased from AARP by Ebsco. Ebsco has now closed access - put it behind a pay wall.

Finding information on a topic

Previously, I had a post about finding information in books using things like Google Book Search. This post talks about finding information on a topic, or more specifically, why you should start your search with a research database and more about what research databases are (like the real ones). In a post coming up, I'll give some information on some free to you research databases (the real ones).

You should start your search with a research database to be more comprehensive, to cover multiple sources and publishers, to have real searching power/precision, and because of the vocabulary problem.

We know what research databases cover - they tell us the list of conferences and journals and the years covered, unlike the mystery meat that is Google Scholar. The biggest of the databases, and the most incredibly expensive (well into the 6 figures for an R1 in the US for a single database with a limited number of concurrent users), are pretty much comprehensive in coverage. Take Chemical Abstracts, for example. Pretty much anything you'd want in chemistry. Same with PubMed which is free to you. Even if you're talking about a much smaller database like Aerospace & High Technology, it covers multiple publishers' stuff, government stuff, and stuff in a bunch of different languages. Oh, and the different languages - well they're still indexed in English. If you decide you need it, well then you'll need to find a translation or just deal with the pictures and equations.

If you go directly to a publisher's digital library, IEEE Xplore or Science Direct, for example, instead of Inspec or Compendex, you'll miss things from the other and from Springer, or SPIE or OSA or ACM or any other publisher. And you won't have much power in searching.

Let's talk about power. Analytical Abstracts lets you say whether the chemical you're looking for is an analyte or matrix (very handy!). Inspec lets you look up frequency ranges (numerical indexing). BIOSIS lets you look up taxonomic terms (like kingdom, phylum, class, ...). A bunch of them let you look for a treatment - application, theoretical, etc. You can be very precise.
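To give a feel for what numerical indexing buys you, here's a rough Python sketch. It is not how Inspec actually works inside; the records, field layout, and function name are all made up for illustration. The point is that when a frequency range is stored as numbers, you can ask for anything overlapping a band instead of guessing at the words authors used.

```python
# Hypothetical records: (title, low frequency in Hz, high frequency in Hz).
# The titles and the idea of storing a range per record are assumptions.
FREQ_RECORDS = [
    ("X-band radar front end", 8.0e9, 12.0e9),
    ("VHF antenna design", 30.0e6, 300.0e6),
    ("Millimeter-wave imaging", 30.0e9, 300.0e9),
]

def in_frequency_range(low_hz, high_hz):
    """Return titles whose indexed range overlaps [low_hz, high_hz]."""
    return [
        title
        for title, rec_low, rec_high in FREQ_RECORDS
        # Two ranges overlap when each one starts before the other ends.
        if rec_low <= high_hz and rec_high >= low_hz
    ]
```

Asking for 9 GHz to 10 GHz pulls the X-band record even though "9 GHz" never appears in its title.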

The vocabulary problem is basically that different people use different terms to talk about the same thing. Sure, if you work in the area you'll know the difference in connotation between lidar, ladar, and optical radar, for example. But really, articles about any of these might be useful. If you use Google or do natural language searching in another digital library, you'll need to OR all of these variations. Can you think of all of the variations? What about British spelling or American spelling? Real databases pick a preferred term - not one that is better, but pick one to stand for the concept. So you can find out what this term is and pull up all the articles on that concept, even if your word doesn't appear in the title or abstract.
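Here's a minimal Python sketch of the preferred-term idea. The mini-thesaurus, the records, and the function names are all invented for illustration, and which term a real database prefers will vary. It just shows why a concept search finds things a keyword search misses.

```python
# Hypothetical mini-thesaurus: every entry term points to one preferred term.
ENTRY_TO_PREFERRED = {
    "lidar": "optical radar",
    "ladar": "optical radar",
    "laser radar": "optical radar",
    "optical radar": "optical radar",
}

# Hypothetical records: (title, preferred terms assigned by an indexer).
RECORDS = [
    ("Ladar imaging of small targets", {"optical radar"}),
    ("Airborne lidar for forest canopy height", {"optical radar"}),
    ("Range finding with pulsed lasers", {"optical radar"}),
    ("Flutter suppression in missile fins", {"flutter"}),
]

def preferred_term(user_term):
    """Return the preferred term for a user's word, or None if unknown."""
    return ENTRY_TO_PREFERRED.get(user_term.strip().lower())

def search_by_concept(user_term):
    """Find records indexed under the concept, whatever words the title uses."""
    term = preferred_term(user_term)
    if term is None:
        return []
    return [title for title, index_terms in RECORDS if term in index_terms]

def search_by_keyword(user_term):
    """Naive search: only matches when the word itself is in the title."""
    needle = user_term.lower()
    return [title for title, _ in RECORDS if needle in title.lower()]
```

In this toy data, a keyword search for "lidar" matches one title, while the concept search also returns the "ladar" article and the one that never names the technique at all.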

When I say pick one, I mean specialists spend time coming up with a controlled vocabulary and decide what terms are preferred (because they're most common or whatever). These are also arranged hierarchically so you can explode a search which means you can search for anything under that higher level topic. You can also find related terms.... Oh, when I say pick one, I also mean that most of the research databases have human indexers. It's probably machine-aided (machine suggests terms), but there are humans writing the rules and doing quality control. Humans who ask themselves: what questions might this article answer? what queries does this answer? how can I best describe this content so that it can be found by people who need it?
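Exploding a search can be sketched in a few lines of Python. The hierarchy below is a made-up example, not taken from any real thesaurus, and `explode` is just an illustrative name. The idea is to collect a term plus everything filed beneath it, then search on the whole set.

```python
# Hypothetical hierarchy: broader term -> list of narrower terms.
NARROWER = {
    "radar": ["optical radar", "synthetic aperture radar"],
    "optical radar": ["doppler lidar"],
}

def explode(term):
    """Return the term plus every narrower term beneath it, depth-first."""
    terms = [term]
    for child in NARROWER.get(term, []):
        terms.extend(explode(child))
    return terms
```

So exploding "radar" here would search on "radar", "optical radar", "doppler lidar", and "synthetic aperture radar" in one go, while a term with nothing under it just returns itself.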

Ok, now you're sold. How do you find one of these bad boys? If you're in biomed (which it seems like everyone online is sometimes), then just use PubMed and supplement it if you're at a research institution with Embase, BIOSIS, Chem Abstracts, or if you're in biomedical engineering with Inspec and Compendex.  For the rest of the world, check out the recommended databases in the subject area on a library's research guides. Pick a library! Or, you can look at the descriptions on DIALOG or STN - they have transactional pricing for access to databases.

I'll tell you about some free ones in an upcoming post.

PS - I almost forgot my example: flutter. Look it up in an aerospace database if you want it for missiles, in a biomed database if you want hearts, and who knows what you'll get in Google!

Finding Information in Books

I've talked about this a bit at sessions I taught at my library and also at Web Search University but it's still a favorite.  Plus, you asked for posts on finding information. Oh, and one of the tools just released some updates so this is fairly timely.

This is not how to use the catalog to see if a book you want is available in your library and to get a shelf location!  Also not about finding something good to read (frankly, I'm completely out of practice with reader's advisory, so can't help you there). Books are useful containers for information, data, and stuff you need to make new science, do cool engineering, and answer whatever questions. What kinds of things might a book work best for?

  • facts - physical properties (like a handbook)
  • equations
  • overviews/explanations/introductions
  • in-depth treatments
  • not for cutting edge, mostly

In the past, you had to rely on subject headings and pathfinders to locate potential books and then check the table of contents and indexes to see if your topic was covered. With all of the digitization projects combined with ebooks, there are a lot more options.

Google Books

Google has gone to libraries and scanned books as well as forming partnerships with publishers to get preview copies.  Interestingly, this includes old bound volumes of journals and government technical reports.  You might just be able to get away with the search and find the information you need in snippets.  See for example, this search to find the refractive index of indium tin oxide.  At some point I'll do a post about searching for chemistry by OR-ing different ways of representing a chemical in text (ITO is more common in the relevant field than InO2Sn or InSnO2). You can see some answers already in the results.  Once you find the book, if you can't read your answer right off, click to "Find in a library" and see if you can maybe even access an electronic copy from home using your library login.
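As a rough illustration of OR-ing the different ways a chemical shows up in text, here's a small Python sketch that builds a query string. The function name is made up, and the parentheses-and-OR syntax is a generic Boolean style assumed for the example; check what your search engine actually supports.

```python
def or_query(variants, *required):
    """Build a Boolean query: (variant OR variant ...) plus required phrases.

    Multi-word terms are wrapped in double quotes so they are treated
    as phrases. The syntax is a generic assumption, not a specific engine's.
    """
    def quote(term):
        return f'"{term}"' if " " in term else term

    parts = ["(" + " OR ".join(quote(v) for v in variants) + ")"]
    parts.extend(quote(r) for r in required)
    return " ".join(parts)

# The ways of writing the compound mentioned above, plus the property wanted.
query = or_query(
    ["ITO", "indium tin oxide", "InO2Sn", "InSnO2"],
    "refractive index",
)
```

Keeping the variants in a list makes it easy to add another spelling or formula later without retyping the whole search.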

You can use the book overview page to find books that cite this book, reviews, and the table of contents. In the most recent update, they added ways for you to embed this content in your page.


You can use Amazon's search inside the book feature to find information. These books will mostly be newer. This used to work much better than it does now. You can search within a book and see "text stats" like readability.

Internet Archive & HathiTrust

The Internet Archive also hosts full text books online, including a copy of the Biodiversity Heritage Digital Library. These books are typically out of copyright (older than 1923 in the US).  In general, these are probably less likely to be useful for most scientists so I wouldn't start there unless that's what you need. HathiTrust compiles the library copies of Google Book scans. Lots of sci-tech here, but the search interface isn't there yet.

Federated Searches (subscription required)

If you're at a fairly large institution, you might have a federated search tool - this would be something that searches across multiple library databases.  A popular product is MetaLib but there are a few of these.  Where I work, we can cross search a huge collection of engineering handbooks from CRC Press, McGraw-Hill, Morgan and Claypool, and others. Your mileage may vary.

Individual Subscriptions (and pay per view) to Collections of Ebooks

Ebooks are the wild, wild west right now with respect to format and use and licensing. Your local library most likely has some subscriptions that allow you to search across various books. In Safari, for example, you can search for code snippets across books.  CRC Press and Knovel also both let you sort tables of properties from books and even do sub-structure searching.

