Archive for the 'finding information' category

Notes from Dan Russell's Advanced Skills for Investigative Searching

This class was held 4/15/2016 at University of Maryland at the Journalism School, hosted by the Future of Information Alliance. Some information is here. Slides are here. Updated Tip Sheet is here.

I've previously taken his MOOC and enjoyed tips on his blog but things change so quickly it was good to get an update.

Of course I didn't bring my laptop so... these are from handwritten notes.

  • Capitalization doesn't matter, except for OR, where it's crucial. Don't use AND; it doesn't do anything.
  • Diacriticals do matter. e and é are basically interchangeable but a and å are not. (it does offend native speakers of countries that use these....)
  • If you need to search for emoji you'll have to use Baidu. This is relevant searching for businesses in Japan, for example
  • filetype:   works for any extension. If you're looking for datasets you might use filetype:csv . Regular Google searches don't search docs; you'll need to search them separately.
  • site:   results differ depending on whether you use nyc.gov, www.nyc.gov, or .nyc.gov . To be most general, use site:.nyc.gov ; the . after the : acts like a * if there are subdomains.
  • There is no NOT. Instead use -<term>.  No space between the minus and the term.
  • Synonyms are automatic. Use quotes around a single term to search it verbatim (also turns off spell check for that term). If quotes are around a phrase, it does not do a verbatim search.
  • There are no stop words
  • inurl:   ... this is useful if pages have a certain format like profile pages on Google Plus
  • To get the advanced search screen, click on the gear in the upper right-hand corner and select it. That's the only way to get limiting by region (region limiting isn't always the same as domain), number search, and language search. Some advanced search features are also available through the dropdown boxes after searching, or through operators like inurl: and filetype: .
  • related:<url> gets you sites with term overlap (not linking/linked similarity).
  • Google custom search engine  - lets you basically OR a bunch of site: searches to always search across them.
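
The operator rules in the list above can be sketched as a small query builder. This is only an illustration of the syntax from the notes (uppercase OR, no AND, a minus sign with no space, site: and filetype:); the function name and its parameters are made up for this example and are not part of any Google tool.

```python
def build_query(terms, any_of=(), exclude=(), site=None, filetype=None, verbatim=()):
    """Assemble a search string using the operators from the notes above."""
    parts = list(terms)
    # Quotes around a single term ask for a verbatim match of that term.
    parts += [f'"{t}"' for t in verbatim]
    if any_of:
        # OR has to be upper case; AND is never emitted because it does nothing.
        parts.append(" OR ".join(any_of))
    # Exclusions: a minus sign with no space before the term.
    parts += [f"-{t}" for t in exclude]
    if site:
        # A leading dot (".nyc.gov") is the most general form per the notes.
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


query = build_query(
    ["restaurant", "inspections"],
    any_of=["grades", "violations"],
    exclude=["yelp"],
    site=".nyc.gov",
    filetype="csv",
)
print(query)
```

The resulting string can be pasted into the search box as is; the search terms here are placeholders chosen only to show each operator once.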

Image Search

  • Tabs across the top of results for topic clusters found
  • Search by image - click on camera and then point to or upload image. Can drag an image in or control click on an image. After search can then add in terms to narrow to domain.
  • Example - find a tool in the basement, take a picture on a white background with it in a normal orientation, then search to find it in catalogs, etc.
  • Crop images to the salient bit.
  • On mobile devices the standard search is actually a google appliance search - not as powerful. Open chrome and search from there if you need more.

Other notes

  • Things are changing all the time because of adversarial arrangements with optimization people.
  • link:   was removed this week.
  • Result counts are an estimate. When you narrow a search you sometimes get more results, because it starts by searching only the first tier of resources. The first tier has millions of results in it, the ones that have been assessed as highest quality. If it doesn't find enough in the first tier (like when you narrow a lot), it will bump down to the second tier, with like billions more results.
  • consider using alerts.
  • to find any of these services - just Google for them
  • google trends is interesting. can narrow by time or region. Also look for suggestions when searching. Can search for an entity or for search term. remember trends are worldwide
  • Google Correlate - example: Spanish tourism authorities want to know what UK tourists are looking for. Find the search for Spain and tourism, and see what keywords used by UK searchers correlate.
  • Country versions are more than just languages. Consider using a different country version to get a different point of view.
  • Wikipedia country versions are useful for national heroes and also controversial subjects (example: Armenian genocide)
  • define   (apparently no : needed)

I think all librarians should probably take his class. Good stuff.

Thoughts on alternatives to Sci-Hub

There have been a lot of blog posts, news pieces, and listserv comments about Sci-Hub. Some have said that while they know it is wrong, they feel scientists have been forced into using the system because they have no alternatives for access. Some responses have been on the order of: "We asked our favorite scientists at big US research institutions and they say they have access to everything they need, so why don't you?" Or: "We give away articles to the very poorest of countries (who might not even be able to take advantage of them because of poor connectivity), so that should be enough" (what about the middle-range countries?). Or: "You make a lot of money and your university has an endowment; you surely can afford this journal and you're just stealing!" Or: "Jean Valjean didn't have access to bread, either, but that didn't mean stealing was right!"

Others have repeatedly countered by pointing out the difference between stealing things (bread) and making copies that do not diminish the original (even if they possibly diminish the market for it).

Anyhoo, what I really want to talk about here is the alternatives for closed access articles. Probably not an exhaustive list.

  • licensed access through your institution as part of a site-wide subscription (on campus, or via VPN/proxy from off)
  • interlibrary loan
  • license your own copy ($30-75)
  • individual subscription (through a society or just from the publisher)
  • "rent" access to view a copy for 24 hours
  • find copies self archived in institutional and disciplinary repositories, on their websites, and other random places
  • find copies illegally shared as part of course materials for another course (this happens for stuff I'm looking for pretty regularly, actually, particularly chapters from social sciences books)
  • contact author for copy
  • contact buddy, relative, etc., at another university to request
  • use walk up access at a local public university
  • use #icanhazpdf
  • use Sci-Hub

So let's look at hassle factor. Part of what goes into figuring out the hassle factor is how you identified the article in the first place and what network you're on.

At MPOW if you use Google Scholar or PubMed and you're on our network, you should be able to go right to the full text for the majority of things you're looking for because we have a lot of subscriptions. We have our IPs registered with Google so it points to our subscriptions and our link resolver. If you use our link resolver, it fills out the ILL form for you from there. Still, it is more convenient to get a pdf from/through Google than wait for ILL or for us to scan and e-mail you something.

What if you're off campus? A quick check of #icanhazpdf showed some people were asking because they were off campus. That, to me, seems like the height of laziness and inconsideration. Does their campus really have no remote access? The person who is sending it to you has to go through more effort than it would take you to VPN or use EZProxy.

One commentator heard from someone who does have access at work but couldn't be assed to use the library tools to locate it. Really? The search on Sci-Hub doesn't work (I'm told), so the best way to use it is through the DOI. I can put the DOI of an article into my FindIt tool and get a proxied link to the best source for full text immediately - even if it's at a 3rd party aggregator. Legally. I can also put the PMID in. In fact, I have a plugin in my browser that automatically links the DOI to my link resolver.
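
For anyone curious what "linking the DOI to my link resolver" amounts to, here is a rough sketch. It assumes the resolver accepts OpenURL 1.0 (Z39.88-2004) key/value links with the identifier passed in rft_id. The base address below is a made-up placeholder, not my actual FindIt URL, and your own resolver may want different parameters.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute your own library's link resolver.
RESOLVER_BASE = "https://findit.example.edu/resolve"


def resolver_link(doi=None, pmid=None):
    """Build an OpenURL-style link for a DOI or a PMID (assumed resolver format)."""
    if doi:
        rft_id = f"info:doi/{doi}"
    elif pmid:
        rft_id = f"info:pmid/{pmid}"
    else:
        raise ValueError("need a DOI or a PMID")
    params = {"url_ver": "Z39.88-2004", "rft_id": rft_id}
    # urlencode percent-escapes the colon and slash inside the identifier.
    return f"{RESOLVER_BASE}?{urlencode(params)}"


# 10.1000/xyz123 is a placeholder DOI, not a real article.
print(resolver_link(doi="10.1000/xyz123"))
```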

Ok, so you may not be at an organization that has all this set up. There are lots of industrial and government scientists who have very little access to the literature. Even if they do have access, they might not have the connecting tools.

In many places ILL is awful. Let's be quite honest. Another form. Asks a lot of information. May have a different login. May take 2-3 weeks to arrive. It may be fax quality. May be a cost associated. In one sociology class I was in as a student they were going off on how bad it was: wrong article, missed several pages, illegible copies... the one guy put his request in like 5 times before getting a full, readable copy. He kept putting it in after a while to see how many tries it would take! Your buddies on Twitter do not have to print, scan to fax quality, and then send.

I love how people say you can use your local publib. Mine is not going to ILL scholarly articles for you. They don't have that kind of budget or staff. I think it's getting harder to use walk up access, too. If you have eduroam you can get on the network but if you're at a local small business? It's not like when the journals were in print.

I don't even know where I was going with this but to say that #icanhazpdf has a point. Library systems need to get easier and get in the workflow, but also scholars might actually need to put some effort in to learn to do things the right way.

Do too - I'll show YOU!

Mar 15 2015 Published by under finding information

Lookin' for some lit as one does when one is supposed to be writing instead of adding to the impossible list of things to double back to add to the lit review... Via who-cited-who and Scholar, ended up on a TandF page. Looked interesting - right click, reload through proxy for my place of work. It sneers - "sorry you do not have access to this article" - access options include paying $40 for the article. Um. No.  LibX has kindly highlighted the doi... clicked... got to my beautifully customized SFX page (with Umlaut) and it's full text on a major aggregator. Take that you! Ha!

And, this is probably even better than seeing it at the publisher, because our custom FindIt page tells me the article has been cited 23 times (oh well, maybe not: I see that TandF does offer that info, too).

Sadly though, I'll bet hardly anyone at my place of work would have thought to take this path.

 

Edited to add: moments later looking at JSTOR. They kindly ask if I think I should have access... then let me pick my institution and do a shibboleth login et voila. (price would have been $14 without).

Continuing value and viability of specialized research databases

Nov 26 2014 Published by under finding information, Information Science

There was an interesting thread yesterday on the PAMnet listserv regarding "core" databases in Mathematics and which could be cut to save money.

One response was that it's better to search full text anyway (I couldn't disagree more).

Ben Wagner expressed concern that Google Scholar was going to drive all of the databases out of business and then Google would abandon the project.

Joe Hourclé posted about ADS - a core database in astro. Fred Stoss posted about PubMed - needs no intro here, surely!

Here's my response.

I think Scopus and WoS are the biggest immediate threats to the smaller domain specific indexes particularly when the largest number of academic users are looking for a few reasonable things and aren't doing the complex queries or needing to be very precise and have very high recall. In my world, I'm like the goalie: by the time they ask me, they've tried Google, they've asked their friends, they've asked their mother*... it's gotten past 10 people without an adequate answer. For these hard questions, I need the power of a good database (like Inspec). But... if you look at quantities and numbers of users... does that justify the huge cost? Maybe? But do our auditors agree? Infrequent big wins vs. day to day common usage?

As Ben has often chronicled, we've shifted money out of every other budget to support our sci/tech journal habit. We've starved the humanities. We've dropped databases. All for more and more expensive journals. Seems like if the content does get paid for out of other budgets via page charges or institutional support for open access publishing, that might make it even more important that libraries have better ways to find the distributed content. But, like Ben, I worry that we'll put these finding tools out of business.

Another observation: two of the "core" databases mentioned, ADS and PubMed, are government supported as a service to the community. The solar physics bibliography is a very specialized resource but is also super important to those researchers. Maybe if building specialty research databases is no longer profitable but there remains a need, the community-built tools will improve/grow/gain support? Maybe they'll be backwards and using technology from 1995, though 🙂

I'm working with some projects that are actually taking big piles of full text documents and using computational methods to classify using an ontology that's built by subject matter experts (with some advice from a professional taxonomist in my group). The volume/velocity/yadda yadda of the data precludes the careful indexing done by our fancy databases... but this and other projects like it I think show a swing back toward the importance of good indexing and the importance of having domain experts reviewing the classification system.

 

* My mom is a statistician so I might ask her first

 

Searching Scopus by Date Added to the Database

In my previous post, I complained that my metrics weren't comparable over the course of a few months, even for articles published in 2009.

I looked in the instructions, and I couldn't find anything that discussed searching by date added to the database. I looked at all the fields on the detailed view and there wasn't anything to help. No accession number. No date added. Hmph.

So I started to think about the alerts I had set up. When you click through "view all new results in Scopus", you get a search like so:

(AFFIL((my place of work)) AND ORIG-LOAD-DATE AFT 1390059048 AND ORIG-LOAD-DATE BEF 1390674349

Huh. So I wondered... can you just find the right AFT and search in advanced search for that?  Yup. Sure can!

What are these crazy numbers though? (Most people will know right away - I didn't, and I should have.) So I looked around - no, I didn't have any from that time period to use. I chatted with the Scopus help and they insisted 1) you can't search on that field (I told them I had already proved you could), 2) it was part of the alert system and not part of the database (????), and 3) they couldn't give me the numbers for the time period I wanted, because you can't search for them anyway.

So then I asked LSW, and the brilliant Deborah and the equally brilliant but time-delayed Meg told me it was Unix time - seconds since 1/1/1970.  Stephanie also provided me with a search string from early January (thank you!). I had read about that in R, but Deborah even linked to an online converter and boom - Bob's your uncle.

So, if you want to find articles added to the database before or after a certain time, convert the time to Unix time and then use
ORIG-LOAD-DATE AFT
or
ORIG-LOAD-DATE BEF
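
If you'd rather not use an online converter, the conversion can be sketched in a few lines of Python. The helper names are my own, and I'm assuming the numbers are plain UTC seconds; I don't know which time zone Scopus actually uses when it stamps records, so treat the boundaries as approximate.

```python
from datetime import datetime, timezone


def to_unix(year, month, day, hour=0, minute=0, second=0):
    """Seconds since 1970-01-01 00:00:00 UTC for the given UTC date and time."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


def from_unix(seconds):
    """Turn a Unix timestamp back into a readable UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def load_date_clause(after=None, before=None):
    """Build the ORIG-LOAD-DATE part of an advanced search string."""
    clauses = []
    if after is not None:
        clauses.append(f"ORIG-LOAD-DATE AFT {after}")
    if before is not None:
        clauses.append(f"ORIG-LOAD-DATE BEF {before}")
    return " AND ".join(clauses)


# Decode the two numbers from the alert search above.
print(from_unix(1390059048).isoformat())
print(from_unix(1390674349).isoformat())

# Everything loaded during January 2014 (UTC assumed).
print(load_date_clause(after=to_unix(2014, 1, 1), before=to_unix(2014, 2, 1)))
```

Decoding the two alert numbers gives times about a week apart in mid to late January 2014, which fits a weekly alert. The clause that comes out can be ANDed onto any other advanced search.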

adding 5/7: I was contacted by Scopus - I would like to post detailed information from the e-mail but haven't gotten permission. She did verify that this search will work, but only so far back. That information isn't kept indefinitely. Also, you can use RECENT(n) where (n) is the number of days. You can AND that on to any advanced search.

Searching for textbooks in a library catalog

Feb 14 2014 Published by under finding information

So this should be easy/obvious, but it's not so much ... at least to me. It's very common that people want to locate textbooks for diverse reasons including:

  • they want to get it from a library because they don't want to pay for it (grrr...)
  • they are thinking about taking the class and they want to see what's involved
  • they want to get background/overview information on a field to learn a new field or brush up on old knowledge
  • they need to reference a basic idea from their field

For the first bullet, they should have the exact title, author, publisher, year, etc., so it's a standard known item search (which might also be complicated*). For the second, we or they can go to the syllabus of a class and get the book information or even go to the bookstore page to look up the book information.

For the last two, however, it's more of a subject search and then narrowing for format. It's not format in a straightforward sense like CD or DVD, it's the format of the content inside the container.** Librarians have some tricks in searching for textbooks including combinations of the following:

  • One word titles like Microbiology or Physics
  • Foundations of... or Introduction to... in the title
  • Using [topic] -- textbooks (not as effective as you'd like)
  • Looking up publishers that do a lot of textbook publishing (e.g., Pearson)

We at my larger institution were asked if we could add a facet for textbooks, or at least whether we could try and how hard that would be. So we're looking at these tricks and how we could write an algorithm to tag various records so they could be brought up in search. How accurate do we need to be? Should we say it has to have one or more of the above categories? Ideally, we could also get multiple semesters' adoptions from the bookstore or whatever, but we also want to know what other institutions use for textbooks. Some vendors (like Elsevier and maybe Springer, EBL) put textbook-type ebooks on a different platform or on the same platform but with a different license. Could we somehow get those lists and use them to tag items? How do we maintain it as new items are purchased or licensed? Maybe YBP or whoever has a mark for that? Could we get that information for items we've already purchased over the last few years?
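
To make the "one or more of the above categories" idea concrete, here is a rough sketch of what such a tagger might look like. Everything in it is an assumption for illustration: the record is a plain dictionary with made-up field names (title, publisher, subjects) rather than real MARC, the publisher list holds only the one example named above, and the threshold is arbitrary.

```python
import re

# Illustrative only: the one publisher named in the list above.
TEXTBOOK_PUBLISHERS = {"pearson"}
TITLE_PREFIXES = ("foundations of", "introduction to")


def textbook_signals(record):
    """Return the names of the librarian tricks that a catalog record matches."""
    title = record.get("title", "").strip().lower()
    signals = []
    # One-word titles like Microbiology or Physics.
    if title and len(title.split()) == 1:
        signals.append("one-word title")
    # Foundations of... or Introduction to... at the start of the title.
    if title.startswith(TITLE_PREFIXES):
        signals.append("intro-style title")
    # Subject headings of the form [topic] -- textbooks.
    subjects = record.get("subjects", [])
    if any(re.search(r"--\s*textbooks\b", s.lower()) for s in subjects):
        signals.append("textbook subject heading")
    # Publishers that do a lot of textbook publishing.
    if record.get("publisher", "").strip().lower() in TEXTBOOK_PUBLISHERS:
        signals.append("textbook publisher")
    return signals


def is_probable_textbook(record, threshold=1):
    """Tag a record when it matches at least `threshold` signals."""
    return len(textbook_signals(record)) >= threshold


example = {
    "title": "Microbiology",
    "publisher": "Pearson",
    "subjects": ["Microbiology -- Textbooks"],
}
print(textbook_signals(example))
```

Raising the threshold trades recall for precision, which is really the "how accurate do we need to be?" question in code form. Vendor lists or bookstore adoptions, if we could get them, would just be more signals to add.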

Ugh. Searching for this is difficult because all I keep finding are messages of varying crankiness levels about how students need to buy their own damn textbooks and not count on getting them from a or any library!  Ideas?

* Lee, J. H., Renear, A., & Smith, L. C. (2006). Known-Item Search: Variations on a Concept. Proceedings 69th Annual Meeting of the American Society for Information Science and Technology (ASIST), Austin, TX.

and the Kilgour series from JASIST c2001-2004

but: Buckland, M. K. (1979). On Types of Search and the Allocation of Library Resources. Journal of the American Society for Information Science, 30, 143-147.

** FWIW, we don't even do this uniformly with conferences. Some are "books" and some are conference proceedings in our catalog and some LNCS are monographic series so they come up as periodicals. And there are conference proceedings printed in journals...

InfoDocket a new place for updates in all that's happening in the info* world

Feb 19 2011 Published by under finding information

In case you missed the announcement, the fabulous Gary Price (a national treasure, as my boss says) and Shirl Kennedy have left ResourceShelf, and started up a new site: InfoDocket. Like ResourceShelf, this is an update on all that's happening in the information world - with our vendors, what libraries are doing, what governments are doing related to information and libraries, what search engines are doing, etc. No newsletter yet, but they will be developing one later.

They also have a new site to list full text reports that are now available (http://fulltextreports.com/). Compare this to DocuTicker. It has reports from government and non-governmental organizations on all sorts of topics.

I highly recommend both of these sites to anyone working in this world.

(not affiliated, yadda yadda, although I consider Gary a friend)

Well, sometimes you just have to Google it

So there I was, trying all kinds of librarian ninja tricks on the fanciest, most expensive research databases money can buy (SciFinder, Reaxys, Inspec...) and no joy. Couldn't find what I needed. I'm perfectly willing to admit that I don't know all that much chemistry, but usually I do ok since I work with one chemist quite a bit. Finally I gave up and googled it. After a few tries, I found way down in the results an article about something else (like I needed a chemical in an aqueous solution and it had the chemical in alcohol), but the snippet drew my eye. Sure enough - had a table with my data in it. An ACS journal from 1945.

The data I needed were not the focus of the paper - they were there sort of as a calibration or reference type thingy - to show what the setup would do with no alcohol. So it's absolutely right that the document wouldn't have come up in my search, because technically the article didn't match. That's why the full text search worked.

It could be that I could locate the info using SpringerImages (but it's an ACS article) or using CSA's deep indexing (is illustrata still around? I did try Aerospace & High Tech).  Lesson learned.

scio2010: What can librarians do for scientists

This is a session by Stephanie Willen Brown and Dorothea Salo.

They started with a bunch of questions. About half the room was librarians; the others were split between those affiliated with an institution and those not. Where do you go for full text? Google, Google Scholar. Does that work? Sometimes - if it's not quick, or not free to me, then move on.

See if your state library has research databases - like NClive, iConn. Contact one of us and we'll put you in contact with someone local.

Come ask your librarian if you need help with anything - even if they don't already provide that service, you help them with ammunition to take to their bosses to start the service and/or can be a guinea pig to test a beta service.

A last thought - needs to be easier to add things to repositories.

Free IS and CS books online

Nov 03 2009 Published by under finding information, Information Science

One of the great things about my interests overlapping computer science is that computer scientists believe in self archiving and making their work freely available on the web. The scientometric parts of IS are that way, too, but the L of the LIS... well, that's just sad (except for Dorothea, her stuff is available). I still hope to write a review of one of these books because I'm really enjoying it. Here are a few:

  • Hearst, Marti (2009). Search User Interfaces. Cambridge University Press. Available from: http://searchuserinterfaces.com/book/.
    Sure there are lots of books on information retrieval, search engines, interface design, and information architecture. This book is about designing the interaction required for good searching. There is more to it. I'm about a third the way through reading this book and it's excellent so far. She cites references for each point she makes and that makes me happy. I actually plan to buy a print copy at some point although it's really cool how you can mouseover the citations in the online version and it shows you the whole citation - you don't have to click to the bottom of the page or click through.
  • Easley, David and Kleinberg, Jon (in press)  Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. Available from: http://www.cs.cornell.edu/home/kleinber/networks-book/ (in pdf per chapter or entire book).
    You might say, oh another book on networks, sigh, but Kleinberg is a leader in that area and this book grew out of a course he's taught up there. I'm not as familiar with the markets part so I plan to browse those sections.
  • Manning, Christopher D., Raghavan, Prabhakar, and Schütze, Hinrich (2008)  Introduction to Information Retrieval. Cambridge University Press. Available from: http://nlp.stanford.edu/IR-book/information-retrieval-book.html
    One kind of cool thing about this site is that the authors have continued to update the book as they go. In that way, it might even be better than the print book. This is sort of a standard book on information retrieval. I've read maybe 6 chapters from it. Some are easier to understand than others.
  • Allen, Robert B. (in press) Information: A Fundamental Construct. Available from http://www.cis.drexel.edu/faculty/ballen/ISS/index.html
    This book is new to me, but I enjoyed my class with Dr. Allen and I think there's a need for a general intro to LIS book.

Note: I had this post 90% done a few weeks ago - but my computer died.
