Rant on: Why are citation exports so hard?

Aug 20 2009. Published under Off Topic

I've used citation managers (a.k.a. bibliographic managers) for years - ever since I came to my current job, in which I do in-depth literature searching for scientists and engineers. I report my results as an annotated bibliography with analysis. To do this I need to search a bunch of research databases, compile the results, and then export them into a bibliography in a useful and attractive format.


Also, I do information analysis, and I compile the citations first before I run some of those analyses (see my series on the same).



Seriously. Why?

  • You select things to export, go to the next page, and they aren't still selected (IEEE Xplore).
  • Half the time the journal name doesn't export (Scopus - maybe fixed now).
  • No page numbers or volume numbers export (Scitopia - BY DESIGN??!?!).
  • It says direct export, but then it just shows the records on the screen (AIP Publishing!).
  • 8-digit publication years, "DOI:" in the DOI field, and conference papers published in journals that come in as generic records with nothing in the right place (Inspec on EbscoHost).
  • The accession number lands in User field #1, which we use for something in one shared account I have (EngineeringVillage).
  • You have to save the file instead of using direct export, because you're using a product that competes with their product (Web of Science).
  • All kinds of freaking weird stuff (DIALOG format 4).
  • Conference proceedings editors in the author field (INIS, file 103 in DIALOG). I almost delivered a product with co-authorships that never happened!
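
Two of the complaints above (the "DOI:" label stuck in the DOI field and the 8-digit publication years) are mechanical enough that the cleanup can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not what any vendor or citation manager actually does: the record is assumed to be a plain dict, the field names "doi" and "year" are hypothetical, and the 8-digit year is assumed to be a YYYYMMDD date. The DOI in the usage note is a made-up example.

```python
import re

# Assumption: each exported record is a plain dict. The keys "doi" and
# "year" are hypothetical and do not correspond to any vendor's format.

# Matches a leading "DOI:" / "doi" label or a doi.org resolver URL.
_DOI_PREFIX = re.compile(
    r"^\s*(?:https?://(?:dx\.)?doi\.org/|doi\s*:?\s*)",
    re.IGNORECASE,
)


def clean_doi(value: str) -> str:
    """Strip a leading "DOI:" label or resolver URL from a DOI field."""
    return _DOI_PREFIX.sub("", value).strip()


def clean_year(value: str) -> str:
    """Reduce an 8-digit value to its first four digits.

    Assumption: the 8 digits are a YYYYMMDD date. Any other value is
    returned unchanged (apart from surrounding whitespace).
    """
    value = value.strip()
    if re.fullmatch(r"\d{8}", value):
        return value[:4]
    return value


def clean_record(record: dict) -> dict:
    """Return a copy of the record with the DOI and year fields tidied."""
    cleaned = dict(record)
    if "doi" in cleaned:
        cleaned["doi"] = clean_doi(cleaned["doi"])
    if "year" in cleaned:
        cleaned["year"] = clean_year(cleaned["year"])
    return cleaned
```

For example, `clean_record({"doi": "DOI: 10.1000/xyz123", "year": "20090820"})` gives back a record with `"10.1000/xyz123"` and `"2009"` in those fields. The missing journal names and page numbers can't be patched this way, since the data never leaves the database in the first place.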


I just want to pick my items, keep them until I'm ready, and hit a button and have them show up in my account. No editing required. Engineering Village is close, but I really don't understand why it's so hard. EbscoHost breaks its export to RefWorks like every 3 months - they change some aspect of the data export without telling anyone, a hundred of us scream on the administrators' list, and RefWorks rushes through a fix, tests it, and releases it within hours. I guess it's too hard for people to consider the consequences of changing their data formats before they do it.

Ok, rant over.

  • Joe says:

    The "DOI:" in the doi field cracks me up every time! How hard can it be to parse that out?
    Another one to add to your list is SpringerLink (intentionally?) reversing first and last names on exports to EndNote. For multi-author papers it is a real pain-in-the...

  • Christina Pikas says:

    Oh SpringerLink is AFU. and I mean that sincerely. I guess I was trying to block the memory when I wrote the post!

  • Suelibrarian says:

    Please let me add patent information from CSA Illumina NOT showing up unless the client picks "full record". So they get a record with no source or year fields then wonder what they have done.

  • Heraclides says:

    @1: Regarding parsing the DOI line from the text, I looked into this a little some time ago when testing BibDesk and a few other bibliographic applications, and it isn't as trivial as it first sounds because of all the little variations. Some journals drop the colon. Sometimes it's broken over more than one line. Sometimes there is more than one DOI in the PDF, especially if the papers start in the middle of pages, rather than only occupying whole pages to themselves. Some embed the thing within a URL. And so on.
    I believe what is really needed is a DOI tag within the PDF itself, rather than having the software parse the text, just as you'd expect in an XML-based format. (Someone will no doubt now tell me that that's already been done!!)

  • Christina Pikas says:

    Oh - it's not that hard! Remember I'm talking about exporting from a database - everything is tagged in the database and it's consistently exported the same way. (I can and do do a global edit for these records to remove it... why should I have to?)