In which Christina goes into the weeds, yet not really thoroughly enough... anyhoo.
So MPOW is approaching an anniversary and we're looking at retrospectives of all sorts. What are the top articles we've published in the literature? What do you mean by top? Ok, so let's say that top means most cited - just for argument's sake. Is it really fair to compare a biomed article to an aerospace engineering article? An article published last week (ok, if in a special issue it might come complete with citations attached) with one published 5 years ago? 10? 20? Review articles with .... you see where this is going.
I had thought to normalize by 5- or 10-year periods and use the WoS subject categories. But... 1) there are a lot of them, 2) they overlap, and 3) argh. Take Acoustics, for example. JASA covers biomedical topics like hearing, and it also covers underwater sound... but they're not cited the same... at... all. The Acoustics category includes medical journals, physics journals, and maybe some math and engineering journals (I'd have to look again to be sure).
At the same time, the nice folks over on SIGMETRICS had an argument that started last weekend and ran into the beginning of this week about various normalization schemes. One of the complaints against the impact factor is that it's an average (a mean), and means don't work well on skewed distributions like citation counts. And the WoS categories suck.
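To see why a mean misleads on skewed data, here is a quick illustration using made-up citation counts (hypothetical numbers, not from any real journal): one heavily cited paper drags the mean far above what a typical article gets.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal:
# most get a handful of citations, one gets a lot (a typical right skew).
citations = [0, 1, 1, 2, 2, 3, 4, 5, 8, 174]

print(f"mean:   {mean(citations):.1f}")   # mean:   20.0
print(f"median: {median(citations)}")     # median: 2.5

# Nine of the ten articles sit below the mean.
print(sum(c < mean(citations) for c in citations))  # 9
```

The median (or a percentile, which is where this is heading) describes the "typical" article much better than the mean does here.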
So... what I'm trying to do now is both fractional counting and percentile normalization. For fractional counting (and I'm checking to make sure I know what that is), I think a cited article doesn't get credit for 1 citation; it gets credit for 1/(total number of references in the citing article). So a citation from a review article is worth a lot less than one from a regular article, because it might be something like +1/200 vs. +1/30. Then I'm normalizing by percentile. Not even the usual percentile, but the Hazen (1914) percentile. Tricky.
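Here's a minimal Python sketch of the two ideas, under stated assumptions. `fractional_counts` implements the citing-side fractional counting as described in the paragraph (each citation weighted by 1/(number of references in the citing article)); `hazen_percentiles` uses the Hazen (1914) formula 100 × (i − 0.5)/n, where i is an article's ascending rank among n articles. Both function names are hypothetical, and giving tied values their average rank is an assumption (one common convention), not necessarily what any particular tool does.

```python
from collections import defaultdict


def fractional_counts(citing_records):
    """Citing-side fractional counting.

    citing_records: iterable of (cited_id, n_refs) pairs, one per citing
    article, where n_refs is the number of references in that citing article.
    Each citation contributes 1/n_refs instead of 1.
    """
    totals = defaultdict(float)
    for cited_id, n_refs in citing_records:
        if n_refs > 0:  # skip records with a missing/zero reference count
            totals[cited_id] += 1.0 / n_refs
    return dict(totals)


def hazen_percentiles(values):
    """Hazen (1914) percentile for each value: 100 * (i - 0.5) / n,
    where i is the 1-based ascending rank. Ties get their average rank
    (an assumed convention)."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend over a run of tied values.
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = ((pos + 1) + (end + 1)) / 2  # 1-based ranks pos+1..end+1
        for k in order[pos:end + 1]:
            ranks[k] = avg_rank
        pos = end + 1
    return [100 * (r - 0.5) / n for r in ranks]


if __name__ == "__main__":
    # Article A is cited once by a review (200 refs) and once by a regular article (30 refs).
    print(fractional_counts([("A", 200), ("A", 30), ("B", 30)]))
    print(hazen_percentiles([10, 20, 30, 40]))  # [12.5, 37.5, 62.5, 87.5]
```

In practice the percentiles would be computed on the fractional counts within whatever comparison group (year bin, field) is chosen, so an article's score is its position relative to its peers rather than a raw count.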
I'll be sure to share the script once I've got it. So far the method looks like:
- In WoS, search for my org and the relevant time period, limited to articles only.
- Sort by times cited and pull off the most cited (or all the ones cited more than x times, or something). Save them as plain text, full record (probably don't need the cited references?).
- Then, for each of the top articles, click on Times Cited and export all the citing records as tab-delimited (Windows, UTF-8).
- Move those files over to the data folder.
- Run the R script (to be shared when I'm sure it's right) to get the new TCs and stick them into the file saved in the second step.
*Note: if your thingy was cited more than 500 times, you can't export all the citing records at once. This also wouldn't be practical for something with, like, thousands of citations. If you have one of those, I would just take the plunge and call it one of the best. We only had 5 over 500.
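The actual script is in R and isn't shown here, so the following is only a rough Python sketch of the counting step, under several assumptions: each export file is tab-delimited with a header row of WoS two-letter field tags, the `NR` column holds the cited-reference count of each citing record, and there is one file per top article, hypothetically named after that article's ID. The function names are hypothetical too.

```python
import csv
from pathlib import Path


def fractional_tc(path, nr_field="NR"):
    """Sum 1/NR over the citing records in one tab-delimited export.

    Assumes a header row of WoS field tags, with nr_field holding the
    cited-reference count of each citing record. Returns (total, skipped),
    where skipped counts rows with a missing or zero reference count.
    """
    total = 0.0
    skipped = 0
    # utf-8-sig strips a byte-order mark if the export has one.
    with open(path, encoding="utf-8-sig", newline="") as fh:
        # QUOTE_NONE: titles can contain stray quote characters.
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            raw = (row.get(nr_field) or "").strip()
            if raw.isdigit() and int(raw) > 0:
                total += 1.0 / int(raw)
            else:
                skipped += 1
    return total, skipped


def fractional_tc_for_folder(folder):
    """Map each export file's stem (hypothetically, the cited article's ID)
    to its (fractional TC, skipped rows) pair."""
    return {p.stem: fractional_tc(p) for p in sorted(Path(folder).glob("*.txt"))}
```

Reporting the skipped rows makes it easy to spot citing records with no reference count. An article cited more than 500 times would arrive as several export files, and their totals would need to be summed.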
Next, I'll put them into the ISI.exe script and then the i3 script from here. See what happens.
As for normalizing by year: I was thinking about maybe omitting a couple of years or so, then doing 5-year bins three times, and then doing 10-year bins. Not sure. Willing to take advice. It's a 75-year history, but a similar paper was done in 1986, so I only agreed to go back to 1980. Before a certain year (no longer necessarily 1973), the affiliation/address fields aren't there. One very nice retiree I had the pleasure of meeting recently died, and I found that he was listed in Garfield's top-cited articles. His work on polar gases doesn't come up in the search, so coverage is definitely not complete that far back.
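One possible reading of that binning scheme, sketched in Python. This assumes the omitted years are the most recent ones (they've had little time to collect citations), that the oldest bin is simply truncated at 1980, and that `last_year` is whatever the final year of the retrospective turns out to be. The 2014 below is purely a placeholder, and `year_bins` and `bin_for` are hypothetical helpers.

```python
def year_bins(last_year, first_year=1980, skip=2, n_short=3,
              short_width=5, long_width=10):
    """Drop the `skip` most recent years, then make `n_short` bins of
    `short_width` years, then `long_width`-year bins back to first_year.
    The oldest bin is truncated at first_year. Bins are (start, end),
    inclusive, newest first."""
    bins = []
    end = last_year - skip  # most recent year that is kept
    while end >= first_year:
        width = short_width if len(bins) < n_short else long_width
        start = max(end - width + 1, first_year)
        bins.append((start, end))
        end = start - 1
    return bins


def bin_for(year, bins):
    """Return the bin containing `year`, or None if it was omitted."""
    for start, end in bins:
        if start <= year <= end:
            return (start, end)
    return None


if __name__ == "__main__":
    bins = year_bins(2014)  # 2014 is a placeholder, not the real end year
    print(bins)  # [(2008, 2012), (2003, 2007), (1998, 2002), (1988, 1997), (1980, 1987)]
```

Each article would then be compared (e.g., by percentile) only with other articles in the same bin, so a 2010 paper isn't measured against one from 1985.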