Actually 2 - I'm on a tear. Using what I've learned for work!
Pretty cool, huh? And useful. Found on StackOverflow, where there are other ideas, too.
Opening an Excel file. Now, what you're probably thinking is: "Just save it as CSV or tab-delimited and open that the normal way." Well, yes, but I've got about 50 files, they were exported from the database as .xls, and I didn't open each one to re-save it while exporting.
So opening is one thing, but I need a couple of columns out of the first sheet, and there's some junk at the top that I don't need.
The package is xlsx (its reference manual is a PDF). I had just installed Python (I thought I already had it at work, but I guess not) and was getting ready to do it there, but then I figured R surely had a way.
The issue with my work computer is that it's 64-bit, but I have 32-bit Java, 32-bit Office, etc., and I was running R in RStudio as 64-bit. rJava, which xlsx requires, has to use a Java install with the same bitness as R, so that combination wouldn't load. I tried everything to get Java updated and 32-bit (without killing other things I needed). Finally (duh), I just pointed RStudio at the 32-bit version of R I already had installed, and then rJava ran just peachy.
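If you're not sure which build of R a session is actually running, a quick check before fighting with Java can save some time. This is a minimal sketch in base R: `.Machine$sizeof.pointer` is 4 on 32-bit builds and 8 on 64-bit builds, and `R.version$arch` names the architecture. (On Windows, RStudio's choice of R build lives under Tools > Global Options > General.)

```r
# Report the architecture of the running R session.
# .Machine$sizeof.pointer is 4 on 32-bit builds and 8 on 64-bit builds.
r_bits <- 8L * .Machine$sizeof.pointer
cat("R is", r_bits, "bit; arch =", R.version$arch, "\n")

# rJava (and therefore xlsx) needs a Java runtime of matching bitness.
if (r_bits == 32L) {
  message("Use a 32-bit Java runtime with this R session.")
} else {
  message("Use a 64-bit Java runtime with this R session.")
}
```

If this prints 64 while only 32-bit Java is installed, that mismatch is the likely reason rJava won't load.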
Here's the command I'm using so far:
library(xlsx)
file <- read.xlsx("analysis.xls", 1, startRow = 4, colClasses = c("character", "numeric"))
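The 1 is the sheet index, and startRow = 4 skips the junk rows at the top. The colClasses argument is pretty slick, too: it sets the type of each column as it's read. You can also pick a sheet by name (sheetName) and grab only certain columns (colIndex).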
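Here's a sketch of the sheet-name and column options together, wrapped in a small helper. The helper name `read_article_cols`, the sheet name "Sheet1", and the column positions `c(1, 3)` are hypothetical placeholders; `sheetName`, `startRow`, `colIndex` and `colClasses` are real `read.xlsx` arguments. It calls `xlsx::read.xlsx` with the package prefix, so the function can be defined before xlsx (and rJava) is loaded.

```r
# Hypothetical helper: read only the columns we care about from one export.
# "Sheet1" and c(1, 3) are placeholders; adjust them to match the real files.
read_article_cols <- function(path,
                              sheet = "Sheet1",
                              cols = c(1, 3)) {
  xlsx::read.xlsx(
    path,
    sheetName  = sheet,                     # pick the sheet by name instead of index
    startRow   = 4,                         # skip the junk; row 4 is read as the header row
    colIndex   = cols,                      # read only these columns
    colClasses = c("character", "numeric")  # one class per column read
  )
}
```

Something like `articles <- read_article_cols("analysis.xls")` would then return a two-column data frame, assuming the sheet name and column positions match the real exports.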
So now on to iterating through all the files in a directory: open each one, count up the articles listed under the categories I have in another vector, and report that out. Should be cool. Or not. And I promised to have the results (however they're done) by Friday, and I'm going to be out tomorrow. Back to it!
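A rough sketch of that loop, under a few assumptions: the exports all sit in one folder with an .xls extension, each article's category is in a single column, and the category names are in a character vector. The function names `count_categories` and `summarise_exports`, plus the `reader` argument, are hypothetical; the default reader just repeats the `read.xlsx` call above. Keeping the counting separate from the file reading means it can be checked on a plain data frame without Java.

```r
# Count how many rows of `df` fall into each category in `categories`.
# factor() with fixed levels makes table() report every category, even empty ones;
# values outside `categories` become NA and are dropped by table().
count_categories <- function(df, column, categories) {
  counts <- table(factor(df[[column]], levels = categories))
  setNames(as.integer(counts), categories)
}

# Read every .xls file in `dir` and build one row of counts per file.
# `reader` is swappable (e.g. for testing); by default it uses xlsx like the post.
summarise_exports <- function(dir, column, categories,
                              reader = function(path) xlsx::read.xlsx(path, 1, startRow = 4)) {
  paths <- list.files(dir, pattern = "\\.xls$", full.names = TRUE, ignore.case = TRUE)
  if (length(paths) == 0) {
    # No files: return an empty matrix with the category columns.
    return(matrix(integer(0), nrow = 0, ncol = length(categories),
                  dimnames = list(NULL, categories)))
  }
  rows <- lapply(paths, function(path) count_categories(reader(path), column, categories))
  result <- do.call(rbind, rows)   # integer matrix, columns named by category
  rownames(result) <- basename(paths)
  result
}
```

For example, `counts <- summarise_exports("exports", "Category", my_categories)` followed by `write.csv(counts, "category_counts.csv")` would give one row per file and one column per category; the folder name, the "Category" column, and `my_categories` are placeholders.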