Archive for the 'Information Science' category

Providing real, useful intellectual access to reference materials from current library pages

Those of us who study or teach about scientific information have this model of how it goes:



(this image is lifted from my dissertation, fwiw, and it more or less reproduces Garvey & Griffith, 1967; Garvey & Griffith, 1972)

Conference papers (in many fields) are supposed to be more cutting edge - really for people in the field who already have a deep understanding but need that icing on the cake of what's new. Journal articles come more or less after the substantive parts of the work are complete and take a while for review and publication (letters journals are supposed to be much faster), and then monographs and textbooks come when the information is more stable. More recently, there's a category of shorter books that are sort of like extended reviews but faster than monographs. Morgan & Claypool, Foundations and Trends, and the new series coming from Cambridge University Press (no endorsement here) are examples. (Note the model omits things like protocols, videos, and datasets.)

Reference books are even slower moving. They are used to look up fairly stable information. Here are some examples:

  • encyclopedias (and not just World Book, but Kirk-Othmer, Ullmann's, and technical encyclopedias)
  • dictionaries
  • handbooks (not just for engineers!)
  • directories
  • gazetteers (well, maybe less so for the sciences), maps
  • guidebooks (like in geology, biology)
  • sometimes things like catalogs...

You may think, hey, all I really need are the journal articles and Google and maybe Wikipedia. Or at least publishers and librarians think you're thinking that. And reference books are sort of disappearing. It doesn't make any sense to devote precious real estate to the print versions and the online versions are super expensive and also often not used.

The thing is that these tools are really still needed and they have condensed very useful information down into small(er) packages. If you're concerned about efficiency and authority then starting with a reference book is probably a good idea if you want an overview or to look up a detail.

The publishers don't want to lose our money so they're taking a few different approaches. Some are making large topical digital libraries that combine journal articles, book chapters, and reference materials. This can be really good - you can look up information on a topic when you're reading a journal article or look up a definition, etc. You can start with an overview from an encyclopedia and then dive deeper to learn what's new. The problem from a librarian and user point of view is that the best information may come from multiple different publishers and you just won't get that. You won't get a recommendation for someone else's product.

Another thing publishers are doing is making reference materials more dynamic. First, they can charge you more, and more frequently. Second, even if the updates are quite small, a recent update date makes the resource more attractive to potential users. One publisher in particular has built sort of a portal approach that gathers materials from various places and has commissioned new overviews.

There's a tool to sort of search across more traditional reference materials, but... meh.

Of course if you have a well-developed model of what type of reference tool will have your needed information, then you can use the catalog (subjects like engineering - handbooks, engineering - encyclopedias). Back in the day, I wrote about how senior engineers gathered and created their own handbooks from pieces they'd found useful over time.

So here's where librarians come in. I've never taught the basic undergrad science welcome-to-the-library class (I attended one <cough> years ago), so I really don't know if they go over these distinctions or not. So that leaves our guides to try to get people to the best source of information. Guides that are merely laundry lists of tools by format/type are frowned upon because they are generally not useful. That's what we used to do though: here's a list of dictionaries, here's a list of encyclopedias... etc. What we try to do more now is make them problem based. Somewhat easier in, like, business: need to understand an industry? need to look up a company? Also maybe in materials science and/or chemistry (although SciFinder's and Reaxys' way of doing properties may be supplanting the handbooks).

Ok, so beyond the difficulty of expressing the value of each of these tools and in which situations they are useful, we have the affordances of our websites and the tools that produce them. Most are database driven now, which makes sense because you don't want to have to go to a million places to update a URL. Except... one reference might be useful for one purpose in one guide, and another in another, and then how do you get that to display? How do you balance chatty-to-educate when needed versus quick links for when not?

Also, do you list a digital library collection of handbooks or, more commonly, monographs mixed with handbooks, as a database? As what?

The reviews and overviews and encyclopedias... do you call them out separately? By series?

Users sometimes happen upon reference books from web searches - but that's mostly things like encyclopedias. If they need an equation or a property... well, if they're an engineer they probably know exactly what handbook... so then, I guess, if they don't have their own copy, they would use the catalog and get to the current edition which we may have online. Getting a phase diagram or other property of a material - I'm guessing users would probably start online but for some materials we have entire references (like titanium, aluminum... and then things like hydrazine).

I'm thinking we could have, on an engineering guide, a feed from the catalog for engineering - handbooks? Likewise a feed for physics - handbooks? What about things like an encyclopedia of optics? Call out "major reference works" and then a catalog feed of [subject] - handbooks|encyclopedias|etc....

OR.. hey... what about the shelf display model:

But, instead of all books, just the books for that guide that match [guide name] -- encyclopedia|dictionary|handbook, etc.
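To make that concrete, here's a minimal sketch of the filtering step in R, assuming we can export catalog records with their subject headings to a CSV. The file name, column names, and the particular LCSH form subdivisions are just illustrative - not from any specific ILS or discovery layer.

```r
# Minimal sketch: filter an exported list of catalog records down to the
# reference works for one guide. File and column names are assumptions.
library(dplyr)
library(stringr)

records <- read.csv("catalog_export.csv", stringsAsFactors = FALSE)
# assumed columns: title, link, subjects
# e.g. subjects = "Engineering -- Handbooks, manuals, etc. | Mechanics -- Tables"

guide_subject <- "Engineering"   # the [guide name]
ref_forms <- c("Handbooks, manuals, etc\\.", "Encyclopedias", "Dictionaries")

pattern <- paste0(guide_subject, "\\s*--\\s*(",
                  paste(ref_forms, collapse = "|"), ")")

shelf <- records %>%
  filter(str_detect(subjects, regex(pattern, ignore_case = TRUE))) %>%
  arrange(title)

head(shelf[, c("title", "link")])   # feed this to the shelf/cover display widget
```

Swap in the guide's subject and whatever form subdivisions make sense, and the same few lines could drive the display on any guide.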

What other methods can we use?


ASIST 2017: Information Use Papers

Ma Cui-Chang and Cao Shu-Jin - Identifying structural genre conventions across academic web documents for information use

Swales model for research articles

Move 1 Establishing a territory
Move 2 Establishing a niche
Move 3 Occupying the niche

rhetorical organization patterns - disciplines, different information uses

sources for development: rhetorical objectives of the genres > linguistic clues > move analysis, writing rules genre research

academic blog post, online encyclopedia, research articles

corpus - 81 documents, 2015, Chinese documents with kw "citation analysis"

raters - interrater reliability 80-100%

Taxonomy identified and validated

q: how will you use this? will you use machine or automated clustering based on this.

q: can you elaborate on information units you found on the web or in web documents vs. formal publications.

a: main difference in how organized. also Swales is developed from written English articles.

q: mentioned Swales was developed to help train junior users, could your taxonomy help further with teaching

 

Devendra Dilip Potnis (speaker), Kanchan Deosthali, Janine Pino - Investigating barriers to using information in electronic resources: a study with e-book users

Motivation: spend money on electronic resources, but they're underused. Goal: to investigate barriers to using information in ebooks

Key findings - 60 barriers. Categories:

  • ereaders (16)
  • features of ebooks (20)
  • psychological (7), somatic (3), cognitive status (6)
  • cost
  • policies

different actors - things about the users and things about the environment, system, vendors

uses Wilson (2000)'s definition of using information - both physically accessing, as well as mental schemas and emotional responses

4 broad stages of information use- searching, managing, processing, applying information

Lots of previous studies - their main difference is how they look at use of information instead of "value".

They did a survey of LIS students (n=25) [sigh... this is a real and important topic, but sigh]

These participants also might have more insight into use of information, what's going on in libraries, etc.

Great quotes - flipping pages waiting for a page to load - breaks concentration. Not immersive. Policies don't let download. Poor text quality.

Mapped barriers to information use stages.  For example psychological barriers prevent information processing. Technical barriers prevent use of information

"due to a series of unavoidable barriers, respondents who originally intended to use ebooks for utilitarian purposes end up using this electronic resource mostly for hedonistic reasons " (pleasure reading, but not reference)

contributions - insight into adoption, why a negative perception. also if hiring a new librarian, will they have a negative attitude toward ebooks.

q: plans to go bigger with this

a: not really - so disheartening [welcome to my world] - but is planning a bigger hci study

q/comment: need to really differentiate between scholarly and leisure reading, and even within scholarly, between a book engaged with as a monograph vs. DRM-free per-chapter PDFs engaged with on a per-chapter basis, almost like journal articles

q/c: some have advanced annotation and highlighting features of which users may be unaware

Ayoung Yoon - Role of Communication in Data Reuse

Secondary use of data - not for the original purpose, and generally not by original collector of data

not a simple one-step process, transfer of knowledge, "social process" interactions and communications with other relevant parties (Martin, 2014)

who is involved, why, and how

past studies - transferring information about context of data, difficult to know what contextual information is important for unknown possible reusers, level of skills and tacit knowledge of reuser

strategies - documentation (inherently insufficient, not everything can be transferred), communication with producers (formal or informal)

38 quantitative data reusers in social work and public health. Identified from scholarly databases using "secondary data" or "secondary analysis"

not a linear process - discovery, selecting, understanding, analyses, manuscripts

purpose of interaction communication - searching, interacting, problem solving

search is complicated - no one place to look, data may be dated, rely on established network, have a "data talk"

interaction/communication - learning process, collaboration and mentoring process, "not just access to the data but more importantly, access to people", "how to get around challenges"

problem solving - "knowing other people who were closely working with the data" "talking among ourselves" give reusers "confidence" about solving issues. Also working with data professionals and statisticians "if the problem was really me or the data"

Limitation of communication around data - have to be part of the network to have information needed to access data - peripheral and junior researchers. Unsuccessful interaction with data producers (no answers, partial answers, busy, contact person may be project manager and may not know)

communication is not always necessary for reusers - if it's well documented, known, and the reuser is experienced.

important to support this communication around data - most libraries do not deal with this but deal with mandates and sharing.

q: communication around data among reusers - not with producers - role for platform to support?

a: extended (great) - she did see that in her work. lots of discussion at conferences and within networks among reusers. OTOH, some participants hit a wall when they didn't get a response from producer and didn't have anyone else to ask next. Library is not seen in facilitating this but would be helpful if they could. Platform facilitating could be useful, too.

 


ASIST 2017: Social Media Papers session

Oct 30 2017. Published under Conferences, Information Science

Fei Shu (speaker) & Stefanie Haustein - On the Citation Advantage of Tweeted Papers at the Journal Level

Previous research - twitter exposure leads to an overall increase of citations. Correlation is weak. Low social media impact in countries where Twitter, for example, is limited or blocked.

Research questions - compare the normalized citation rate of articles shared on Twitter with similar papers from the same year. 22% of WoS papers are tweeted? (talking fast!) This causes problems - so look at the journal level; control for journal, discipline, country of origin of author. Data: Web of Science and Altmetric.com. Use the DOI to search both. In Altmetric you can see where the tweets originate. They used thresholds to deal with outliers. Used tweets and citations from 2012 to 2015. Since there were some journals with very few tweeted papers, these would be difficult to compare. Used journals with at least 10 tweeted and 10 non-tweeted papers. ... also did a threshold of 100 and 100 - in this case of 308 journals, 36% of papers tweeted. Tweeted papers receive 68.4% more citations on average than non-tweeted (not corrected). Corrected by journal: 30% citation advantage (significant at p<0.05). By discipline - varies - significant in 9 disciplines - not significant in chem, engr, human, math due to sample size. Source countries (based on author institution) - threshold level. Country with top tweeted - Netherlands. Sweden 91% citation advantage.

Citation advantage 30%, in all disciplines but extent varies.

Most tweets are from 6 months after publication
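Just to make the journal-level correction concrete for myself, here's a minimal sketch of that comparison - my own reconstruction of the idea, not the authors' code. The toy data frame stands in for the merged WoS/Altmetric.com records; the 10-and-10 threshold is the one from the talk.

```r
# My reconstruction of a journal-level correction (not the authors' code).
library(dplyr)

set.seed(1)
papers <- data.frame(                      # stand-in for merged WoS/Altmetric records
  journal   = rep(c("J1", "J2", "J3"), each = 60),
  tweeted   = rep(c(TRUE, FALSE), times = 90),
  citations = rpois(180, lambda = 6)
)

advantage_by_journal <- papers %>%
  group_by(journal) %>%
  filter(sum(tweeted) >= 10, sum(!tweeted) >= 10) %>%   # threshold from the talk
  summarise(
    mean_tweeted     = mean(citations[tweeted]),
    mean_not_tweeted = mean(citations[!tweeted]),
    advantage        = mean_tweeted / mean_not_tweeted - 1
  )

# average within-journal advantage; the talk reported ~30% once journal is controlled
mean(advantage_by_journal$advantage)
```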

Chris Hubbles (speaker), David W. McDonald, & Jin Ha Lee - F#%@ That Noise: SoundCloud As (A-)Social Media?

SoundCloud is used to share and communicate about music. Has timestamped commenting and allows social interaction among fans woven into the playback feature. Used to distribute music and podcasts, even by some government organizations. "Social Multimedia". Qualitative content analysis on these comments. Used the search API (to ID popular tracks) and then the track API to pull all the comments for these tracks. Whole year of 2013. 100-200 tracks per day uploaded except for a weird spike. They removed spoken word from the sample. Kept 0-10 minutes, 10-500 comments. Collaboratively coded by authors. Codebook with 39 codes. 58 songs, 5,608 comments. 69% electronic music and hip-hop. Music was uploaded by artists, labels, promotion companies, fans, etc. Comments were mostly positive. Were full of profanity, caps, emoji, exclamation points. But also about features of the music, and stories of where the music was heard and what it meant. Few of the comments were part of conversation threads. One track had 77 comments with no replies. Uploader replies were almost as common as fan replies.

Similar to what Dana Rotman found with YouTube. The presence of affordances doesn't mean a community will form.

The display could be better to support participation.

"A-social party" - expression and not interaction. Broadcasting, graffiti, co-presence, mutually shared experience.

Quan Zhou, Chei Sian Lee (Speaker), & Sei-Ching Joanna Sin - Using Social Media in Formal Learning: Investigating Learning Strategies and Satisfaction

Self-regulated learning (Pintrich, 2000, p453) - "an active constructive process whereby learners set goals ... then monitor regulate... constrained by ...goals... environment". forethought, performance control, self-reflection

survey  - undergrad and grad students, if they used social media for any class, standard scales for learning strategies and satisfaction... n=270

PCA and regression. all 4 learning strategies significant. Goal setting most influential predictor of learning satisfaction. Self-evaluation second (social comparison - is a motivating force, unlike general studies of social media where comparison makes you unhappy). Keep in mind, their students are maybe more highly motivated than some other samples.

Limitations - didn't look at whether use was voluntary or mandatory. one university

q: how did you define social media? big list

q: did you ask for how the social media were used in the class? (no, not really?)


ASIST 2017: Digital Literacy in the Era of Fake News: Key Roles for Information Professionals

Oct 30 2017. Published under Conferences, Information Science

They were having problems with the projector so started with Connaway going through studies they've done related to information literacy. Important to provosts and universities - learning doesn't stop when students graduate. How do we get students to use public libraries and use information in everyday-life decision-making?

  • How do people who work with the public in libraries get updated on information literacy
  • What do students know about how search engines work
  • How do people assess information on the web and in social media

Heidi Julien - engage in issues and model approaches

  • social media campaign about facts
  • express views publicly and stand up to confront misinformation
  • educate representatives at all levels of government - these issues are important and institutions like library need to be supported
  • advocate for importance of digital literacy.
  • Aldous Huxley "facts do not cease to exist because they are ignored"
  • (other international infographics and things to share)

Seadle - information professionals provide context and nuanced view.

 

Alex Kasprak - Science Writer at Snopes.com

Some things they're seeing are more like overblown - Yellowstone volcano may erupt sooner... > we're all gonna die!

More things like autism/vaccination.

Deeper exposé on a retired scientist who is peddling a snake oil cure for cancer.

"50 studies say... " - he's never found one of these in which the studies do support the claim

Recent one from B claiming 400 articles say climate change is a hoax. Kasprak asked the author "how long did it take to prepare" and Delingpole said "as little time as possible" (he can share this because Delingpole posted that an "impertinent pup" from Snopes was fact-checking him with this comment).

questions: debunking - is it really useful or is it just giving more attention? Snopes won't necessarily solve the problem but serves as a reference and affects the financial viability of these sites. Real world implications when Snopes debunks.

is it really about believing things that are untrue or is it more taking away debate - yes

is the term "fake news" too charged now to use. yes - probably not a useful term

other terms: lazy journalism, hucksterism, pseudoscience, etc. use more precise term

more blame on producers - but can we increase the cost of being wrong (reposting these stories)

Habermas-ian - public sphere as a place for exchange of rational ideas - but with a Foucault hat on, our problem is trying to maintain this notion of civil society in the face of people who are no longer interested in the ideal. The rational argument against an emotional or financial gain... beat our heads against the wall.

Seadle response - like both schools of thought. but people aren't rational. behavioral economics. pure number of hits on a website gets you more money. Incentive structures to bring people back to

Julien - we are beating our heads against the wall, multiple cognitive biases, all operating in our own echo chambers - ideal

My q: influence operations by state actors vs. this

Kasprak - the state actors were taking messages already existing or making new messages modeled on existing, and then amplifying, paying to target these. So combating is actually similar, but we're not winning, and there are higher numbers.

 

 


Brief notes from Maryland SLA's Storytelling with Data

This one-day meeting/course/workshop/seminar (?) was held at the University of Maryland (go Terps!) on October 12, 2017. As with all events planned by my local SLA chapter, it was very well organized and run. The speakers were all excellent. Amazingly, the parking was close and pre-paid. The food was great, too.

Keith Marzullo - the dean of the iSchool - gave some welcoming remarks. He was so positive and seemed to really get the point of the day.

The opening keynote was by Ya-Ling Lu from the National Institutes of Health library (not NLM but the campus library). I have mostly heard her speak tag-teaming with Chris Belter on bibliometrics techniques but it was wonderful to have the opportunity to hear a long presentation just by her on visualization. She talked about having a low floor - starting at the beginning - and a high ceiling - keep learning and improving.

She talked about learning design and how choices convey emotion and meaning. Her example was from Picture This: How Pictures Work by Molly Bang


  WorldCat link

It was amazing to see how simple rectangles and triangles, their color, size, and location really told the story.

She also provided examples of developing information products. The first was to celebrate the life and career of someone retiring. She needed data and visualizations and a story for people, research, and leadership.

A second example was graphing how she spends her day to try to find more time for the things she wants to do.

Finally, she skipped over an example of how she successfully fought a traffic ticket using data and visualizations.

Oh, and she often uses Excel for her visualizations - even when she can make them in R or Matlab.

 

Jessie Sigman from University of Maryland spoke next about using Cytoscape and Gephi to do graphs showing coverage of agricultural topics across research databases.

Vendor updates were provided by the sponsoring companies: Clarivate, Ebsco, and Cambridge University Press. CUP is doing a neat new thing that's sort of like Morgan & Claypool - it's like a monographic series, but the volumes are 40-70 pages. Peer reviewed and series are edited like journals.

David Durden and Joseph Koivisto of University of Maryland spoke next about the different stories that can be told with repository usage data. So it turns out that D-Space has separate data for the content (say PDF) and the metadata and integrating this mess to get a real, accurate picture of how the system is being used is a bit of a bitch. It's indexed by Solr, but Solr doesn't keep the same index number for the content - it assigns its own. Google Analytics does a lot, but maybe not the right things. RAMP, a project out of the University of Montana, helps with Google data but also has shortcomings. Things based on Google do the best they can to filter out bots. HOWEVER, if it's a bot a professor on campus wrote to analyze data, then that's a great use to track. Also Google doesn't capture the full text downloads.

 

Brynne Norton from NASA Goddard spoke of a cool visualization using interlibrary loan data. Standard statistics are just like time to get things filled and % requests filled. The data are horribly messy, with some citations lacking even an article title. She compiled the article titles using a series of regex searches and searched them through the Web of Science GUI. Yeah, the GUI. Apparently you can OR about 500 articles at a time! (as an aside: yes, there is indeed a WoS API, but you cannot use it for this purpose. You are only allowed to search for yourself. I know.) Then she loaded into VosViewer and did a topic map. It was really cool and she narrated how it showed certain areas they might consider collecting in.
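I didn't see her actual code, but the batching trick is easy to reproduce. Here's a sketch of what I imagine it looks like in R - the input file, its column name, and the cleanup regex are my assumptions; the roughly-500-titles-per-query limit is from the talk.

```r
# Sketch: turn messy ILL article titles into big OR'd queries for the WoS
# advanced search box. File and column names are assumptions.
library(stringr)

ill <- read.csv("ill_requests.csv", stringsAsFactors = FALSE)

# crude cleanup so quotes and parentheses don't break the query syntax
titles <- unique(str_squish(str_replace_all(ill$title, '["()]', " ")))
titles <- titles[titles != ""]

batches <- split(titles, ceiling(seq_along(titles) / 500))   # ~500 titles per query

queries <- vapply(
  batches,
  function(x) paste0("TI=(", paste0('"', x, '"', collapse = " OR "), ")"),
  character(1)
)

writeLines(queries, "wos_queries.txt")   # paste each into WoS, export for VOSviewer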

 

Sally Gore did the closing keynote and boy is she awesome. I highly recommend librarians sign up for her webinar when SLA schedules it. She was also super encouraging. She spoke of how she figured out how to do these amazing infographics on her own - she even uses PowerPoint and sometimes draws her own icons. She recommended books by Stephanie Evergreen to learn design. I have more notes, but they're at work and I'm trying to get this published - so I'll add more if I find anything else I wanted to note.

The closing remarks were actually terrible. The guy who gave them had not actually attended any of the day or really read the descriptions of the speakers. His comments were on something like research data management, which was irrelevant to the day's topic. Boo.

But then we drank wine and had some more food so it was ok 🙂


Special Libraries, Information Fluency, & Post-Truth

Lots of librarians are communicating online about providing resources and information in these fraught times. Some are creating displays and programs to support diverse populations. Others are crafting statements of support and/or opposition. More practically, some are stocking their reference desks with safety pins and extra scarves (to be used by women who have had their hijab snatched off).

But these activities are more useful, perhaps, in public or academic libraries.

In the past few days, "fake" news and propaganda have been receiving a lot of attention. (Hear a rundown from On The Media - clearly a very liberal news source but transparent about it.) As noted on this blog, it is not really possible to insist that our patrons/customers/users use only our licensed sources. To be quite honest, even if we could, that alone isn't enough to ensure they are getting the best information. We think that because our people all have at least college degrees, they are experts in, or at least competent at, critical thinking.

I think, though, that the media environment isn't what it was when many of them were in school. We take the click bait and we see headlines repeated so often on Facebook that maybe we start to believe?

So, now, how do special libraries train and support their organizations in the post-truth world? I have been asked and have accordingly scheduled training that discusses critically evaluating resources; however, that is NOT at all attractive to busy professionals. The only training I offer that is well-attended is problem oriented and is explicitly related to doing the scientific and technical work (no surprise here to my library school professors!). Otherwise, short on-point training at the point of need is also well-accepted.

Integrate aspects of critical thinking and evaluating resources into every bit of training you do. If your user base can qualify for a free web account for the Washington Post (.gov, .mil, & .edu), make that information available even if you provide access through another source. Do show how to find news information in other topical sessions. For example, a session on aerospace engineering could cover things like society news sources and Aviation Week.

If your organization has an internal newsletter and/or social media site, link early and often to reputable sources.

Are you integrated into strategic processes (never as much as you would like, I know!)? What information is your leadership getting and from where? The very highest levels of your organization won't typically attend your classes - can you brief their assistants? Can you make this information available to their mobile devices?


Communications Theories - the continuing saga

Apr 16 2016. Published under Information Science

The dissertation was accepted by the grad school and is on its way to the institutional repository and PQ to be made available to all (I will link to it as soon as it's available). Yet I still fight the battle to own theory and, if not ever to be a native, then at least to be semi-fluent.

Late in the dissertation I identified this book: Theories and Models of Communication (2013). In Cobley P., Schulz P. J. (Eds.). Berlin: De Gruyter. I browsed it a bit on Google Books and then requested it from another library. I'm just getting the chance to look at it more carefully now. A lot is not new, but it is well-organized here.

Chapter 2:

Eadie, W. F., & Goret, R. (2013). Theories and Models of Communication:  Foundations and Heritage. In P. Cobley, & P. J. Schulz (Eds.), Theories and Models of Communication (pp. 17-36). Berlin: De Gruyter.

Communication as a distinct discipline emerged after WWII. Theories and researchers came from psychology, sociology, philosophy, political science... I guess probably engineering and physics, too. Then again, physicists turn up everywhere 🙂

This chapter described 5 broad categories of approaches to communication

  1. communication as shaper of public opinion - this came from WWII propaganda work. Main dudes: Park, Lippmann, Lazarsfeld, Lasswell
  2. communication as language use - this is like semiotics. Main dudes: Saussure, Peirce
  3. communication as information transmission - this would be where you find the linear models like Shannon & Weaver as well as updates like Schramm and Berlo. From those came Social Learning/Social Cognitive Theory (Bandura), Uses and Gratifications, Uncertainty Reduction Theory (Berger and Calabrese), and eventually Weick, who we all know from the sensemaking stuff.
  4. communication as developer of relationships - Bateson, Watzlawick "interactional view", Expectancy Violations Theory (Burgoon), Relational Dialectics Theory (Baxter)
  5. communication as definer, interpreter, and critic of culture - this is where you get the critical theory (like critical race theory, etc.). Frankfurt School (Marcuse, Adorno, Horkheimer, Benjamin), Structuralism, Gramsci, Habermas

Chapter 3:

Craig, R. T. (2013). Constructing Theories in Communication Research. In P. Cobley, & P. J. Schulz (Eds.), Theories and Models of Communication (pp. 39-57). Berlin: De Gruyter.

"A scientific theory is a logically connected set of abstract statements from which empirically testable hypotheses and explanations can be derived." (p.39)

"Metatheory articulates and critiques assumptions underlying particular theories or kinds of theories" (p. 40)

He uses words in a different way than I think I learned. Like metatheory - his is like meta about theories, but I think other people may use it like overarching big mama theory with baby theories?

Anyhoo. He says there are these metatheoretical assumptions useful to understand the landscape of communications theories.

  1. about objects that are theorized (ontology)
  2. basis for claims of truth or validity (epistemology)
  3. normative practices for generating, presenting, using theories (praxeology)
  4. values that determine worth of a theory (axiology)

Ontology - what is communication? Basically transmission models or constitutive models.  "symbolic process whereby reality is produced, maintained, repaired, transformed" (Carey, 2009)

His constitutive metamodel of communication theories (these were described better in chapter 2, but reiterated by the author himself in 3)

  1. rhetorical - communication is a practical art
  2. semiotic - intersubjective mediation via signs
  3. phenomenological - experiencing otherness through authentic dialog (or perhaps BS - no it doesn't say that 🙂 )
  4. cybernetic - communications = information processing
  5. sociopsychological - communications = expression, interaction, influence
  6. sociocultural - communications = means to (re)produce social order
  7. critical - discursive reflection on hegemonic ideological forces and their critiques

Theory means something different in physics than it does in sociology. This is due to the objects of study and how and what we can know about them as well as by what values we judge the theory. Two main approaches to constructing theory in comms are: empirical-scientific and critical-interpretive.

Functions of a scientific theory: description, prediction, explanation, and control.

Two kinds of explanation: causal and functional. Communication explanatory principles: hedonistic (pleasure seeking), understanding-driving, consistency-driven, goal-driven, process-driven, or functional (cites Pavitt, 2010).

Criteria to judge quality: empirical support, scope, precision, aesthetic (elegance), heuristic value.

Theory != model | paradigm. A model is a representation; a theory provides explanation. A paradigm is a standard research framework used in a particular field.

Epistemological assumptions.

  • Realist - underlying causal mechanisms can be known
  • Instrumentalist - scientific concepts correspond to real things and can be useful in making predictions
  • Constructivist - phenomena can't be known independently of our theories  - paradigm determines how empirical data will be interpreted.

A classical issue is level of analysis - do you go biological or psychological or do you go more sociological? Small groups? Societies?

Also do you build the whole theory at once or add to it as you go along to build it up?

Critical-Interpretive - these are like from humanities like rhetoric, textual criticism, etc. "Purpose has been ideographic (understanding historical particulars) rather than nomothetic (discovering universal laws)" p. 49

Interpretive. Methods (praxeology) - conversation analysis, ethnography, rhetorical criticism. Emphasize heuristic functions of theory. Not generalizable causal explanations, but conceptual frames to assist in interpreting new data. It's accepted to use multiple theories to better understand "diverse dimensions of an object" instead of insisting on one right path. Carbaugh and Hastings (1992) give 4 phases of theory construction:

  1. developing a basic orientation to communication
  2. conceptualizing specific kinds of communicative activity
  3. formulating the general way in which communication is patterned within a socioculturally situated community
  4. evaluating the general theory from the vantage point of the situated case (p.51)

Critical. The purpose of critical theory is social change.

Anyway, more to follow as I hopefully continue on in the book.


Notes from International Symposium on Science of Science 2016 (#ICSS2016) - Day 2

This day's notes were taken on my laptop - I remembered to bring a power strip! But, I was also pretty tired, so it's a toss up.

 

Luis Amaral, Northwestern

What do we know now?

Stringer et al JASIST 2010 distribution of number of citations

25% of papers overall in WoS (1955-2006) haven't been cited at all, yet for particular journals (ex. Circulation) there may be no papers that haven't been cited.

Stringer et al. PLoS ONE 2008 – set of papers from a single journal

Discrete log normal distribution – articles published in a journal in a year

Works well for all but large, multidisciplinary journals – Science, Nature, PNAS, but also PRL and JACS

For most journals takes 5-15 years to reach asymptotic state

Moreira et al PLOS ONE 2015 – set of papers from a department. Also discrete log normal.

Also did work on significant movies - citations using IMDB connections section (crowd sourced annotation of remakes, reuse of techniques like framing, references/tributes, etc.)

Brian Uzzi, Northwestern

Age of Information and the Fitness of Scientific Ideas and Inventions

How do we forage for information – given a paper is published every 20 minutes – such that we find information related to tomorrow’s discoveries?

He’s going to show WoS, patents, law and how pattern works.

Foraging with respect to time (Evans 2008, Jones & Weinberg 201?)

Empirical strategies of information foraging: some papers cite references tightly packed by year; some have a high mean age; some a high mean age with high age variance…

Average age of information (mean, over the cited articles, of the citing paper's PY minus the cited article's PY)

Low mean age, high age variance is most likely to be tomorrow’s hits (top 5% cited in a field)
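For my own notes, the two quantities are trivial to compute for any one paper; a toy sketch of my reading of the definition above (not Uzzi's code):

```r
# Toy sketch: "age of information" for one citing paper (my reading of the talk).
citing_py <- 2015
cited_py  <- c(2014, 2013, 2013, 2009, 1998, 1975)   # PYs of its cited references

ages <- citing_py - cited_py
mean(ages)   # low mean age  -> references skew recent
var(ages)    # high variance -> mixes brand-new work with much older work
```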

Tried this same method in patent office- inventors don’t pick all the citations. Examiner assigns citations. Patents have the same hotspot.

 

Audience q: immediacy index, other previous work similar…

A: they mostly indicate you want the bleeding edge. Turns out not really - you need to tie it to the past.

Cesar Hidalgo, MIT

Science in its Social Context

Randall Collins "the production of socially decontextualized knowledge" "knowledge whose veracity doesn't depend on who produced it"

But science is produced in a social context

He is not necessarily interested in science for science's sake but rather, how people can do things together better than they can do individually.

What teams make work that is more cited

Several articles show that larger teams produce work that is more cited, but these papers were disputed. Primary criticism: other explanatory factors like larger things are more cited, more connected teams, self-promotion/self-citation with more authors, also cumulative advantage – after get one paper in high impact journal easier to get more in there

Various characteristics – number authors, field, JIF, diversity (fields, institution, geographic, age),

Author disambiguation (used Google Scholar – via scraping)

Connectivity – number of previous co-authorship relationships

Collaboration negative fields vs. collaboration positive fields

On average, the more connected the team, the more cited the paper. Interaction between JIF and connectivity. Weak but consistent evidence that larger and more connected teams get cited more. Effects of team composition negligible compared to area of publication and JIF.

 

How do people change the work they do?

Using Scholar 99.8%, 97.6% authors publish in four or more fields… typically closely related fields

Policy makers need to assign money to research fields – what fields are you likely to succeed in?

Typically use citations but can’t always author in fields you can cite (think statistics)

Use career path? Fields that cite each other are not fields authors traverse in their career path.

Q: is data set from Google Scholar sharable?

A: He's going to ask them, and when his paper is out, then he will

Guevara et al (under review ) arxiv.org/abs/1602.08409

Data panel

Alex Wade, Microsoft Research – motivation: knowledge graph of scholarly content. Knowledge neighborhood within larger knowledge graph usable for Bing (context, and conversations, scaling up the knowledge acquisition process), Cortana, etc. Can we use approaches from this field (in the tail) for the web scale? Microsoft Academic Graph (MAG). MS academic search is mothballed. Now on Bing platform building this graph – institutions, publications, citations, events, venues, fields of study. >100M publications. Now at academic.microsoft.com  - can see graph, institution box. Pushed back into Bing – link to knowledge box, links to venues, MOOCs, etc. Conversational search… Cortana will suggest papers for you, suggest events. Aka.ms/academicgraph

[aside: has always done better at computer science than any other subject. Remains to be seen if they can really extend it to other fields. Tried a couple of geoscientists with ok results.]

James Pringle, Thomson Reuters – more recent work using the entire corpus. Is the Web of Science up to it? 60 M records core collection. Partnered with regional citation databases (Chinese, SciELO, etc). "One person’s data is another person’s metadata." Article metadata for its own use. Also working with figshare and others. Building massive knowledge graph. As a company interested in mesolevel. Cycle of innovation. Datamining, tagging, visualization… drug discovery…connection to altmetrics… How do we put data in the hands of who needs it. What model to use? Which business model?

Mark Hahnel, Figshare

Figshare for institutions – non-traditional research outputs, data, video … How can we *not* mess this up? Everything you upload can be tracked with a DOI. Linked to GitHub. Tracked by Thomson Reuters data cite database. Work with institutions to help them hold data. Funder mandates for keeping data but where’s the best place?

Funders require data sharing but don’t provide infrastructure.

Findable, interoperable, usable, need an api … want to be able to ask on the web: give me all the information on x in csv and get it. Can’t ask the question if data aren’t available.

Need persistent identifiers. Share beta search.

Daniel Calto, Research Intelligence, Elsevier

Data to share – big publisher, Scopus, also Patent data and patent history,

Sample work: comparing cities, looking at brain circulation (vs. brain drain) – Britain has a higher proportion of publications by researchers only there for 2 years  - much higher than Japan, for example

Mash their data with open public information.

Example: mapping gender in Germany. Women were more productive in physics and astronomy than men. Elsevier Research Intelligence web page full global report coming

Panel question: about other data besides journal citations

Hahnel: all sorts of things including altmetrics

Pringle: usage data  - human interactions, click stream data, to see what’s going on in an anonymous way. What’s being downloaded to a reference manager; also acknowledgements

Calto: usage data also important. Downloading an abstract vs. downloading a full text – interpreting still difficult. How are academic papers cited in patents.

Afternoon:

Reza Ghanadan, DARPA

Simplifying Complexity in Scientific Discovery (aka Simplex)

DSO is in DARPA, like DARPA’s DARPA

Datafication > knowledge representation > discovery tools

Examples: neuroscience, novel materials, anthropology, precision genomics, autonomy

Knowledge representation

Riq Parra – Air Force Office of Scientific Research

(like the Army Research Office and ONR) their budget is ~$60M, all basic research (6.1)

All Air Force 6.1 money goes to AFOSR

40 portfolios – 40 program officers (he’s 1 of 40). They don't rotate like NSF. They are career.

Air Space, Outer Space, Cyber Space.

Some autonomy within agency. Not panel based. Can set direction, get two external reviews (they pick reviewers), talk a lot with the community

Telecons > white papers > submissions > review > funding

How to talk about impact of funding? Mostly anecdotal – narratives like transitions. Over their 65 years they’ve funded 78 Nobel Prize Winners on average 17 years prior to selection

Why he’s here – they do not use these methods to show their impact.  He would like to in spirit of transparency show why they fund what they fund, what impact it has, how does it help the Air Force and its missions.

Ryan Zelnio, ONR

horizon scan to see where onr global should look, where spend attention and money, assess portfolio

global technology awareness quarterly meetings

20-30 years out forecasting

Bibliometrics is one of a number of things they look at. Have qualitative aspects, too.

Need more in detecting emerging technologies

Dewey Murdick, DHS S&T

All the R&D (or most) for the former 22 agencies. More nearer term than an ARPA. Ready within months to a couple years. R&D budget 450M … but divide it over all the mission areas and buy everyone a Snickers.

Decision Support Analytics Mission – for big/important/impactful decisions. Analytics of R&D portfolio.

Establishing robust technical horizon scanning capability. Prototype anticipatory analytics capability.

Brian Pate, DTRA

Awareness and forecasting for C-WMD Technologies

Combat support agency – 24x7 reachback capability. Liaison offices at all US Commands.

6.1-6.3 R&D investments.

Examples: ebola response, destruction of chemical weapons in Syria, response to Fukushima.

Low probability event with high consequences. No human studies. Work with DoD agencies, DHS, NIH, others.

Move from sensing happening with state actors to anticipatory, predicting, non-state actors.

Deterrence/treaty verification, force protection, global situational awareness, counter wmd

BSVE – biosurveillance architecture, cloud based social self-sustaining, pre-loaded apps

Transitioned to JPEO-CWD – wearable CB exposure monitor

FY17 starting DTRA tech forecasting

Recent DTRA RFI – on identifying emerging technologies.

Audience q: Do you have any money for me?

Panel a: we will use your stuff once someone else pays for it

Ignite talks - random notes

Forecite.us

Torvik:

Abel.lis.illinois.edu

Ethnea - instance based ethnicity, Genni (JCDL 2013), Author-ity (names disambiguated)

Predict ethnicity gender age

MapAffil - affiliation geocoder

Ethnicity specific gender over time using 10M+ pubmed papers

 

Larremore: Modeling faculty hiring networks

 

Bruce Weinberg, Ohio State

Toward a Valuation of Research

IRIS (Michigan) – people based approach to valuing research. People are the vectors by which ideas are transmitted, not disembodied publications

- CIC/AAU/Census

Innovation in an aging society – aging biomedical research workforce

Data architecture

  • bibliometric
  • dissertations
  • web searches
  • patents
  • funding
  • star metrics (other people in labs), equipment, vendors
  • tax records
  • business census

Metrics for transformative work

  •  text analytics
  • citation patterns from WoS

Impact distinct from transformative. Mid-career researchers moving more into transformative work.

Some findings not captured in my notes: how women PhD graduates are doing (same positions, paid slightly more, held back by family otherwise). PhD graduates in industry staying in the same state, making decent money (some non-negligible proportion in companies with median salaries >200k ... median.)

John Ioannidis, Stanford

Defining Meta-research: an evolving discipline

- how to perform, communicate, verify, evaluate, and reward science

- paper in PLOS Biology, JAMA

 

 


Notes from International Symposium on Science of Science 2016 (#ICSS2016) - Day 1

This conference was held at the Library of Congress March 22 and 23, 2016. The conference program is at: http://icss.ist.psu.edu/program.html

I had the hardest time remembering the hashtag so you may want to search for ones with more C or fewer or more S.

This conference was only one track but it was jam-packed and the days were pretty long. On the first day, my notes were by hand and my tweets were by phone (which was having issues). The second day I brought a power strip along and then took notes and tweeted by laptop.

One thing I want to do here is to gather the links to the demo tools and data sets that were mentioned with some short commentary where appropriate. I do wish I could have gotten myself together enough to submit something, but what with the dissertation and all. (and then I'm only a year late on a draft of a paper and then I need to write up a few articles from the dissertation and and and and...)
Maryann Feldman SciSIP Program Director

As you would expect, she talked about funding in general and the program. There are dear colleague letters. She really wants to hear from researchers in writing - send her a one-pager to start a conversation. She funded the meeting.

Katy Börner Indiana University

She talked about her Mapping Exhibit - they're working on the next iteration and are also looking for venues for the current. She is interested in information analysis/visualization literacy (hence her MOOC and all her efforts with SCI2 and all). One thing she's trying now is a weather report format. She showed an example.

She did something with the descriptive models of the global scientific food web. Where are sources and where are sinks of citations?

Something more controversial was her idea of collective allocation of funding. Give each qualified PI a pot of money that they *must* allocate to other projects. So instead of a small body of reviewers, everyone in the field would be a reviewer. If the top PI got more than a certain amount, they would have to re-allocate the excess to other projects.

I'm not sure I got this quote exactly but it was something like:

Upcoming conference at National Academy of Science on Modeling Sci Tech Innovations May 16-18.

They have a data enclave at Indiana with research data they and their affiliates can use. (I guess Larivière also has and has inherited a big pile o' data? This has been a thought of mine... getting data in a format so I could have it lying around if I wanted to play with it.)
Filippo Radicchi Indiana University

He spoke about sleeping beauties in science. These are the articles that receive few citations for many years and then are re-discovered and start anew. This is based on this article. Turns out the phenomenon occurs fairly regularly and across disciplines. In some cases it's a model that then is more useful when computing catches up. In other cases it's when something gets picked up by a different discipline. One case is something used to make graphene. He's skeptical one of the top articles in this category is actually being read by people who cite it because it's only available in print in German from just a few libraries! (However, a librarian in the session *had* gotten a copy for a staff member who could read German).

I would love to take his 22M article data set and try the k-means longitudinal. If sleeping beauty is found often, what are the other typical shapes beyond the standard one?

He also touched on his work with movies - apparently using an oft-overlooked section of IMDB that provides information on references (uses same framing as x, adopt cinematography style of y, remakes z... I don't know, but relationships).

Carl Bergstrom University of Washington

The first part of his talk reviewed Eigenfactor work which should be very familiar to this audience (well except a speaker on the second day had no idea it was a new-ish measure that had since been adopted by JCR - he should update his screenshot - anyhoo)

Then he went on to discuss a number of new projects they're working on. Slides are here.

Where ranking journals has a certain level of controversy, they did continue on to rank authors (ew?), and most recently articles which required some special steps.

Cooler, I think was the next work discussed.  A mapping technique for reducing a busy graph to find patterns. "Good maps simplify and highlight relevant structures." Their method did well when compared to other method and made it possible to compare over years. Nice graphic showing the emergence of neuroscience. They then did a hierarchical version. Also pretty cool. I'd have to see this in more detail, but looks like a better option than the pruning and path methods I've seen to do similar things. So this hierarchical map thing is now being used as a recommendation engine.  See babe'.eigenfactor.org . I'll have to test it out to see.

Then (it was a very full talk) women vs. men. Men self-cite more. Means they have higher h-index.
Jacob Foster UCLA (Sociology)

If the last talk seemed packed, this was like whoa. He talked really, really fast and did not slow down. The content was pretty heavy duty, too. It could be that the remainder of the room basically knew it all so it was all review. I have read all the standard STS stuff, but it was fast.

He defines science as "the social production of collective intelligence."

Rumsfeld unknown unknowns... he's more interested in unknown knowns. (what do you know but do not know you know... you know? 🙂 )

Ecological rationality - rationality of choices depends on context vs rational choice theory which is just based on rules, not context.

Think of scientists as ants. Complex sociotechnical system. Information processing problem, using Marr's Levels.

  • computational level: what does the system do (e.g.: what problems does it solve or overcome) and similarly, why does it do these things
  • algorithmic/representational level: how does the system do what it does, specifically, what representations does it use and what processes does it employ to build and manipulate the representations
  • implementational/physical level: how is the system physically realised (in the case of biological vision, what neural structures and neuronal activities implement the visual system)

https://en.wikipedia.org/wiki/David_Marr_(neuroscientist)#Levels_of_analysis

Apparently less studied in humans is the representational to hardware. ... ? (I have really, really bad handwriting.)

science leverages and tunes basic information processing (?).. cluster attention.

(incidentally totally weird Google Scholar doesn't know about "american sociological review" ? or ASR? ended up browsing)
Foster, J.G., Rzhetsky, A., & Evans, J.A. (2015). Tradition and Innovation in Scientists' Research Strategies. ASR 80, 875-908. doi: 10.1177/0003122415601618

Scientists try various strategies to optimize between tradition (more likely to be accepted) and innovation (bigger pay offs). More innovative papers get more citations but conservative efforts are rewarded with more cumulative citations.

Rzhetsky, A., Foster, I.T., Foster, J.G., & Evans, J.A. (2015). Choosing experiments to accelerate collective discovery. PNAS 112, 14569–14574. doi: 10.1073/pnas.1509757112

This article looked at chemicals in pubmed. Innovative was new ones. Traditional was in the neighborhood of old ones. They found that scientists spend a lot of time in the neighborhood of established important ones where they could advance science better by looking elsewhere. (hmmm, but... hm.)

The next bit of work I didn't get a citation for - not even enough to search - but they looked at JSTOR and word overlap. Probabilistic distribution of terms. Joint probability. (maybe this article? pdf). It looked at linguistic similarity (maybe?) and then export/import of citations. So ecology kept to itself while social sciences were integrated. I asked about how different social sciences fields use the same word with vastly different meanings - mentioned Fleck. He responded that it was true but often there is productive ambiguity of new field misusing or misinterpreting another field's concept (e.g., capital). I'm probably less convinced about this one, but would need to read further.

Panel 1: Scientific Establishment

  • George Santangelo - NIH portfolio management. Meh.
  • Maryann Feldman - geography and Research Triangle Park
  • Iulia Georgescu, Veronique Kiermer, Valda Vinson - publishers who, apparently, want what might already be available? Who are unwilling (except PLOS) or unable to quid pro quo share data/information in return for things. Who are skeptical (except for PLOS) that anything could be done differently? That's my take. Maybe others in the room found it more useful.

Nitesh Chawla University of Notre Dame

(scanty notes here - not feedback on the talk)

Worked with ArnetMiner data to predict h-indices.

Paper: http://arxiv.org/abs/1412.4754 

It turns out that, according to them, venue is key. So all of the articles that found poor correlation between JIF and an individual paper's likelihood of being cited... they say it's actually a pretty good predictor when combined with the researcher's authority. Yuck!

Janet Vertesi Princeton University

Perked up when I realized who she is - she's the one who studied the Rover teams! Her book is Seeing Like a Rover. Her dissertation is also available online, but everyone should probably go buy the book. She looked at a more meso level of knowledge, really interested in teams. She found that different teams - even teams with overlapping membership - managed knowledge differently. The way instrument time (or really spacecraft maneuvering so you can use your instrument time) was handled was very different. A lot had to do with the move in the '90s for faster... better... cheaper (example MESSENGER). She used co-authoring networks in ADS and did community detection. Co-authorship shows team membership, as the same casts of characters write together. This field is very different from others as publications are in mind while the instruments are being designed.

She compared Discovery class missions - Mars Exploration Rover - collectivist, integrated; everyone must give a go ahead for decisions; Messenger - design system working groups (oh my handwriting!)

vs. Flagship - Cassini - hierarchical, separated. Divided up sections of spacecraft. Conflict and competition. Used WWII as a metaphor (?!!). No sharing even among subteams before release.  Clusters are related to team/instrument.

New PI working to merge across - this did show in evolution of network to a certain extent.

Galileo is another flagship example. breaks down into separate clusters. not coordinated.

Organization of teams matters.

I admitted my fan girl situation and asked about the engineers. She only worked with scientists because she's a foreign national (may not mean anything to my readers who aren't in this world but others will be nodding their heads).  She is on a team for an upcoming mission so will see more then. She also has a doctoral student who is a citizen who may branch off and study some of these things.
Ying Ding Indiana University

She really ran out of time in the end. I was interested in her presentation but she flew past the meaty parts.

Ignite Talks (15s per slide 2min overall or similar)

  • Filippo Menczer - http://scholarometer.indiana.edu/ - tool to view more information about authors and their networks. Browser extension.
  • Caleb Smith,
  • Orion Penner - many of us were absolutely transfixed that he dropped his note pages on the floor as he finished. It was late in the day!  He has a few articles on predicting future impact (example). On the floor.
  • Charles Ayoubi,
  • Michael Rose,
  • Jevin West,
  • Jeff Alstott - awesome timing, left 15 for a question and 15 for its answer. Audience didn't play along.

Lee Giles Penn State University

It was good to save his talk for last. A lot going on besides keeping CiteSeer up and running. They do make their data and their algorithms freely available (see: http://csxstatic.ist.psu.edu/about ) . This includes extracting references. They also are happy to add in new algorithms that make improvements and work in their system. They accept any kind of document that works in their parsers so typically journal articles and conference papers.

RefSeer - recommends cites you should add

TableSeer - extracts tables (didn't mention and there wasn't time to ask... he talked a lot about this for chemistry... I hope he's working with the British team doing the same?)

Also has things to extract formulas, plots, and equations. Acknowledgements. Recommend collaborators (0 for me, sniff.) See his site for links.

 

 


Preliminary thoughts on longitudinal k-means for bibliometric trajectories

I read with great interest Baumgartner and Leydesdorff's article* on group-based trajectory modeling of bibliometric trajectories and I immediately wanted to try it. She used SAS or something like that, though, and I wanted R. I fooled around with this last year for a while and I couldn't get it going in the R package for GBTM.**

Later, I ran across a way to do k-means clustering for longitudinal data - for trajectories! Cool. I actually understand the math a lot better, too.

Maybe I should mention what I mean about trajectories in this case. When you look at citations per year for articles in science, there's a typical shape... a peak at year 2-3 (depends on field), and then it slacks off and is pretty flat. Turns out there are a few other typical shapes you see regularly. One is the sleeping beauty - it goes along and then gets rediscovered and all of a sudden has another peak - maybe it turns out to be useful for computational modeling once computers catch up. Another is the workhorse paper that just continues to be useful over time and takes a steady strain - maybe it's a really nice review of a phenomenon. There may be 5 different shapes? I don't think anyone knows yet, for sure.

So instead of my other dataset I was playing with last year with like 1000 articles from MPOW, I'm playing with articles from MPOW that were published between 1948 and 1979 and that were identified in a 1986 article as citation classics. 22 articles. I downloaded the full records for their citing articles and then ran an R script to pull out the PY of the citing articles (I also pulled out the cited references and did a fractional Times Cited count but that's another story). I cut off the year the article was published, and then kept the next 35 years for each of the articles. It's like up to 2015 for a couple but I don't think that will matter a lot as we're a ways into 2016 now.
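For anyone trying to reproduce this, the prep step looks roughly like the sketch below - not my actual script; the input file and column names are made up for illustration. It turns one row per citing record into one 35-year trajectory per classic.

```r
# Rough sketch of the data prep (not my actual script; input format assumed):
# one row per citing record, with the cited classic's ID and PY and the
# citing article's PY. Output: one 35-year citation trajectory per classic.
library(dplyr)
library(tidyr)

cites <- read.csv("citing_records.csv", stringsAsFactors = FALSE)
# assumed columns: classic_id, classic_py, citing_py

traj <- cites %>%
  mutate(year_since = citing_py - classic_py) %>%
  filter(year_since >= 1, year_since <= 35) %>%   # drop the publication year, keep next 35
  count(classic_id, year_since) %>%
  complete(classic_id, year_since = 1:35, fill = list(n = 0)) %>%
  pivot_wider(names_from = year_since, values_from = n, names_prefix = "y")

dim(traj)   # 22 rows (classics) x 36 columns (id + 35 yearly citation counts)
```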

Loaded it into R, plotted the trajectories straight off:

[traj plot] Looks like a mess and there are only 22!

Let's look at 3 clusters:

[3 clusters plot] Ok, so look at the percentiles. 4% is one article. This is a very, very famous article. You can probably guess it if you know MPOW. Then the green cluster is probably the work horses. The majority are the standard layout.

Let's look at 4 clusters:

[4 clusters plot] You still have the one crazy one. Like 5 workhorses. The rest are variations on the normal spike. Some have a really sharp spike and then not much after (these were the latest ones in the set - the author didn't have enough distance to see what they would do). Others have a normal spike then are pretty flat.

So I let it do the default and calculate with 2, 3, 4, 5, 6 clusters. When you get above 4, you just add more singletons. The article on kml*** says there's no absolute way to identify the best number of clusters but they give you a bunch of measurements and if they all agree, Bob's your uncle.

[quality criteria plot] Bigger is better (they normalize and flip some of them so you can look at them like this). Well, nuts. So the methods that look at compactness of the clusters divided by how far apart they're spaced (the first 3, I think?) are totally different than 4 - which is just like distance from centroids or something like that. I don't know. I probably have to look at that section again.

Looking at the data, it doesn't make sense at all to do 5 or 6. Does 4 add information over 3? I think so, really. Of course with this package you can do different distance measurements and different starting points, and different numbers of iterations.
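For reference, the kml calls I ended up with look roughly like this - a sketch based on my reading of the Genolini et al. article cited below, so double-check the argument names against the package docs. Here 'traj' is the 22 x 35 table of yearly citation counts from the prep sketch above.

```r
# Sketch of the clustering step with the kml package (check args against its docs).
library(kml)

traj_mat <- as.matrix(traj[, -1])          # drop the id column
rownames(traj_mat) <- traj$classic_id

cld <- clusterLongData(traj = traj_mat, time = 1:35)
kml(cld, nbClusters = 2:6, nbRedrawing = 20)   # k-means from 20 random starts each

plotAllCriterion(cld)    # the "bigger is better" quality-criteria plot
plot(cld, 4)             # trajectories colored by the 4-cluster partition
getClusters(cld, 4)      # cluster assignment for each of the 22 classics
```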

What practical purpose does this solve? Dunno? I really think it's worth giving workhorse papers credit. A good paper that continues to be useful... makes a real contribution, in my mind. But is there any way to determine that vs. a mediocre paper with a lower spike short of waiting 35 years? Dunno.

 

*Baumgartner, S. E., & Leydesdorff, L. (2014). Group‐based trajectory modeling (GBTM) of citations in scholarly literature: dynamic qualities of “transient” and “sticky knowledge claims”. Journal of the Association for Information Science and Technology, 65(4), 797-811. doi: 10.1002/asi.23009 (see arxiv)

** Interesting articles on it. It's from criminology and looks at recidivism. Package.

*** Genolini, C., Alacoque, X., Sentenac, M., & Arnaud, C. (2015). kml and kml3d: R Packages to Cluster Longitudinal Data. Journal of Statistical Software, 65(4), 1-34. Retrieved from http://www.jstatsoft.org/v65/i04/

