Notes from International Symposium on Science of Science 2016 (#ICSS2016) - Day 2

This day's notes were taken on my laptop - I remembered to bring a power strip! But, I was also pretty tired, so it's a toss up.

 

Luis Amaral, Northwestern

What do we know now?

Stringer et al. JASIST 2010 – distribution of the number of citations

25% of papers overall in WoS (1955-2006) haven't been cited at all, yet for particular journals (e.g., Circulation) there may be no papers that haven't been cited.

Stringer et al. PLoS ONE 2008 – set of papers from a single journal

Discrete log-normal distribution fits the citations to articles published in a journal in a given year

Works well for all but large, multidisciplinary journals – Science, Nature, PNAS, but also PRL and JACS

For most journals it takes 5-15 years to reach the asymptotic state

Moreira et al. PLOS ONE 2015 – set of papers from a department. Also discrete log-normal.
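A minimal sketch of what fitting a discrete log-normal to one journal-year's citation counts could look like (my reconstruction, not the authors' code; assumes counts of at least 1 and an arbitrary truncation for normalization):

```python
# Sketch only: fit a discrete log-normal to the citation counts of articles
# published in one journal in one year. The truncation at n_max and the
# exclusion of zero-cited papers are simplifying assumptions.
import numpy as np
from scipy.optimize import minimize

def discrete_lognormal_logpmf(n, mu, sigma, n_max=100_000):
    """log P(n) proportional to -(ln n - mu)^2 / (2 sigma^2), normalized over n = 1..n_max."""
    support = np.arange(1, n_max + 1)
    log_w = -(np.log(support) - mu) ** 2 / (2 * sigma ** 2)
    log_z = np.log(np.exp(log_w - log_w.max()).sum()) + log_w.max()
    return -(np.log(np.asarray(n, float)) - mu) ** 2 / (2 * sigma ** 2) - log_z

def fit_discrete_lognormal(counts):
    counts = np.asarray(counts, float)
    neg_ll = lambda p: -discrete_lognormal_logpmf(counts, p[0], np.exp(p[1])).sum()
    res = minimize(neg_ll, x0=[np.log(counts).mean(), 0.0], method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])  # (mu, sigma)

# counts = citations received by every article the journal published that year
print(fit_discrete_lognormal([1, 1, 2, 2, 3, 5, 8, 13, 21, 60, 150]))
```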

Also did work on significant movies – "citations" using IMDB's connections section (crowd-sourced annotation of remakes, reuse of techniques like framing, references/tributes, etc.)

Brian Uzzi, Northwestern

Age of Information and the Fitness of Scientific Ideas and Inventions

How do we forage for information – given a paper is published every 20 minutes – such that we find information related to tomorrow’s discoveries?

He's going to show how the pattern works across WoS, patents, and law.

Foraging with respect to time (Evans 2008, Jones & Weinberg 201?)

Empirical strategies of information foraging: some papers cite references tightly packed in a single year, some have a high mean age, some high age variance…

Average age of information (mean of the citing paper's PY minus the PY of each cited article)

Low mean age, high age variance is most likely to be tomorrow’s hits (top 5% cited in a field)
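My reading of that metric as a tiny sketch (not Uzzi's code): each reference's age is the citing paper's publication year minus the reference's publication year, and a paper is characterized by the mean and variance of those ages.

```python
# Sketch of the age-of-information profile as I understood it from the talk.
def reference_age_profile(pub_year, cited_pub_years):
    """Return (mean age, age variance) of a paper's reference list."""
    ages = [pub_year - y for y in cited_pub_years]
    mean_age = sum(ages) / len(ages)
    age_var = sum((a - mean_age) ** 2 for a in ages) / len(ages)
    return mean_age, age_var

# A 2016 paper citing mostly recent work plus one old classic:
print(reference_age_profile(2016, [2015, 2014, 2014, 2013, 1996]))
# Low mean age + high age variance was the combination associated with future hits.
```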

Tried this same method on patents – inventors don't pick all the citations; the examiner assigns citations. Patents show the same hotspot.

 

Audience q: immediacy index, other previous work similar…

A: they mostly indicate you want the bleeding edge. Turns out not really – you need to tie it to the past.

Cesar Hidalgo, MIT

Science in its Social Context

Randall Collins: “the production of socially decontextualized knowledge” – “knowledge whose veracity doesn’t depend on who produced it”

But science is produced in a social context

He is not necessarily interested in science for science's sake but rather, how people can do things together better than they can do individually.

What teams make work that is more cited?

Several articles show that larger teams produce work that is more cited, but these papers were disputed. Primary criticism: other explanatory factors, like larger things being more cited, more connected teams, self-promotion/self-citation with more authors, and also cumulative advantage – once you get one paper into a high-impact journal it's easier to get more in.

Various characteristics – number of authors, field, JIF, diversity (fields, institution, geographic, age)

Author disambiguation (used Google Scholar – via scraping)

Connectivity – number of previous co-authorship relationships
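A hedged sketch of one plausible reading of that connectivity measure (the exact definition in the paper may differ): count how many pairs of the focal paper's authors had already co-authored before.

```python
# Illustrative only: connectivity as the number of author pairs on the focal
# paper that appear together in at least one earlier paper.
from itertools import combinations

def team_connectivity(team, earlier_papers):
    """team: list of author ids; earlier_papers: list of author-id lists."""
    prior_pairs = {frozenset(pair)
                   for authors in earlier_papers
                   for pair in combinations(authors, 2)}
    return sum(frozenset(pair) in prior_pairs for pair in combinations(team, 2))

# Authors a and b have co-authored before; c is new to both.
print(team_connectivity(["a", "b", "c"], [["a", "b", "x"], ["c", "y"]]))  # -> 1
```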

Collaboration negative fields vs. collaboration positive fields

The more connected the team, the more cited the paper on average. Interaction between JIF and connectivity. Weak but consistent evidence that larger and more connected teams get cited more. Effects of team composition are negligible compared to area of publication and JIF.

 

How do people change the work they do?

Using Scholar (99.8%, 97.6%): authors publish in four or more fields… typically closely related fields

Policy makers need to assign money to research fields – what fields are you likely to succeed in?

Typically citations are used, but you can't always author in the fields you can cite (think statistics)

Use career path? Fields that cite each other are not fields authors traverse in their career path.

Q: is the data set from Google Scholar sharable?

A: He's going to ask them, and once his paper is out he will.

Guevara et al. (under review) arxiv.org/abs/1602.08409

Data panel

Alex Wade, Microsoft Research – motivation: a knowledge graph of scholarly content. A knowledge neighborhood within the larger knowledge graph, usable for Bing (context and conversations, scaling up the knowledge acquisition process), Cortana, etc. Can we use approaches from this field (in the tail) at web scale? Microsoft Academic Graph (MAG). MS Academic Search is mothballed. Now on the Bing platform building this graph – institutions, publications, citations, events, venues, fields of study. >100M publications. Now at academic.microsoft.com – can see the graph, institution box. Pushed back into Bing – link to the knowledge box, links to venues, MOOCs, etc. Conversational search… Cortana will suggest papers for you, suggest events. aka.ms/academicgraph

[aside: it has always done better at computer science than any other subject. Remains to be seen if they can really extend it to other fields. Tried a couple of geoscientists with OK results.]
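A rough mental model of that entity graph, purely my own sketch (this is not MAG's actual schema): a heterogeneous graph whose node types match the entities Wade listed, with edges for authorship, affiliation, citation, venue, and field-of-study links.

```python
# My sketch of a heterogeneous scholarly knowledge graph, not MAG's schema.
from collections import defaultdict

NODE_TYPES = {"publication", "author", "institution", "venue", "event", "field_of_study"}

class ScholarGraph:
    def __init__(self):
        self.nodes = {}                 # node_id -> (node_type, attributes dict)
        self.edges = defaultdict(set)   # (source_id, relation) -> set of target ids

    def add_node(self, node_id, node_type, **attrs):
        assert node_type in NODE_TYPES, f"unknown node type: {node_type}"
        self.nodes[node_id] = (node_type, attrs)

    def add_edge(self, source_id, relation, target_id):
        self.edges[(source_id, relation)].add(target_id)

g = ScholarGraph()
g.add_node("pub1", "publication", title="Some paper", year=2016)
g.add_node("auth1", "author", name="A. Author")
g.add_node("inst1", "institution", name="Some University")
g.add_edge("pub1", "authored_by", "auth1")
g.add_edge("auth1", "affiliated_with", "inst1")
g.add_edge("pub1", "cites", "pub0")     # citation edges link publications
print(g.edges[("pub1", "authored_by")])
```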

James Pringle, Thomson Reuters – more recent work using the entire corpus. Is the Web of Science up to it? 60M records in the core collection. Partnered with regional citation databases (Chinese, SciELO, etc.). "One person's data is another person's metadata." Article metadata for its own use. Also working with figshare and others. Building a massive knowledge graph. As a company, interested in the meso level. Cycle of innovation. Data mining, tagging, visualization… drug discovery… connection to altmetrics… How do we put data in the hands of those who need it? What model to use? Which business model?

Mark Hahnel, Figshare

Figshare for institutions – non-traditional research outputs, data, video … How can we *not* mess this up? Everything you upload can be tracked with a DOI. Linked to GitHub. Tracked by Thomson Reuters data cite database. Work with institutions to help them hold data. Funder mandates for keeping data but where’s the best place?

Funders require data sharing but don’t provide infrastructure.

Findable, interoperable, usable; need an API… want to be able to ask on the web: "give me all the information on x in CSV" and get it. Can't ask the question if the data aren't available.
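As an illustration of the kind of machine query he's describing, here's a sketch that assumes figshare's public v2 search API (the endpoint and response field names are my assumptions; check the current API docs before relying on them):

```python
# Illustration only: query a public search API and flatten the results to CSV.
# Endpoint and fields below assume figshare's v2 API; verify against current docs.
import csv
import io
import requests

resp = requests.post(
    "https://api.figshare.com/v2/articles/search",   # assumed endpoint
    json={"search_for": "zika"},
)
resp.raise_for_status()

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["doi", "title", "url"])
for record in resp.json():
    writer.writerow([record.get("doi", ""), record.get("title", ""), record.get("url", "")])
print(out.getvalue())
```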

Need persistent identifiers. Share beta search.

Daniel Calto, Research Intelligence, Elsevier

Data to share – big publisher, Scopus, also patent data and patent history

Sample work: comparing cities, looking at brain circulation (vs. brain drain) – Britain has a higher proportion of publications by researchers only there for 2 years, much higher than Japan, for example

Mash their data with open public information.

Example: mapping gender in Germany. Women were more productive in physics and astronomy than men. Elsevier Research Intelligence web page; a full global report is coming

Panel question: about other data besides journal citations

Hahnel: all sorts of things including altmetrics

Pringle: usage data – human interactions, clickstream data, to see what's going on in an anonymous way. What's being downloaded to a reference manager; also acknowledgements

Calto: usage data also important. Downloading an abstract vs. downloading the full text – interpreting is still difficult. How are academic papers cited in patents?

Afternoon:

Reza Ghanadan, DARPA

Simplifying Complexity in Scientific Discovery (aka Simplex)

DSO is within DARPA – like DARPA's DARPA

Datafication > knowledge representation > discovery tools

Examples: neuroscience, novel materials, anthropology, precision genomics, autonomy

Knowledge representation

Riq Parra – Air Force Office of Scientific Research

(like the Army Research Office and ONR) – their budget is ~$60M, all basic research (6.1)

All Air Force 6.1 money goes to AFOSR

40 portfolios – 40 program officers (he’s 1 of 40). They don't rotate like NSF. They are career.

Air Space, Outer Space, Cyber Space.

Some autonomy within the agency. Not panel-based. Can set direction, get two external reviews (they pick the reviewers), talk a lot with the community

Telecons > white papers > submissions > review > funding

How to talk about the impact of funding? Mostly anecdotal – narratives like transitions. Over their 65 years they've funded 78 Nobel Prize winners, on average 17 years prior to selection

Why he's here – they do not use these methods to show their impact. He would like, in the spirit of transparency, to show why they fund what they fund, what impact it has, and how it helps the Air Force and its missions.

Ryan Zelnio, ONR

Horizon scan to see where ONR Global should look, where to spend attention and money, and to assess the portfolio

global technology awareness quarterly meetings

Forecasting 20-30 years out

Bibliometrics is one of a number of things they look at. Have qualitative aspects, too.

Need more capability in detecting emerging technologies

Dewey Murdick, DHS S&T

All (or most) of the R&D for the 22 agencies that merged into DHS. Nearer term than an ARPA – ready within months to a couple of years. R&D budget ~$450M… but divide it over all the mission areas and you could buy everyone a Snickers.

Decision Support Analytics Mission – for big/important/impactful decisions. Analytics of R&D portfolio.

Establishing robust technical horizon scanning capability. Prototype anticipatory analytics capability.

Brian Pate, DTRA

Awareness and forecasting for C-WMD Technologies

Combat support agency – 24x7 reachback capability. Liaison offices at all US Commands.

6.1-6.3 R&D investments.

Examples: Ebola response, destruction of chemical weapons in Syria, response to Fukushima.

Low-probability events with high consequences. No human studies. Work with DoD agencies, DHS, NIH, others.

Move from sensing what's happening with state actors to anticipating and predicting non-state actors.

Deterrence/treaty verification, force protection, global situational awareness, counter-WMD

BSVE – biosurveillance architecture; cloud-based, social, self-sustaining, with pre-loaded apps

Transitioned to JPEO-CWD – wearable CB exposure monitor

FY17 starting DTRA tech forecasting

Recent DTRA RFI – on identifying emerging technologies.

Audience q: Do you have any money for me?

Panel a: we will use your stuff once someone else pays for it

Ignite talks - random notes

Forecite.us

Torvik:

Abel.lis.illinois.edu

Ethnea – instance-based ethnicity; Genni (JCDL 2013); Author-ity (names disambiguated)

Predict ethnicity, gender, age

MapAffil - affiliation geocoder

Ethnicity-specific gender over time using 10M+ PubMed papers

 

Larremore: Modeling faculty hiring networks

 

Bruce Weinberg, Ohio State

Toward a Valuation of Research

IRIS (Michigan) – people-based approach to valuing research. People are the vectors by which ideas are transmitted, not disembodied publications

- CIC/AAU/Census

Innovation in an aging society – aging biomedical research workforce

Data architecture

  • bibliometric
  • dissertations
  • web searches
  • patents
  • funding
  • STAR METRICS (other people in labs), equipment, vendors
  • tax records
  • business census

Metrics for transformative work

  •  text analytics
  • citation patterns from WoS

Impact distinct from transformative. Mid-career researchers moving more into transformative work.

Some findings not captured in my notes: how women PhD graduates are doing (same positions, paid slightly more, held back by family otherwise). PhD graduates in industry are staying in the same state and making decent money (some non-negligible proportion in companies with median salaries >$200k… median).

John Ioannidis, Stanford

Defining Meta-research: an evolving discipline

- how to perform, communicate, verify, evaluate, and reward science

- paper in PLOS Biology, JAMA

 

 
