This day's notes were taken on my laptop – I remembered to bring a power strip! But I was also pretty tired, so it's a toss-up.
Luis Amaral, Northwestern
What do we know now?
Stringer et al., JASIST 2010 – distribution of the number of citations
25% of papers overall in WoS (1955-2006) haven't been cited at all, yet for particular journals (e.g., Circulation) there may be no papers that haven't been cited.
Stringer et al., PLoS ONE 2008 – set of papers from a single journal
Discrete lognormal distribution fits citations to articles published in a journal in a given year
Works well for all but large, multidisciplinary journals – Science, Nature, PNAS, but also PRL and JACS
For most journals it takes 5-15 years for citations to reach the asymptotic state
Moreira et al., PLOS ONE 2015 – set of papers from a department. Also discrete lognormal.
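A toy sketch of fitting a discrete lognormal to the citation counts of one journal-year, in the spirit of the Stringer/Moreira papers (my own illustration, not their code; the exact discretization they use may differ – here P(n) is proportional to a lognormal density evaluated at n+1):

```python
import numpy as np
from scipy.optimize import minimize

def discrete_lognormal_pmf(mu, sigma, n_max=100_000):
    """PMF over citation counts 0..n_max: lognormal density at n+1, normalized.
    One plausible discretization; the published form may differ."""
    n = np.arange(n_max + 1)
    log_w = -((np.log(n + 1) - mu) ** 2) / (2 * sigma**2) - np.log(n + 1)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def fit_discrete_lognormal(citations):
    """Maximum-likelihood estimate of (mu, sigma) from observed counts."""
    citations = np.asarray(citations)

    def nll(params):
        mu, sigma = params
        if sigma <= 0:
            return np.inf
        pmf = discrete_lognormal_pmf(mu, sigma)
        return -np.sum(np.log(pmf[citations]))

    return minimize(nll, x0=[1.0, 1.0], method="Nelder-Mead").x

# e.g., citation counts for the articles one journal published in one year:
mu, sigma = fit_discrete_lognormal([0, 3, 1, 12, 7, 2, 0, 25, 4, 9])
```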
Also did work on significant movies – citations via the IMDB connections section (crowd-sourced annotation of remakes, reuse of techniques like framing, references/tributes, etc.)
Brian Uzzi, Northwestern
Age of Information and the Fitness of Scientific Ideas and Inventions
How do we forage for information – given that a paper is published every 20 minutes – such that we find information related to tomorrow's discoveries?
He's going to show WoS, patents, and law, and how the pattern works across them.
Foraging with respect to time (Evans 2008, Jones & Weinberg 201?)
Empirical strategies of information foraging: some papers cite references tightly packed by year, some have a high mean reference age, some high age variance…
Average age of information = mean of (citing paper's publication year minus the publication years of its cited articles)
Low mean age plus high age variance is most likely to produce tomorrow's hits (top 5% cited in a field)
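In code, the measure he described is just the mean and variance of reference ages; a minimal sketch of the definition above (my illustration, not Uzzi's implementation):

```python
import statistics

def reference_age_profile(pub_year, cited_pub_years):
    """Mean and variance of reference ages for one paper.
    Age of a reference = citing paper's pub year minus cited paper's pub year."""
    ages = [pub_year - y for y in cited_pub_years]
    return statistics.mean(ages), statistics.pvariance(ages)

# A 2016 paper citing work from 2015, 2014, 2010, and 1995:
mean_age, age_var = reference_age_profile(2016, [2015, 2014, 2010, 1995])
# The "hotspot" for future hits: low mean age combined with high age variance.
```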
Tried this same method at the patent office – inventors don't pick all the citations; the examiner assigns citations. Patents show the same hotspot.
Audience q: immediacy index, other similar previous work…
A: they mostly indicate you want the bleeding edge. Turns out not really – you also need to tie it to the past.
Cesar Hidalgo, MIT
Science in its Social Context
Randall Collins: “the production of socially decontextualized knowledge” – “knowledge whose veracity doesn’t depend on who produced it”
But science is produced in a social context
He is not necessarily interested in science for science's sake but rather, how people can do things together better than they can do individually.
What teams make work that is more cited?
Several articles show that larger teams produce work that is more cited, but these papers were disputed. Primary criticisms: other explanatory factors – larger things are more cited, more connected teams, self-promotion/self-citation with more authors, and cumulative advantage (after you get one paper into a high-impact journal, it's easier to get more in).
Various characteristics – number of authors, field, JIF, diversity (fields, institution, geographic, age)
Author disambiguation (used Google Scholar – via scraping)
Connectivity – number of previous co-authorship relationships (toy sketch below)
Collaboration-negative fields vs. collaboration-positive fields
On average, the more connected the team, the more cited the paper. There is an interaction between JIF and connectivity. Weak but consistent evidence that larger and more connected teams get cited more. Effects of team composition are negligible compared to area of publication and JIF.
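A toy sketch of computing team connectivity as defined above – counting how many of a team's author pairs have co-authored before (my illustration; the paper's actual measure may be weighted or normalized differently):

```python
from itertools import combinations

def team_connectivity(team, prior_pairs):
    """Number of pairs on this team that have previously co-authored.
    `prior_pairs` is a set of frozensets, one per past co-author pair."""
    return sum(1 for pair in combinations(team, 2)
               if frozenset(pair) in prior_pairs)

history = {frozenset({"A", "B"}), frozenset({"B", "C"})}
print(team_connectivity(["A", "B", "C", "D"], history))  # 2 of 6 possible pairs
```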
How do people change the work they do?
Using Scholar: 99.8%, 97.6% of authors publish in four or more fields… typically closely related fields
Policy makers need to assign money to research fields – what fields are you likely to succeed in?
Typically citations are used, but you can't always author in fields you can cite (think statistics)
Use career path? Fields that cite each other are not fields authors traverse in their career path.
Q: is data set from Google Scholar sharable?
A: He's going to ask them; when his paper is out, he will.
Guevara et al. (under review) arxiv.org/abs/1602.08409
Alex Wade, Microsoft Research – motivation: a knowledge graph of scholarly content, a knowledge neighborhood within the larger knowledge graph, usable for Bing (context and conversations, scaling up the knowledge-acquisition process), Cortana, etc. Can we use approaches from this field (in the tail) at web scale? Microsoft Academic Graph (MAG). MS Academic Search is mothballed; they're now building this graph on the Bing platform – institutions, publications, citations, events, venues, fields of study. >100M publications. Now at academic.microsoft.com – you can see the graph and an institution box. Pushed back into Bing – link to the knowledge box, links to venues, MOOCs, etc. Conversational search… Cortana will suggest papers for you, suggest events. aka.ms/academicgraph
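For what it's worth, MAG was queryable at the time through the Academic Knowledge API; a hedged sketch of what that looks like (the endpoint, query-expression syntax, and attribute names are from my memory of the docs, not from the talk – treat them as assumptions):

```python
import requests

# Hypothetical query: papers by an author, with title, year, citation count.
# Endpoint and attribute names are assumptions, not confirmed in the talk.
resp = requests.get(
    "https://westus.api.cognitive.microsoft.com/academic/v1.0/evaluate",
    params={
        "expr": "Composite(AA.AuN=='luis amaral')",  # query expression
        "attributes": "Ti,Y,CC",                     # title, year, citations
        "count": 10,
    },
    headers={"Ocp-Apim-Subscription-Key": "YOUR_KEY"},  # placeholder key
)
for entity in resp.json().get("entities", []):
    print(entity.get("Y"), entity.get("CC"), entity.get("Ti"))
```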
[aside: it has always done better at computer science than any other subject. Remains to be seen if they can really extend it to other fields. Tried a couple of geoscientists, with OK results.]
James Pringle, Thomson Reuters – more recent work using the entire corpus. Is the Web of Science up to it? 60M records in the core collection. Partnered with regional citation databases (Chinese, SciELO, etc.). "One person's data is another person's metadata." Article metadata for its own use. Also working with figshare and others. Building a massive knowledge graph. As a company, interested in the meso level. Cycle of innovation. Data mining, tagging, visualization… drug discovery… connection to altmetrics… How do we put data in the hands of those who need it? What model to use? Which business model?
Mark Hahnel, Figshare
Figshare for institutions – non-traditional research outputs, data, video… How can we *not* mess this up? Everything you upload can be tracked with a DOI. Linked to GitHub. Tracked by the Thomson Reuters data citation database. They work with institutions to help them hold data. There are funder mandates for keeping data, but where's the best place to keep it?
Funders require data sharing but don’t provide infrastructure.
Findable, interoperable, usable; need an API… You want to be able to ask on the web "give me all the information on x in CSV" and get it. Can't ask the question if the data aren't available.
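A sketch of the kind of machine-readable ask he means, using the public figshare v2 API (the endpoint and field names are my assumption from the API docs; the CSV step is my own illustration):

```python
import csv
import requests

# "Give me all the information on x" – search public figshare items.
resp = requests.post(
    "https://api.figshare.com/v2/articles/search",  # per v2 docs (assumption)
    json={"search_for": "x"},
)
items = resp.json()

# "...in csv": write the results out as CSV.
with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["doi", "title", "url"])
    for item in items:
        writer.writerow([item.get("doi"), item.get("title"), item.get("url")])
```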
Need persistent identifiers. Share beta search.
Daniel Calto, Research Intelligence, Elsevier
Data to share – big publisher, Scopus, also patent data and patent history
Sample work: comparing cities, looking at brain circulation (vs. brain drain) – Britain has a higher proportion of publications by researchers who are only there for 2 years, much higher than Japan, for example
Mash their data with open public information.
Example: mapping gender in Germany. Women were more productive in physics and astronomy than men. A full global report is coming on the Elsevier Research Intelligence web page.
Panel question: about other data besides journal citations
Hahnel: all sorts of things including altmetrics
Pringle: usage data - human interactions, click stream data, to see what’s going on in an anonymous way. What’s being downloaded to a reference manager; also acknowledgements
Calto: usage data also important. Downloading an abstract vs. downloading the full text – interpretation is still difficult. Also how academic papers are cited in patents.
Reza Ghanadan, DARPA
Simplifying Complexity in Scientific Discovery (aka Simplex)
DSO is in DARPA, like DARPA’s DARPA
Datafication > knowledge representation > discovery tools
Examples: neuroscience, novel materials, anthropology, precision genomics, autonomy
Riq Parra – Air Force Office of Scientific Research
(like the Army Research Office and ONR) their budget is ~$60M, all basic research (6.1)
All Air Force 6.1 money goes to AFOSR
40 portfolios – 40 program officers (he's 1 of the 40). They don't rotate like at NSF; they are career.
Air Space, Outer Space, Cyber Space.
Some autonomy within agency. Not panel based. Can set direction, get two external reviews (they pick reviewers), talk a lot with the community
Telecons > white papers > submissions > review > funding
How to talk about the impact of funding? Mostly anecdotal – narratives like transitions. Over their 65 years they've funded 78 Nobel Prize winners, on average 17 years prior to selection.
Why he's here – they do not use these methods to show their impact. He would like, in the spirit of transparency, to show why they fund what they fund, what impact it has, and how it helps the Air Force and its missions.
Ryan Zelnio, ONR
Horizon scan to see where ONR Global should look, where to spend attention and money, and to assess the portfolio
Global technology awareness quarterly meetings
Forecasting 20-30 years out
Bibliometrics is one of a number of things they look at. They have qualitative aspects, too.
Need more capability in detecting emerging technologies
Dewey Murdick, DHS S&T
All the R&D (or most of it) for the 22 former agencies. Nearer-term than an ARPA – ready within months to a couple of years. R&D budget $450M… but divide it over all the mission areas and it's enough to buy everyone a Snickers.
Decision Support Analytics Mission – for big/important/impactful decisions. Analytics of R&D portfolio.
Establishing robust technical horizon scanning capability. Prototype anticipatory analytics capability.
Brian Pate, DTRA
Awareness and forecasting for C-WMD Technologies
Combat support agency – 24x7 reachback capability. Liaison offices at all US Commands.
6.1-6.3 R&D investments.
Examples: Ebola response, destruction of chemical weapons in Syria, response to Fukushima.
Low-probability events with high consequences. No human studies. Works with DoD agencies, DHS, NIH, others.
Move from sensing what is happening with state actors to anticipating and predicting, including non-state actors.
Deterrence/treaty verification, force protection, global situational awareness, counter-WMD
BSVE – biosurveillance architecture: cloud-based, social, self-sustaining, with pre-loaded apps
Transitioned to JPEO-CWD – wearable CB exposure monitor
FY17 starting DTRA tech forecasting
Recent DTRA RFI – on identifying emerging technologies.
Audience q: Do you have any money for me?
Panel a: we will use your stuff once someone else pays for it
Ignite talks - random notes
Ethnea – instance-based ethnicity classification (toy sketch below); Genni (JCDL 2013); Author-ity (disambiguated author names)
Predict ethnicity, gender, age
MapAffil – affiliation geocoder
Ethnicity-specific gender over time using 10M+ PubMed papers
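These tools are essentially instance-based classifiers over large stores of labeled author names; a toy sketch of the general idea (my illustration only – the real tools use millions of instances plus affiliation and ethnicity context):

```python
from collections import Counter

# Toy instance store: counts of observed gender labels per given name.
instances = {
    "maria": Counter({"F": 980, "M": 20}),
    "wei": Counter({"M": 600, "F": 400}),
}

def predict_gender(given_name, min_confidence=0.7):
    """Majority label among stored instances, or None if unknown/ambiguous."""
    counts = instances.get(given_name.lower())
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    return label if n / sum(counts.values()) >= min_confidence else None

print(predict_gender("Maria"))  # 'F'
print(predict_gender("Wei"))    # None – below the confidence threshold
```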
Larremore: Modeling faculty hiring networks
Bruce Weinberg, Ohio State
Toward a Valuation of Research
IRIS (Michigan) – a people-based approach to valuing research. People are the vectors by which ideas are transmitted, not disembodied publications.
Innovation in an aging society – aging biomedical research workforce
- web searches
- star metrics (other people in labs), equipment, vendors
- tax records
- business census
Metrics for transformative work
- text analytics
- citation patterns from WoS
Impact distinct from transformative. Mid-career researchers moving more into transformative work.
Some findings not captured in my notes: how women PhD graduates are doing (same positions, paid slightly more, otherwise held back by family). PhD graduates in industry are staying in the same state and making decent money (a non-negligible proportion at companies with median salaries >$200k… median!).
John Ioannidis, Stanford
Defining Meta-research: an evolving discipline
- how to perform, communicate, verify, evaluate, and reward science
- papers in PLOS Biology and JAMA