Predictive Effects of Novelty Measured by Temporal Embeddings on the Growth of Scientific Literature

Novel scientific knowledge is constantly produced by the scientific community. Understanding the level of novelty in scientific literature is key to modeling scientific dynamics and analyzing the growth mechanisms of scientific knowledge. Metrics derived from bibliometrics and citation analysis have been used effectively to characterize novelty in scientific development. However, links between documents, such as citation links and the patterns derived from them, take time to accumulate, which makes these techniques more effective for retrospective analysis than for predictive analysis. In this study, we present a new approach to measuring the novelty of a research topic in a scientific community over a specific period by tracking semantic changes in the terms that characterize the topic, as reflected in their usage contexts. The semantic changes are derived from the text of scientific literature using temporal embedding learning techniques. We validated the effects of the proposed novelty metric on predicting the future growth of scientific publications and investigated the relations between novelty and growth by applying panel data analysis to a large-scale publication dataset (MEDLINE/PubMed). Key findings from the statistical investigation indicate that the novelty metric has significant predictive effects on the growth of scientific literature and that these effects may last for more than ten years. We demonstrated the effectiveness and practical implications of the novelty metric in three case studies.
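The abstract does not spell out the exact novelty formula, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that a separate embedding matrix has been trained for each time period over a shared vocabulary (rows in the same term order), that the two spaces are made comparable with orthogonal Procrustes alignment (a common choice for diachronic embeddings, not necessarily the one used in this study), and that a topic's novelty is approximated by the mean cosine displacement of its characterizing terms. The functions `align_to`, `semantic_change`, and `topic_novelty`, and the `topic_terms` argument, are hypothetical names for illustration.

```python
from typing import Sequence

import numpy as np


def align_to(base: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Rotate `other` onto `base` with orthogonal Procrustes.

    Both arrays have shape (vocab_size, dim) and share the same row order,
    i.e. row i holds the vector of the same term in both periods.
    """
    # R = U V^T, where U S V^T is the SVD of other^T base, minimizes
    # ||other @ R - base||_F over all orthogonal matrices R.
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def semantic_change(emb_t0: np.ndarray, emb_t1: np.ndarray) -> np.ndarray:
    """Per-term cosine distance between period t0 and the aligned period t1."""
    aligned = align_to(emb_t0, emb_t1)
    dots = np.sum(emb_t0 * aligned, axis=1)
    norms = np.linalg.norm(emb_t0, axis=1) * np.linalg.norm(aligned, axis=1)
    return 1.0 - dots / norms


def topic_novelty(change: np.ndarray, vocab: Sequence[str],
                  topic_terms: Sequence[str]) -> float:
    """Hypothetical aggregation: mean semantic change of a topic's terms."""
    index = {term: i for i, term in enumerate(vocab)}
    rows = [index[t] for t in topic_terms if t in index]
    if not rows:
        raise ValueError("none of the topic terms are in the vocabulary")
    return float(np.mean(change[rows]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = [f"term{i}" for i in range(200)]
    emb_t0 = rng.normal(size=(200, 8))
    # Toy period t1: the same geometry under an arbitrary rotation, plus one
    # term whose usage has "moved" (its vector is reversed).
    q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
    emb_t1 = emb_t0 @ q
    emb_t1[7] = -emb_t1[7]
    change = semantic_change(emb_t0, emb_t1)
    print(topic_novelty(change, vocab, ["term7", "term8"]))
```

Alignment matters because embeddings trained independently on different periods are only defined up to a rotation; without it, every term would appear to change. In the toy run, the rotation is removed by the alignment, `term7` ends up with a cosine distance near 2 and `term8` near 0, so the printed topic score should be close to 1.0. A real pipeline would replace the random matrices with per-period embeddings learned from the literature and would feed the resulting topic scores into the growth analysis.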
