Searching for new breakthroughs in science: How effective are computerised detection algorithms?

Abstract: In this study, we design, develop, implement and test an analytical framework and measurement model to detect scientific discoveries with ‘breakthrough’ characteristics. To do so, we developed a series of computerised search algorithms that mine large quantities of research publications. These algorithms facilitate early-stage detection of ‘breakout’ papers: papers that emerge as highly cited and distinctive and are considered potential breakthroughs. Combining computer-aided data mining with decision heuristics enabled us to assess structural changes in citation patterns within the international scientific literature. In our case studies, we applied a citation impact time window of 24–36 months after publication of each research paper. In this paper, we report our test results, in which five algorithms were applied to the entire Web of Science database. We analysed the citation impact patterns of all research articles from the period 1990–1994 and succeeded in detecting many papers with distinctive impact profiles (breakouts). A small subset of these breakouts is classified as ‘breakthroughs’: Nobel Prize research papers; papers occurring in Nature's Top-100 Most Cited Papers Ever; papers still (highly) cited by review papers or patents; or those frequently mentioned in today's social media. We also compare the outcomes of our algorithms with the results of a ‘baseline’ detection algorithm developed by Redner in 2005, which selects the world's most highly cited ‘hot papers’. The detection rates of the algorithms vary, but overall they present a powerful tool for tracing breakout papers in science. The wider applicability of these algorithms, across all science fields, has not yet been ascertained. Whether or not our early-stage breakout papers represent a ‘breakthrough’ remains a matter of opinion, and input from subject experts is needed for verification and confirmation, but our detection approach certainly helps to limit the search domain when tracing and tracking important emerging topics in science.
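To make the idea of a fixed citation window concrete, the following minimal Python sketch counts the citations each paper receives within a set number of months after publication and ranks papers by that count. It is an illustration only: it is not one of the five algorithms tested in the study, nor Redner's exact procedure. The function names, the input layout (a dict mapping a paper ID to its publication date and a list of citing dates) and the whole-month rounding rule are all assumptions made for this example.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def early_citation_count(published: date, citing_dates, window_months: int = 36) -> int:
    """Count citations received within window_months after publication."""
    return sum(
        1
        for d in citing_dates
        if 0 <= months_between(published, d) <= window_months
    )


def top_cited(papers: dict, window_months: int = 36, top_n: int = 1):
    """Rank papers by early citation count (hypothetical simplified baseline).

    papers maps a paper ID to (publication date, list of citing dates).
    Ties are broken by paper ID so the ranking is deterministic.
    """
    counts = {
        pid: early_citation_count(pub, cites, window_months)
        for pid, (pub, cites) in papers.items()
    }
    ranked = sorted(counts, key=lambda pid: (-counts[pid], pid))
    return ranked[:top_n], counts


# Toy data, invented purely for illustration.
papers = {
    "A": (date(1990, 1, 15), [date(1990, 6, 1), date(1991, 3, 1), date(1995, 1, 1)]),
    "B": (date(1991, 5, 1), [date(1991, 7, 1), date(1992, 1, 1),
                             date(1993, 2, 1), date(1994, 4, 1)]),
}
top, counts = top_cited(papers, window_months=36, top_n=1)
```

A plain top-N ranking like this only captures raw citation volume. The abstract describes the study's own algorithms as also looking for distinctive impact profiles and structural changes in citation patterns, which this sketch does not attempt.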

[1]  M. Heinemann The Matthew Effect , 2016, Thoracic and Cardiovascular Surgeon.

[2]  Charles E. Berkoff,et al.  A very early warning system for the rapid identification and transfer of new technology , 1977, J. Am. Soc. Inf. Sci..

[3]  Marten Scheffer,et al.  Complex systems: Foreseeing tipping points , 2010, Nature.

[4]  L. Darden,et al.  Interfield Theories , 1977, Philosophy of Science.

[5]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[6]  Laurel L. Haak,et al.  Predicting highly cited papers: A Method for Early Detection of Candidate Breakthroughs , 2014 .

[7]  R. Merton On the Shoulders of Giants: A Shandean Postscript , 1966 .

[8]  Christos Faloutsos,et al.  Graphs Over Time: Densification and Shrinking Diameters , 2006 .

[9]  D. J. Price,et al.  Networks of Scientific Papers , 1965, Science.

[10]  Translated by Tanita Casci Nobel prize , 2000, Nature Reviews Neuroscience.

[11]  W. Myers,et al.  Atypical Combinations and Scientific Impact , 2013 .

[12]  D. Hofstadter,et al.  Surfaces and Essences: Analogy as the Fuel and Fire of Thinking , 2013 .

[13]  William F. Ogburn,et al.  Are Inventions Inevitable? A Note on Social Evolution , 1922 .

[14]  D. J. Spiegelhalter,et al.  The future lies in uncertainty , 2014, Science.

[15]  T. Kuhn,et al.  The Structure of Scientific Revolutions. , 1964 .

[16]  S. Carpenter,et al.  Early-warning signals for critical transitions , 2009, Nature.

[17]  Karen Phalet,et al.  Becoming a group: value convergence and emergent work group identities. , 2014, The British journal of social psychology.

[18]  Jonathan Adams,et al.  Early citation counts correlate with accumulated impact , 2005, Scientometrics.

[19]  R. Tijssen,et al.  The discovery of ‘introns’: analysis of the science-technology interface , 2013 .

[20]  Yue Chen,et al.  Towards an explanatory and computational theory of scientific discovery , 2009, J. Informetrics.

[21]  R. Merton The Matthew effect in science. The reward and communication systems of science are considered. , 1968, Science.

[22]  Luís M. A. Bettencourt,et al.  Scientific discovery and topological transitions in collaboration networks , 2009, J. Informetrics.

[23]  Benoit B. Mandelbrot,et al.  Fractal Geometry of Nature , 1984 .

[24]  S. Moncada,et al.  Nitric oxide: physiology, pathophysiology, and pharmacology. , 1991, Pharmacological reviews.

[25]  Hein van Bohemen,et al.  Critical Transitions In Nature And Society, Princeton Studies in Complexity, M. Scheffer. Princeton University Press (2009), ISBN 0691122040, 30,95 US$ , 2010 .

[26]  Henk F. Moed,et al.  Handbook of Quantitative Science and Technology Research: The Use of Publication and Patent Statistics in Studies of S&T Systems , 2004 .

[27]  Loet Leydesdorff,et al.  Group‐based trajectory modeling (GBTM) of citations in scholarly literature: Dynamic qualities of “transient” and “sticky knowledge claims” , 2013, J. Assoc. Inf. Sci. Technol..

[28]  Amber Williams,et al.  Sleeping Beauties of Science. , 2015, Scientific American.

[29]  Jian Wang,et al.  Bias Against Novelty in Science: A Cautionary Tale for Users of Bibliometric Indicators , 2015 .

[30]  Diana Crane,et al.  Invisible colleges. Diffusion of knowledge in scientific communities , 1972, Medical History.

[31]  P. Andel Anatomy of the Unsought Finding. Serendipity: Origin, History, Domains, Traditions, Appearances, Patterns and Programmability , 1994, The British Journal for the Philosophy of Science.

[32]  R. Perrucci,et al.  From Little Science to Big Science , 2017 .

[33]  Anthony F. J. van Raan,et al.  Sleeping Beauties in science , 2004, Scientometrics.

[34]  Juan D. Rogers Citation analysis of nanotechnology at the field level: implications of R&D evaluation , 2010 .

[35]  J. Thornton,et al.  PROCHECK: a program to check the stereochemical quality of protein structures , 1993 .

[36]  L. M. A. Bettencourt,et al.  General Critical Properties of the Dynamics of Scientific Discovery , 2011 .

[37]  D. Cyranoski Korea's stem-cell stars dogged by suspicion of ethical breach , 2004, Nature.

[38]  Richard Van Noorden Science publishing: The trouble with retractions , 2011, Nature.

[39]  Benjamin F. Jones,et al.  Multi-University Research Teams: Shifting Impact, Geography, and Stratification in Science , 2008, Science.

[40]  J. Sung Embodied Anomaly Resolution in Molecular Genetics: A Case Study of RNAi , 2008 .

[41]  John Whitfield,et al.  Collaboration: Group theory , 2008, Nature.

[42]  Thanh Vũ Thị Cẩm Science, Technology and Innovation , 2017, PAM 2017.

[43]  Carlos Castillo-Chavez,et al.  Population modeling of the emergence and development of scientific fields , 2008, Scientometrics.

[44]  Daniel E Koshland The Cha-Cha-Cha Theory of Scientific Discovery , 2007, Science.

[45]  A. Bonaccorsi New Forms of Complementarity in Science , 2010 .

[46]  A. V. van Raan,et al.  Dormitory of Physical and Engineering Sciences: Sleeping Beauties May Be Sleeping Innovations , 2015, PloS one.

[47]  Anthony F. J. van Raan,et al.  Dormitory of Physical and Engineering Sciences: Sleeping Beauties May Be Sleeping Innovations. , 2015, 1506.01540.

[48]  Arturo Casadevall,et al.  Why Has the Number of Scientific Retractions Increased? , 2013, PloS one.

[49]  A. Pritchard,et al.  Statistical bibliography or bibliometrics , 1969 .

[50]  Ulrich Schmoch,et al.  Tracing the knowledge transfer from science to technology as reflected in patent indicators , 2005, Scientometrics.

[51]  P. G. Franck Technology in retrospect and critical events in science - A summary and critique of findings by IIT , 1969 .

[52]  S. Redner Citation statistics from 110 years of physical review , 2005, physics/0506056.

[53]  G. Moddel,et al.  Origin of excess heat generated during loading Pd-impregnated alumina powder with deuterium and hydrogen , 2012 .

[54]  Laurel L. Haak,et al.  Breakthrough Paper Indicator: early detection and measurement of ground-breaking research , 2012, CRIS.

[55]  Lotfi A. Zadeh,et al.  General System Theory , 1962 .

[56]  Andre K. Geim,et al.  Electric Field Effect in Atomically Thin Carbon Films , 2004, Science.

[57]  Hariolf Grupp,et al.  At the Crossroads in Laser Medicine and Polyimide Chemistry: Patent Assessment of the Expansion of Knowledge , 1992 .

[58]  Peter Wilhelm,et al.  Nobel Prize , 1964 .

[59]  D. Simonton Independent Discovery in Science and Technology: A Closer Look at the Poisson Distribution , 1978 .

[60]  Anthony F. J. van Raan,et al.  Theory‐changing breakthroughs in science: The impact of research teamwork on scientific discoveries , 2016, J. Assoc. Inf. Sci. Technol..

[61]  R. Merton Priorities in scientific discovery: A chapter in the sociology of science. , 1957 .

[62]  A. Ciechanover Tracing the history of the ubiquitin proteolytic system: the pioneering article. , 2009, Biochemical and biophysical research communications.

[63]  C. Clarke,et al.  The Sources of Invention , 1969 .

[64]  Paul F. Skilton Does the human capital of teams of natural science authors predict citation frequency? , 2009, Scientometrics.

[65]  Shawn J. Green,et al.  Key discoveries often originate with lone researchers , 2008, Nature.

[66]  Philip M. Davis,et al.  The persistence of error: a study of retracted articles on the Internet and in personal libraries. , 2012, Journal of the Medical Library Association : JMLA.

[67]  Robert J. W. Tijssen,et al.  Early stage identification of breakthroughs at the interface of science and technology: lessons drawn from a landmark publication , 2014, Scientometrics.

[68]  James Carifio,et al.  The Nature of Scientific Revolutions from the Vantage Point of Chaos Theory , 2005 .

[69]  A. Bonaccorsi Search Regimes and the Industrial Dynamics of Science , 2008 .

[70]  Barbara Gabriella Renzi Kuhn's Evolutionary Social Epistemology , 2013 .

[71]  B. Uzzi,et al.  Collaboration and Creativity: The Small World Problem , 2005, American Journal of Sociology.

[72]  P. Ball Critical Mass: How One Thing Leads to Another , 2004 .

[73]  Robert J. W. Tijssen,et al.  R&D dynamics and scientific breakthroughs in HIV/AIDS drugs development: the case of Integrase Inhibitors , 2014, Scientometrics.

[74]  A. Brannigan,et al.  Historical Distributions of Multiple Discoveries and Theories of Scientific Change , 1983 .

[75]  Andrew H. Wilson Science, technology and innovation , 2019, Africa Sustainable Development Report 2018.

[76]  Thilo Gross,et al.  Early Warning Signals for Critical Transitions: A Generalized Modeling Approach , 2011, PLoS Comput. Biol..

[77]  Richard Van Noorden,et al.  The top 100 papers , 2014, Nature.

[78]  Benjamin F. Jones,et al.  The Increasing Dominance of Teams in Production of Knowledge (Supporting Online Material) , 2022 .

[79]  J. J. Winnink Early-stage detection of breakthrough-class scientific research : using micro-level citation dynamics , 2017 .

[80]  P. Wallace The Band Theory of Graphite , 1947 .

[81]  David I. Kaiser,et al.  Formation of Scientific Fields as a Universal Topological Transition , 2015 .

[82]  Robert J. W. Tijssen Discarding the 'basic science/applied science' dichotomy: A knowledge utilization triangle classification system of research journals , 2010, J. Assoc. Inf. Sci. Technol..

[83]  On the transmission dynamics of knowledge , 2005 .

[84]  G. Lewis,et al.  The Matthew Effect in Science. The reward and communication systems of science are considered , 1999 .

[85]  Mark Gerstein,et al.  RNAi Development , 2007, PLoS Comput. Biol..

[86]  Raul Rodriguez-Esteban,et al.  Retraction rates are on the rise , 2008, EMBO reports.

[87]  Roger Guimerà,et al.  Team Assembly Mechanisms Determine Collaboration Network Structure and Team Performance , 2005, Science.

[88]  B. Tuckman DEVELOPMENTAL SEQUENCE IN SMALL GROUPS. , 1965, Psychological bulletin.

[89]  Ludo Waltman,et al.  A new methodology for constructing a publication-level classification system of science , 2012, J. Assoc. Inf. Sci. Technol..

[90]  Dean Keith Simonton,et al.  Multiple discovery and invention: Zeitgeist, genius, or chance? , 1979 .

[91]  Marcel Ausloos,et al.  Knowledge epidemics and population dynamics models for describing idea diffusion , 2012, ArXiv.

[92]  P. Barker The Cognitive Structure of Scientific Revolutions , 2006 .

[93]  Robert N. Broadus Toward a definition of “bibliometrics” , 1987, Scientometrics.

[94]  Rodrigo Costas,et al.  Identifying potential “breakthrough” publications using refined citation analyses: Three related explorative approaches , 2015, J. Assoc. Inf. Sci. Technol..

[95]  J A Grobler,et al.  Inhibitors of strand transfer that prevent integration and inhibit HIV-1 replication in cells. , 2000, Science.

[96]  M. Fleischmann,et al.  Electrochemically Induced Nuclear Fusion of Deuterium , 1989 .

[97]  H. Small A Co-Citation Model of a Scientific Specialty: A Longitudinal Study of Collagen Research , 1977 .