A topic models based framework for detecting and forecasting emerging technologies

Abstract

The identification of emerging technologies can provide valuable intelligence to enterprises and countries when they set research and development (R&D) priorities. Emerging technologies are closely related to emerging topics through several well-documented attributes: relatively fast growth, radical novelty, and prominent impact. We adapt our previous work on detecting and forecasting emerging topics to measure technology emergence. In this framework, however, the dynamic influence model (DIM) is replaced by the topical n-grams (TNG) model, so that candidate emerging technologies can be nominated as technical terms and the potential of topic models can be better exploited. Technologies are therefore viewed as term-based themes in this study. Three indicators are designed to reflect the above attributes: the fast growth indicator, the radical novelty indicator, and the prominent impact indicator. The fast growth indicator is calculated from the results of the TNG model, and the radical novelty indicator comes from the citation influence model (CIM). The prominent impact indicator is derived from the authors involved, after author name disambiguation and credit allocation. The following fields are used to develop the models: title, abstract, author keywords, publication year, byline information, and cited references.

We participated in the 2018–2019 Measuring Tech Emergence Contest with the proposed method, and 8 of the 10 technologies we submitted met the contest organizer's criteria of technology emergence. These criteria included the percentage of high-growth terms among the total terms provided, the degree of growth of those terms, and the frequency of the high-growth terms across the dataset. Three judges then conducted a qualitative assessment of the overall methodology. In the end, we won Second Prize in the contest.
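The abstract states that the prominent impact indicator uses the authors involved after name disambiguation and credit allocation, but it does not name the credit-allocation scheme. Purely as an illustration, the sketch below assumes harmonic counting (Hagen's harmonic authorship credit), one common choice in which the i-th of N authors receives (1/i) / (1 + 1/2 + ... + 1/N). The function names and the input layout (each paper given as a byline, i.e. an ordered list of already disambiguated author IDs) are hypothetical, and how the framework turns these credits into its final indicator is not shown here.

```python
from fractions import Fraction


def harmonic_credits(n_authors: int) -> list[Fraction]:
    """Harmonic counting (assumed scheme): author i of N gets (1/i) / sum_{j=1..N} (1/j)."""
    if n_authors < 1:
        raise ValueError("a paper needs at least one author")
    weights = [Fraction(1, i) for i in range(1, n_authors + 1)]
    total = sum(weights)
    # Normalise so the shares of one paper sum to exactly 1.
    return [w / total for w in weights]


def author_credit_totals(papers: list[list[str]]) -> dict[str, Fraction]:
    """Sum harmonic credit per disambiguated author ID over a set of papers.

    Each element of `papers` is a byline: author IDs in byline order.
    """
    totals: dict[str, Fraction] = {}
    for byline in papers:
        for author, credit in zip(byline, harmonic_credits(len(byline))):
            totals[author] = totals.get(author, Fraction(0)) + credit
    return totals
```

For a three-author paper this yields shares of 6/11, 3/11 and 2/11, which sum to 1. Exact fractions are used so that no rounding drift accumulates when credits are summed over many papers; floats would serve equally well at scale.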
