Identifying technological topics and institution-topic distribution probability for patent competitive intelligence analysis: a case study in LTE technology

This paper presents an extended latent Dirichlet allocation (LDA) model for patent competitive intelligence analysis. After part-of-speech tagging and the definition of noun phrase extraction rules, technological words are extracted from patent titles and abstracts, which allows patent analysis to be performed at the content level. The LDA model is then used to identify underlying topic structures based on latent relationships among the extracted technological words. This makes it possible to review research hot spots and directions in subclasses of patented technology in a given field. To extend the traditional LDA model, an additional institution-topic probability level is added to it, so that the distribution probabilities of directly competing enterprises and their technological positions are identified for each topic. A case study is then carried out on LTE, one of the core patented technologies in next-generation telecommunications. This empirical study reveals emerging hot spots of LTE technology and finds that major companies in this field have focused on different technological areas and hold different competitive positions.
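As a rough illustration of what an institution-topic probability level can express, the sketch below aggregates per-patent topic distributions by assignee. It is not the extended model described above, which adds the institution level inside the LDA model itself; it is a simpler post-hoc aggregation, assuming that a fitted LDA model has already produced one topic distribution per patent. The function names, the toy probabilities, and the institution labels "A" and "B" are all hypothetical.

```python
from collections import defaultdict


def institution_topic_distribution(doc_topics, doc_institutions):
    """Average the topic distributions of each institution's patents.

    doc_topics: list of per-patent topic probability lists (each sums to 1).
    doc_institutions: list of institution labels, one per patent.
    Returns {institution: [P(topic k | institution) for each k]}.
    """
    sums = {}
    counts = defaultdict(int)
    for theta, inst in zip(doc_topics, doc_institutions):
        if inst not in sums:
            sums[inst] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[inst][k] += p
        counts[inst] += 1
    return {inst: [s / counts[inst] for s in row] for inst, row in sums.items()}


def topic_institution_share(doc_topics, doc_institutions):
    """Share of each topic's total probability mass held by each institution.

    Returns {institution: [share of topic k for each k]}; for every topic
    with nonzero mass, the shares across institutions sum to 1.
    """
    n_topics = len(doc_topics[0])
    mass = {}
    for theta, inst in zip(doc_topics, doc_institutions):
        row = mass.setdefault(inst, [0.0] * n_topics)
        for k, p in enumerate(theta):
            row[k] += p
    totals = [sum(row[k] for row in mass.values()) for k in range(n_topics)]
    return {
        inst: [row[k] / totals[k] if totals[k] > 0 else 0.0 for k in range(n_topics)]
        for inst, row in mass.items()
    }


# Toy input (hypothetical): three patents, two topics, two institutions.
doc_topics = [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75]]
doc_institutions = ["A", "B", "B"]

per_institution = institution_topic_distribution(doc_topics, doc_institutions)
per_topic = topic_institution_share(doc_topics, doc_institutions)
```

Here `per_institution` describes where each institution concentrates its patents, while `per_topic` describes which institutions dominate a given topic; the second view is closer to a competitive position within a topic.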
