Measuring textual patent similarity on the basis of combined concepts: design decisions and their consequences

Several tasks in patent management benefit from a quantitative measure of textual similarity between patents or parts of patents: freedom-to-operate analysis, the analysis of technology convergence, and the mapping of patents for strategic purposes. In this paper we outline the process of measuring textual patent similarity on the basis of elements referred to as ‘combined concepts’. The process consists of several operations, each of which calls for design decisions, and we provide guidance on these decisions. Using two applications from patent management, namely the prioritization of patents and the analysis of convergence between two technological fields, we demonstrate how strongly these design decisions affect the results of a patent analysis.
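
To make the role of a design decision concrete, the following is a minimal sketch, not the procedure of this paper. It assumes that a ‘combined concept’ can be approximated by a sequence of n adjacent words and that similarity is measured with the Jaccard coefficient. Both choices, the function names, and the two example claims are illustrative assumptions.

```python
import re


def combined_concepts(text, n=2):
    """Return the set of n-word sequences in `text`.

    Assumption: adjacent-word n-grams stand in for 'combined concepts';
    the paper's own extraction procedure may differ.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(text_a, text_b, n=2):
    """Jaccard coefficient of the two concept sets (one possible measure)."""
    a = combined_concepts(text_a, n)
    b = combined_concepts(text_b, n)
    if not a and not b:
        return 0.0  # convention chosen here for two empty concept sets
    return len(a & b) / len(a | b)


# Hypothetical claim fragments, invented for illustration only.
claim_a = "a rotor blade for a wind turbine"
claim_b = "a rotor blade for a gas turbine"

# The concept length n is itself a design decision and changes the score.
print(jaccard_similarity(claim_a, claim_b, n=1))
print(jaccard_similarity(claim_a, claim_b, n=2))
```

In this toy case, changing only the concept length from single words to word pairs lowers the similarity score for the same two texts, which is the kind of sensitivity to design decisions that the two applications are meant to demonstrate.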
