An improved patent similarity measurement based on entities and semantic relations

Abstract: Patent similarity measurement, one of the fundamental building blocks of patent analysis, can not only derive technical intelligence efficiently but also detect the risk of infringement and evaluate whether an invention meets the criteria of novelty and innovation. However, traditional approaches implicitly make several assumptions, such as treating each component as a bag of words and ignoring semantic direction, among others. To relax these assumptions, this study proposes an improved methodology based on entities and semantic relations (functional and non-functional relations) that takes into account the semantic direction of each sequence structure and the word order within each component. In addition, an algorithm for calculating the global importance of each sequence structure is put forward. Finally, to verify the effectiveness and performance of the improved semantic analysis, a case study is conducted on the thin film head subfield of the hard disk drive field. Extensive experimental results show that our approach is significantly more accurate.
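
The two assumptions named in the abstract (bag of words within a component, and irrelevance of semantic direction) can be illustrated with a small sketch. This is not the method proposed in the paper: the triple layout (head entity, relation, tail entity), the function names, and the use of Python's `difflib.SequenceMatcher` as an order-aware score are all assumptions chosen only to show why the two assumptions lose information.

```python
from difflib import SequenceMatcher


def bag_of_words_similarity(a, b):
    """Jaccard overlap of token sets; word order is ignored."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def order_aware_similarity(a, b):
    """Score based on token blocks that match in the same order."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def direction_sensitive_similarity(t1, t2, component_sim):
    """Compare two (head, relation, tail) triples position by position,
    so swapping head and tail changes the score."""
    scores = [component_sim(x, y) for x, y in zip(t1, t2)]
    return sum(scores) / len(scores)


def direction_insensitive_similarity(t1, t2):
    """Pool every token of each triple into one bag; direction is lost."""
    flat1 = [tok for comp in t1 for tok in comp]
    flat2 = [tok for comp in t2 for tok in comp]
    return bag_of_words_similarity(flat1, flat2)


# Hypothetical triples: same tokens, opposite semantic direction.
forward = (["magnetic", "head"], ["reads"], ["disk", "surface"])
reverse = (["disk", "surface"], ["reads"], ["magnetic", "head"])

# Same tokens, different word order inside one component.
same_bag = bag_of_words_similarity(["head", "magnetic"], ["magnetic", "head"])
ordered = order_aware_similarity(["head", "magnetic"], ["magnetic", "head"])

pooled = direction_insensitive_similarity(forward, reverse)
directed = direction_sensitive_similarity(forward, reverse, order_aware_similarity)
```

In this toy setting the pooled bag-of-words score treats the reversed triple as identical to the original, while the position-by-position comparison only credits the shared relation. The reordered component likewise scores as a perfect match under the set-based measure but not under the order-aware one.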
