Measuring patent similarity with SAO semantic analysis

Patents are not only an important aspect of intellectual property rights; they are also one of the few ways to protect technological inventions. In recent years, however, the number of patents has increased dramatically, and both patent applicants and patent examiners find it increasingly difficult to carry out the due diligence step of the patent registration process. The lack of a quick and accurate way to measure patent similarity has therefore become a significant obstacle to protecting intellectual property. Currently, there are three main ways to measure patent similarity: International Patent Classification (IPC) code analysis, citation analysis, and keyword analysis. None of these approaches fully reflects the semantics of a patent’s content. As an emerging methodology, subject–action–object (SAO) semantic analysis does reflect semantics, but most approaches treat each identified relationship as equally important, which does not necessarily yield an accurate measure of patent similarity. To address this limitation, this article introduces a new indicator, DWSAO, that reflects the weight of each SAO semantic structure. We further present a semantic analysis framework that incorporates the DWSAO index to find similar patents based on the weight of each SAO structure in a patent. A case study on the similarity of patents in the field of robotics was used to verify the reliability of the method. The results highlight the detailed meanings derived from the method, the accuracy of the outcomes, and the practical significance of the approach.
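To make the idea of weighting SAO structures concrete, the following is a minimal sketch of a weighted SAO comparison between two patents. It is not the DWSAO formula, which the abstract does not define: the per-structure weights are hypothetical inputs, the helper names `sao_similarity` and `weighted_patent_similarity` are illustrative, and exact string matching stands in for whatever semantic comparison of subjects, actions, and objects the full method uses.

```python
from typing import List, Tuple

# An SAO structure is a (subject, action, object) triple.
SAO = Tuple[str, str, str]
# A patent is represented as SAO structures paired with a non-negative weight.
# The weights are hypothetical placeholders, not values produced by DWSAO.
WeightedPatent = List[Tuple[SAO, float]]


def sao_similarity(a: SAO, b: SAO) -> float:
    """Fraction of the three slots that match exactly, ignoring case.

    Assumption: exact matching is a stand-in for a semantic word similarity.
    """
    matches = sum(x.lower() == y.lower() for x, y in zip(a, b))
    return matches / 3.0


def weighted_patent_similarity(p1: WeightedPatent, p2: WeightedPatent) -> float:
    """Symmetric, weight-aware similarity in the range [0, 1]."""

    def directed(src: WeightedPatent, dst: WeightedPatent) -> float:
        total = sum(weight for _, weight in src)
        if total == 0 or not dst:
            return 0.0
        # Each source structure contributes its best match in the other
        # patent, scaled by its own weight.
        score = sum(
            weight * max(sao_similarity(sao, other) for other, _ in dst)
            for sao, weight in src
        )
        return score / total

    # Average both directions so the measure does not depend on argument order.
    return 0.5 * (directed(p1, p2) + directed(p2, p1))


# Usage: two toy robotics patents with made-up weights.
patent_a = [(("robot", "grips", "object"), 0.8),
            (("sensor", "detects", "obstacle"), 0.2)]
patent_b = [(("robot", "grips", "object"), 0.5),
            (("arm", "moves", "object"), 0.5)]
similarity = weighted_patent_similarity(patent_a, patent_b)
```

Setting every weight to the same value reduces this sketch to the equal-importance treatment that the abstract criticizes, so the two settings can be compared on the same pair of patents.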
