New multi-stage similarity measure for calculation of pairwise patent similarity in a patent citation network

Effectively measuring the similarity between patents in a complex patent citation network is crucial to understanding patent relatedness. Earlier work has applied techniques such as text mining and keyword analysis to patent similarity calculation. The drawback of these approaches is that they depend on the authors' word choice and writing style. Most existing graph-based approaches use common-neighbor measures, which consider only direct adjacency. In this work we propose new similarity measures for patents that use only the structure of the patent citation network. The proposed measures leverage both direct and indirect co-citation links between patents. One challenge is that patents receiving a large number of citations tend to appear similar to many other patents in the network. To overcome this, we propose a normalization technique that accounts for pairs ranked as highly similar only because both patents are cited by many others. We validate the proposed similarity measures on US patents, using US class codes and comparing against the well-known Jaccard similarity index. Experiments show that the proposed measures perform well relative to the Jaccard similarity index.
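The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' measure. It assumes a toy citation network stored as a dictionary, a cosine-style normalization as a stand-in for the proposed normalization, and a hypothetical weight `alpha` for combining direct and indirect co-citation. All function names are illustrative.

```python
from collections import defaultdict
from math import sqrt


def build_citers(citations):
    """Map each cited patent to the set of patents that cite it."""
    citers = defaultdict(set)
    for citing, cited_list in citations.items():
        for cited in cited_list:
            citers[cited].add(citing)
    return citers


def indirect_citers(citers, patent):
    """Patents that cite `patent` through one intermediate patent."""
    second_stage = set()
    for citer in citers.get(patent, set()):
        second_stage |= citers.get(citer, set())
    return second_stage


def jaccard(a, b):
    """Jaccard index of two sets of citing patents (the baseline)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalized_cocitation(a, b):
    """Shared citers scaled by citation volume (assumed cosine-style form)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def multi_stage_similarity(citers, p, q, alpha=0.5):
    """Direct co-citation plus down-weighted indirect co-citation.

    `alpha` is a hypothetical weight, not a value from the paper.
    """
    direct = normalized_cocitation(citers.get(p, set()), citers.get(q, set()))
    indirect = normalized_cocitation(
        indirect_citers(citers, p), indirect_citers(citers, q)
    )
    return direct + alpha * indirect


# Toy network (made up for illustration): citing patent -> cited patents.
citations = {
    "C1": ["A", "B"],
    "C2": ["A", "B"],
    "C3": ["A"],
    "D1": ["C1", "C3"],
    "D2": ["C2"],
}
citers = build_citers(citations)

print(jaccard(citers["A"], citers["B"]))
print(multi_stage_similarity(citers, "A", "B"))
```

In this toy network, patents A and B share two direct citers, and their second-stage citers coincide, so the indirect term raises their score above what direct co-citation alone gives. Dividing by the square root of the product of the citer counts keeps a heavily cited patent from looking similar to everything, which is the effect the normalization in the abstract is meant to address.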
