Technology matching of the patent documents using clustering algorithms

This paper analyzes how accurately different clustering algorithms handle different parts of patent documents. The algorithms are part of a software package used as a business intelligence tool. The tool assembles patent data from publicly available databases, collects and analyzes the patents' bibliographic parameters, and performs text mining. The performance of four clustering algorithms (k-means, neural gas, fuzzy c-means, and RONN) is examined when each is run on a different part of the patent document, namely the abstract, the claims, the international patent code description, and the detailed patent description, always on the same patent data set. The data set was previously classified into technology groups by experts, and the clustering results are compared against this expert classification in order to select the most suitable algorithm and patent document part.
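The comparison against the expert classification can be illustrated with a minimal sketch. The abstract does not say which agreement measure is used, so cluster purity is an assumption here, and the labels, cluster assignments, and the name `cluster_purity` are hypothetical illustration data rather than the paper's results.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, expert_labels):
    """Share of documents that carry the majority expert label of their cluster."""
    if len(cluster_ids) != len(expert_labels):
        raise ValueError("cluster_ids and expert_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Group the expert labels by the cluster each document was assigned to.
    members = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, expert_labels):
        members[cluster_id].append(label)

    # For every cluster, count the documents that share its most common label.
    matched = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return matched / len(expert_labels)


# Hypothetical expert classification of six patents into technology groups.
expert = ["optics", "optics", "optics", "batteries", "batteries", "batteries"]

# Hypothetical cluster assignments per (algorithm, document part) pair.
assignments = {
    ("k-means", "abstract"): [0, 0, 1, 1, 1, 1],
    ("k-means", "claims"): [0, 0, 0, 1, 1, 1],
    ("fuzzy c-means", "abstract"): [0, 1, 0, 1, 0, 1],
}

scores = {key: cluster_purity(ids, expert) for key, ids in assignments.items()}
best = max(scores, key=scores.get)

for (algorithm, part), score in scores.items():
    print(f"{algorithm:14s} {part:9s} purity={score:.3f}")
print("best pair:", best)
```

Scoring every (algorithm, document part) pair against the same expert labels keeps the results directly comparable, which mirrors the selection step described in the abstract. Purity rewards many small clusters, so the number of clusters should be held fixed across runs when this measure is used.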
