Pattern-oriented associative rule-based patent classification

This paper proposes a pattern-oriented associative rule-based approach for constructing an automatic TRIZ-based patent classification system. Derived from associative rule-based text categorization, the new approach not only discovers the semantic relationships among features in a document through their co-occurrence, but also captures syntactic information through manually generalized patterns. We select 7 classes that cover 20 of the 40 TRIZ Principles and run experiments on a binary data set for each class. Compared with three currently popular classification algorithms (SVM, C4.5, and naive Bayes), the new approach shows some improvement. More importantly, the new approach has advantages of its own, which are also discussed in this paper.
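To make the co-occurrence idea concrete, the following is a minimal sketch of generic associative rule-based text categorization, not the paper's actual system. It mines rules of the form "term set implies class" from labeled documents and keeps those that pass support and confidence thresholds. The function names (`mine_class_rules`, `classify`), the thresholds, the toy documents, and the class labels are all illustrative assumptions; the manually generalized syntactic patterns described above are not modeled here.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(docs, labels, min_support=0.5, min_confidence=0.8, max_size=2):
    """Mine rules (termset, label, support, confidence) from labeled documents.

    Assumed definitions for this sketch:
      support    = share of all documents that contain the termset and carry the label
      confidence = share of documents containing the termset that carry the label
    """
    n = len(docs)
    termset_counts = Counter()  # how many documents contain each termset
    rule_counts = Counter()     # how many documents contain termset AND have label
    for doc, label in zip(docs, labels):
        terms = sorted(set(doc.lower().split()))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                termset_counts[combo] += 1
                rule_counts[(combo, label)] += 1

    rules = []
    for (combo, label), count in rule_counts.items():
        support = count / n
        confidence = count / termset_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence))

    # Prefer confident, well-supported, more specific (longer) rules.
    rules.sort(key=lambda r: (-r[3], -r[2], -len(r[0]), r[0]))
    return rules


def classify(doc, rules, default=None):
    """Return the label of the first (highest-ranked) rule whose terms all occur in doc."""
    terms = set(doc.lower().split())
    for combo, label, _support, _confidence in rules:
        if terms.issuperset(combo):
            return label
    return default


# Hypothetical toy corpus for one binary class ("segmentation" vs. "other").
docs = [
    "segment the blade into separate parts",
    "divide the housing into separate modules",
    "heat the blade before cutting",
    "cool the housing with air",
]
labels = ["segmentation", "segmentation", "other", "other"]

rules = mine_class_rules(docs, labels)
prediction = classify("split the beam into separate sections", rules, default="other")
fallback = classify("heat the beam", rules, default="other")
```

In this toy corpus the term "the" occurs in every document, so any rule built on it alone fails the confidence threshold, while the co-occurring pair ("into", "separate") survives as a rule for the positive class. A real system would enumerate frequent term sets with an Apriori-style search rather than all combinations.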
