Two scalable algorithms for associative text classification

Associative classification methods have recently been applied to various categorization tasks because of their simplicity and high accuracy. To improve coverage of test documents and raise classification accuracy, some associative classifiers generate a huge number of association rules during the mining step. We present two algorithms that increase the computational efficiency of associative classification: one stores rules very efficiently, and the other speeds up rule matching while still using all of the generated rules. Empirical results on three large-scale text collections demonstrate that the proposed algorithms make it more feasible to apply associative classification to large-scale problems.
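To make the two concerns concrete (compact rule storage and fast matching against every rule), the following is a minimal illustrative sketch. It is not the algorithms proposed in this paper, whose details are not given in the abstract. It assumes that each rule maps a set of terms (the antecedent) to a class label with a confidence score, stores rules in a prefix tree over sorted terms so that shared prefixes are kept once, and classifies by summing the confidences of all matching rules. The names `RuleTrie`, `classify`, and the sample rules are hypothetical.

```python
from collections import defaultdict


class RuleTrie:
    """Prefix tree over sorted antecedent terms (illustrative, not the paper's structure)."""

    def __init__(self):
        self.children = {}  # term -> RuleTrie
        self.rules = []     # (label, confidence) for rules whose antecedent ends here

    def insert(self, antecedent, label, confidence):
        # Sorting gives every term set one canonical path, so prefixes are shared.
        node = self
        for term in sorted(set(antecedent)):
            node = node.children.setdefault(term, RuleTrie())
        node.rules.append((label, confidence))

    def match(self, doc_terms):
        """Yield (label, confidence) for each rule whose antecedent is a subset of doc_terms."""
        terms = sorted(set(doc_terms))
        stack = [(self, 0)]
        while stack:
            node, start = stack.pop()
            yield from node.rules
            # Only terms after position `start` can extend a sorted antecedent.
            for i in range(start, len(terms)):
                child = node.children.get(terms[i])
                if child is not None:
                    stack.append((child, i + 1))


def classify(trie, doc_terms):
    """Assumed scoring scheme: pick the label with the largest summed confidence."""
    scores = defaultdict(float)
    for label, confidence in trie.match(doc_terms):
        scores[label] += confidence
    return max(scores, key=scores.get) if scores else None


# Hypothetical rules, for illustration only.
trie = RuleTrie()
trie.insert(["stock", "market"], "finance", 0.9)
trie.insert(["stock", "price"], "finance", 0.8)
trie.insert(["market", "garden"], "home", 0.7)
trie.insert(["goal"], "sports", 0.95)

doc = "the stock market price fell".split()
predicted = classify(trie, doc)
```

In this sketch, matching visits only trie paths that the document's terms can actually extend, so rules with no terms in the document are never touched. Other designs are possible; an inverted index from terms to rule identifiers is one alternative.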
