A tree-projection-based algorithm for multi-label recurrent-item associative-classification rule generation

Associative classification is a promising classification method based on association-rule mining. A significant amount of work has already been dedicated to building classifiers from association rules. However, relatively little research has addressed association-rule mining from multi-label data, in which each example can belong to, and thus should be classified into, more than one class. This paper addresses the most computationally demanding part of associative classification: the efficient generation of association rules. This task can be accomplished using different frequent-pattern mining methods. We propose a new method based on a state-of-the-art tree-projection-based frequent-pattern mining algorithm. This algorithm is modified to improve its efficiency and extended to accommodate multi-label recurrent-item associative-classification rule generation. The proposed algorithm is tested and compared with an Apriori-based associative-classification rule generator on two large datasets.
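
To make the terms "multi-label" and "recurrent-item" concrete, the following is a minimal brute-force sketch of how the support and confidence of a single classification rule could be counted on such data. It is not the tree-projection algorithm proposed in the paper. The toy dataset, the names `DATA`, `contains`, and `rule_stats`, and the use of the standard support and confidence definitions are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: each example is (bag of items, set of class labels).
# Item counts matter (recurrent items), and an example may carry several labels.
DATA = [
    (Counter({"a": 2, "b": 1}), {"x", "y"}),
    (Counter({"a": 1, "b": 1}), {"x"}),
    (Counter({"a": 2, "c": 1}), {"y"}),
    (Counter({"a": 3, "b": 2}), {"x", "y"}),
]

def contains(bag, pattern):
    """True if bag holds every pattern item at least as many times as required."""
    return all(bag[item] >= n for item, n in pattern.items())

def rule_stats(data, pattern, label):
    """Return (support, confidence) of the rule pattern -> label.

    Assumed definitions: support is the fraction of all examples that match
    the pattern and carry the label; confidence is the fraction of
    pattern-matching examples that carry the label.
    """
    covered = [labels for bag, labels in data if contains(bag, pattern)]
    hits = sum(1 for labels in covered if label in labels)
    support = hits / len(data)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence
```

For instance, `rule_stats(DATA, Counter({"a": 2}), "y")` evaluates the rule "item a occurring at least twice implies class y". Because labels are stored as a set, the same example can contribute to rules for several classes. A practical rule generator would avoid this exhaustive scan per rule, which is the cost the frequent-pattern mining methods mentioned above aim to reduce.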
