RP-growth: Top-k Mining of Relevant Patterns with Minimum Support Raising

A practical inconvenience of frequent pattern mining is that it often yields a flood of common or uninformative patterns, so the minimum support must be tuned carefully. To alleviate this inconvenience, this paper proposes RP-growth, an efficient algorithm based on FP-growth for top-k mining of discriminative patterns that are highly relevant to the class of interest. RP-growth conducts a branch-and-bound search using anti-monotonic upper bounds of relevance scores such as the F-score and χ², and the pruning in this search is translated into minimum support raising, a standard, easy-to-implement pruning strategy for top-k mining. Furthermore, by introducing a notion called weakness and an additional, aggressive pruning strategy based on it, RP-growth efficiently finds k patterns of wide variety and high relevance to the class of interest. Experimental results on text classification demonstrate the efficiency and usefulness of RP-growth.
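To make the link between branch-and-bound pruning and minimum support raising concrete, the following is a minimal sketch, not the RP-growth implementation itself. It assumes the F-score as the relevance score and replaces the FP-tree with a plain depth-first enumeration of itemsets; the function name `top_k_relevant` and its arguments are hypothetical, and the weakness-based pruning is omitted. With TP and FP the numbers of covered transactions inside and outside the class of interest, and P the class size, F = 2TP / (TP + FP + P). Any superset covers at most TP positives, so its F-score is at most 2TP / (TP + P), a bound that is anti-monotonic in the pattern. If s is the current k-th best score, requiring the bound to reach s is equivalent to TP >= sP / (2 - s), which is a raised minimum support on the class of interest.

```python
import heapq
import math


def top_k_relevant(pos, neg, k):
    """Return the k itemsets with the highest F-score for the class of interest.

    pos, neg: lists of sets (transactions inside / outside the class).
    Illustrative sketch only: plain depth-first search, no FP-tree.
    """
    items = sorted({i for t in pos + neg for i in t})
    n_pos = len(pos)
    heap = []   # min-heap of (score, pattern); heap[0] is the k-th best
    min_tp = 1  # minimum positive support (absolute count), raised over time

    def search(pattern, start, pos_cov, neg_cov):
        nonlocal min_tp
        for idx in range(start, len(items)):
            item = items[idx]
            p = [t for t in pos_cov if item in t]
            tp = len(p)
            if tp < min_tp:
                continue  # bound 2*tp/(tp+n_pos) is below the k-th score
            n = [t for t in neg_cov if item in t]
            score = 2 * tp / (tp + len(n) + n_pos)
            new = pattern + (item,)
            if len(heap) < k:
                heapq.heappush(heap, (score, new))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, new))
            if len(heap) == k:
                s = heap[0][0]
                # 2*tp/(tp+n_pos) >= s  <=>  tp >= s*n_pos/(2-s)
                raised = math.ceil(s * n_pos / (2 - s) - 1e-9)
                min_tp = max(min_tp, raised)
            search(new, idx + 1, p, n)

    search((), 0, pos, neg)
    return sorted(heap, key=lambda e: (-e[0], e[1]))


if __name__ == "__main__":
    pos = [{"a", "b"}, {"a", "b", "c"}, {"a"}]
    neg = [{"b", "c"}, {"c"}, {"a", "c"}]
    for score, pattern in top_k_relevant(pos, neg, k=2):
        print(pattern, round(score, 3))
```

In this toy run the threshold is raised to a positive support of 2 as soon as two patterns have been scored, so every extension covering a single positive transaction is skipped without evaluating its score. Patterns whose bound merely ties the k-th score are still explored, a conservative choice that keeps the sketch simple.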
