Paper information - RP-growth: Top-k Mining of Relevant Patterns with Minimum Support Raising

One practical inconvenience of frequent pattern mining is that it often yields a flood of common or uninformative patterns, so the minimum support must be adjusted carefully. To alleviate this inconvenience, this paper proposes RP-growth, an efficient algorithm based on FP-growth for top-k mining of discriminative patterns that are highly relevant to the class of interest. RP-growth conducts a branch-and-bound search using anti-monotonic upper bounds of relevance scores such as F-score and χ², and the pruning in this branch-and-bound search is translated into minimum support raising, a standard, easy-to-implement pruning strategy for top-k mining. Furthermore, by introducing a notion called weakness and an additional, aggressive pruning strategy based on it, RP-growth efficiently finds k patterns of wide variety and high relevance to the class of interest. Experimental results on text classification demonstrate the efficiency and usefulness of RP-growth.
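The link between a score upper bound and minimum support raising can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses a plain depth-first enumeration over item sets instead of an FP-tree, covers only the F-score, and omits the weakness-based pruning. All function names (`f_score`, `f_upper_bound`, `raised_min_support`, `mine_top_k`) and the toy data are hypothetical. The sketch assumes the F-score of a pattern is 2·TP / (n_pos + TP + FP), so that the best score any specialization can reach is 2·TP / (n_pos + TP); requiring this bound to reach the current k-th best score then gives a lower limit on the positive support.

```python
import heapq


def f_score(tp, fp, n_pos):
    """F-score of a pattern covering tp positive and fp negative transactions."""
    if tp == 0:
        return 0.0
    return 2.0 * tp / (n_pos + tp + fp)


def f_upper_bound(tp, n_pos):
    """Best F-score a specialization could reach: keep every positive, drop every negative."""
    return f_score(tp, 0, n_pos)


def raised_min_support(theta, n_pos):
    """Smallest positive support whose upper bound still reaches the score theta."""
    for tp in range(1, n_pos + 1):
        if f_upper_bound(tp, n_pos) >= theta:
            return tp
    return n_pos + 1


def mine_top_k(transactions, labels, k):
    """Return the k best (score, pattern) pairs; labels[j] is True for the class of interest."""
    n_pos = sum(labels)
    items = sorted({item for t in transactions for item in t})
    top = []      # min-heap holding the current top-k (score, pattern) pairs
    min_sup = 1   # minimum positive support, raised as the k-th best score grows

    def expand(pattern, start, cover):
        nonlocal min_sup
        for idx in range(start, len(items)):
            new_cover = [j for j in cover if items[idx] in transactions[j]]
            tp = sum(1 for j in new_cover if labels[j])
            if tp < min_sup:
                # Positive support only shrinks under specialization, so the
                # whole branch stays below the k-th best score and is skipped.
                continue
            fp = len(new_cover) - tp
            new_pattern = pattern + (items[idx],)
            score = f_score(tp, fp, n_pos)
            if len(top) < k:
                heapq.heappush(top, (score, new_pattern))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, new_pattern))
            if len(top) == k:
                min_sup = max(min_sup, raised_min_support(top[0][0], n_pos))
            expand(new_pattern, idx + 1, new_cover)

    expand((), 0, list(range(len(transactions))))
    return sorted(top, reverse=True)


# Toy data (hypothetical): the first two transactions belong to the class of interest.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"c"}]
labels = [True, True, False, False, False]
result = mine_top_k(transactions, labels, k=1)
```

In this sketch the threshold is found by a linear scan over candidate supports rather than a closed-form expression, which keeps the comparison consistent with the floating-point scores stored in the heap.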

Taisuke Sato | Yoshitaka Kameya
