Paper information - Conditional discriminative pattern mining: Concepts and algorithms

Conditional discriminative pattern mining: Concepts and algorithms

Discriminative pattern mining discovers significant patterns that occur with disproportionate frequencies in data sets with different class labels. Although many algorithms have been proposed, one redundancy issue remains unresolved: the discriminative power of many patterns derives mainly from their sub-patterns. In this paper, we introduce a novel notion, the conditional discriminative pattern, to address this issue. To mine conditional discriminative patterns, we propose an effective algorithm called CDPM (Conditional Discriminative Patterns Mining) that generates a set of non-redundant discriminative patterns. Experimental results on real data sets demonstrate that CDPM performs very well at removing redundant patterns derived from significant sub-patterns, yielding a concise set of meaningful discriminative patterns.
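The redundancy issue described in the abstract can be illustrated with a small sketch. This is not the CDPM algorithm, and the abstract does not specify the paper's actual significance measure; the sketch assumes, purely for illustration, that discriminative power is the difference in support between two classes, and that a pattern is redundant when that difference vanishes once the data is restricted to transactions containing one of its proper sub-patterns. All function names, the toy data, and the threshold are hypothetical.

```python
from itertools import combinations


def support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def support_difference(pattern, pos, neg):
    """Assumed discriminative measure: support in `pos` minus support in `neg`."""
    return support(pattern, pos) - support(pattern, neg)


def conditional_difference(pattern, sub, pos, neg):
    """Support difference of `pattern` among transactions that contain `sub`."""
    pos_given = [t for t in pos if sub <= t]
    neg_given = [t for t in neg if sub <= t]
    return support_difference(pattern, pos_given, neg_given)


def is_redundant(pattern, pos, neg, threshold):
    """True if some proper, non-empty sub-pattern explains the difference.

    Hypothetical criterion: conditioning on the sub-pattern leaves an
    absolute support difference below `threshold`.
    """
    for k in range(1, len(pattern)):
        for items in combinations(sorted(pattern), k):
            sub = frozenset(items)
            if abs(conditional_difference(pattern, sub, pos, neg)) < threshold:
                return True
    return False


# Toy data: item "a" alone separates the classes; "b" adds nothing given "a".
pos = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("a")]
neg = [frozenset("ab"), frozenset("a"), frozenset("c"), frozenset("c")]
ab = frozenset("ab")

print(support_difference(ab, pos, neg))                      # 0.25
print(conditional_difference(ab, frozenset("a"), pos, neg))  # 0.0
print(is_redundant(ab, pos, neg, threshold=0.1))             # True
```

In the toy data, {a, b} looks discriminative on its own (support 0.5 versus 0.25), but among transactions that already contain {a} it is equally frequent in both classes, so its apparent power comes entirely from the sub-pattern {a}.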
