Conditional discriminative pattern mining: Concepts and algorithms

Discriminative pattern mining discovers significant patterns that occur with disproportionate frequencies in data sets with different class labels. Although many algorithms have been proposed, one redundancy issue remains unresolved: the discriminative power of many patterns derives mainly from their sub-patterns. In this paper, we introduce the notion of a conditional discriminative pattern to address this issue. To mine such patterns, we propose an algorithm called CDPM (Conditional Discriminative Patterns Mining), which generates a set of non-redundant discriminative patterns. Experimental results on real data sets demonstrate that CDPM effectively removes redundant patterns whose discriminative power derives from significant sub-patterns, yielding a concise set of meaningful discriminative patterns.
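
The redundancy issue described above can be illustrated with a minimal Python sketch. It is not the CDPM algorithm: the abstract does not give the formal definition or the significance test, so the sketch assumes a simple support difference between two classes as the discriminative measure and an arbitrary threshold `min_diff`. All function names and the toy data are hypothetical. A pattern is treated as redundant if it stops being discriminative once the data are restricted to transactions that already contain one of its discriminative sub-patterns.

```python
from itertools import combinations


def support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    if not transactions:
        return 0.0
    pattern = frozenset(pattern)
    return sum(pattern <= t for t in transactions) / len(transactions)


def support_diff(pattern, pos, neg):
    """Assumed discriminative measure: support in `pos` minus support in `neg`."""
    return support(pattern, pos) - support(pattern, neg)


def conditional_diff(pattern, sub, pos, neg):
    """Support difference of `pattern` among transactions that contain `sub`."""
    sub = frozenset(sub)
    pos_given_sub = [t for t in pos if sub <= t]
    neg_given_sub = [t for t in neg if sub <= t]
    return support(pattern, pos_given_sub) - support(pattern, neg_given_sub)


def is_conditionally_discriminative(pattern, pos, neg, min_diff=0.2):
    """Hypothetical check: the pattern must be discriminative on its own and
    must remain so when conditioned on each discriminative proper sub-pattern."""
    pattern = frozenset(pattern)
    if support_diff(pattern, pos, neg) < min_diff:
        return False
    for size in range(1, len(pattern)):
        for sub in combinations(pattern, size):
            sub_is_discriminative = support_diff(sub, pos, neg) >= min_diff
            if sub_is_discriminative and conditional_diff(pattern, sub, pos, neg) < min_diff:
                return False  # power is explained by the sub-pattern
    return True


# Toy data (made up): item "a" is enriched in the positive class,
# while item "b" is equally frequent in both classes.
fs = frozenset
pos = [fs("ab")] * 4 + [fs("a")] * 4 + [fs("b")] * 1 + [fs("c")] * 1
neg = [fs("ab")] * 1 + [fs("a")] * 1 + [fs("b")] * 4 + [fs("c")] * 4

a_kept = is_conditionally_discriminative(fs("a"), pos, neg)
ab_kept = is_conditionally_discriminative(fs("ab"), pos, neg)
print("keep {a}:", a_kept)
print("keep {a, b}:", ab_kept)
```

In this toy data, {a, b} looks discriminative by itself because it inherits the class imbalance of {a}. Among the transactions that contain "a", however, "b" appears in half of them in both classes, so the conditional check discards {a, b} and keeps only {a}. A real implementation would replace the support difference with a statistical test and avoid enumerating all sub-patterns.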
