Summarizing Contrasts by Recursive Pattern Mining

Many constrained patterns (e.g., emerging patterns, subgroup discovery, classification rules) emphasize the contrasts between data classes and are at the core of many classification techniques. However, the very large number of generated patterns hampers end-user interpretation and a deep understanding of the knowledge that the whole collection reveals. The key idea of this paper is to summarize the contrasts of a dataset in order to provide understandable characterizations of data classes. We first introduce a novel framework, called recursive pattern mining, that discovers only a few relevant patterns. We demonstrate that this approach encompasses the usual pattern mining framework, and we study its key properties. We then use recursive pattern mining to extract k recursive emerging patterns. Taken together, these patterns form a REP k-summary, which summarizes the contrasts of the dataset. Finally, we validate our approach on benchmarks and on real-world applications in the biological domain, showing its efficiency and usefulness.
