Condensed Representation of EPs and Patterns Quantified by Frequency-Based Measures

Emerging patterns (EPs) are associations of features whose frequencies increase significantly from one class to another. They have proven useful for building powerful classifiers and for helping to establish diagnoses. Because of the huge search space, mining and representing EPs is a hard task on large datasets. Using recent results on condensed representations of frequent closed patterns, we propose an exact condensed representation of EPs, i.e., one from which all EPs and their growth rates can be recovered. From this condensed representation, we give a method to provide the most interesting EPs, namely those with the highest growth rates; we call them strong emerging patterns (SEPs). We also highlight a property characterizing jumping emerging patterns. Experiments quantify the benefits of SEPs (fewer patterns, ability to extract longer and less frequent patterns) and show their usefulness: in collaboration with the Philips company, SEPs made it possible to identify the failures of a production chain of silicon plates. These concepts of condensed representation and “strong patterns” with respect to a measure are generalized to other interestingness measures based on frequencies.
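
To make the notion of growth rate concrete, the following is a minimal sketch, assuming the usual definition from the emerging-pattern literature: the growth rate of a pattern is the ratio of its frequency in a target class to its frequency in the other class, taken as infinite when the pattern occurs only in the target class (the jumping case) and as 0 when it occurs in neither. The function names, the threshold parameter `rho`, and the toy data are illustrative assumptions, not taken from the paper, and the sketch does not implement the condensed representation itself.

```python
from math import inf


def frequency(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    if not transactions:
        return 0.0
    items = frozenset(pattern)
    return sum(items <= t for t in transactions) / len(transactions)


def growth_rate(pattern, target, background):
    """Ratio of the pattern's frequency in `target` to that in `background`.

    Assumed convention: 0 if the pattern is absent from both classes,
    infinity if it occurs in `target` but never in `background`.
    """
    f_target = frequency(pattern, target)
    f_background = frequency(pattern, background)
    if f_background == 0:
        return inf if f_target > 0 else 0.0
    return f_target / f_background


def is_emerging(pattern, target, background, rho):
    """A pattern is treated as emerging if its growth rate reaches `rho`
    (`rho` is a hypothetical user-chosen threshold greater than 1)."""
    return growth_rate(pattern, target, background) >= rho


# Toy data (invented for illustration): each transaction is a set of items.
target = [frozenset("abc"), frozenset("ab"), frozenset("acd"), frozenset("bc")]
background = [frozenset("ad"), frozenset("bd"), frozenset("abd"), frozenset("cd")]

for pattern in ("ab", "bc", "d"):
    print(pattern, growth_rate(pattern, target, background))
```

On this toy data, `ab` is twice as frequent in the target class as in the background class, `bc` never occurs in the background class and so behaves like a jumping emerging pattern, and `d` is more frequent in the background class and is therefore not emerging for any threshold above 1.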
