Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining

This paper surveys contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use different terminology and task definitions, claim to have different goals, claim to use different rule learning heuristics, and use different means for selecting subsets of induced patterns. This paper contributes a novel understanding of these subareas of data mining by presenting a unified terminology, by explaining the apparent differences between the learning tasks as variants of a single supervised descriptive rule discovery task, and by exploring the apparent differences between the approaches. It also shows that the various rule learning heuristics used in CSM, EPM and SD algorithms all aim at optimizing a trade-off between rule coverage and precision. The commonalities (and differences) between the approaches are showcased on a selection of the best-known variants of CSM, EPM and SD algorithms. The paper also provides a critical survey of existing supervised descriptive rule discovery visualization methods.
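
As a rough illustration of the coverage/precision trade-off mentioned above, the sketch below computes three measures commonly associated with these areas: support difference (CSM), growth rate (EPM), and weighted relative accuracy (SD). The function name `rule_measures`, its parameters, and the example counts are assumptions made for illustration; the formulas are the commonly used textbook definitions and are not necessarily the exact formulations used in the paper.

```python
import math


def rule_measures(tp, fp, pos, neg):
    """Quality measures for a rule 'Cond -> target class' on two-class data.

    tp, fp   : target / non-target examples covered by the rule (hypothetical inputs)
    pos, neg : total target / non-target examples in the data
    """
    n = pos + neg
    covered = tp + fp
    coverage = covered / n                       # p(Cond)
    precision = tp / covered if covered else 0.0  # p(Class | Cond)

    # Support of the pattern within each group.
    supp_pos = tp / pos
    supp_neg = fp / neg

    # CSM-style measure: difference of supports across the two groups.
    support_difference = abs(supp_pos - supp_neg)

    # EPM-style measure: ratio of supports (infinite if absent from one group).
    if fp == 0:
        growth_rate = math.inf if tp > 0 else 0.0
    else:
        growth_rate = (tp * neg) / (fp * pos)

    # SD-style measure: coverage times the gain in precision over the class prior.
    wracc = coverage * (precision - pos / n)

    return {
        "coverage": coverage,
        "precision": precision,
        "support_difference": support_difference,
        "growth_rate": growth_rate,
        "wracc": wracc,
    }


# Made-up counts: the rule covers 30 of 50 target and 10 of 50 non-target examples.
measures = rule_measures(tp=30, fp=10, pos=50, neg=50)
```

All three measures grow when a rule covers more target examples and fewer non-target ones, which is one way to read the claim that the heuristics balance coverage against precision.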
