Relevancy in Constraint-Based Subgroup Discovery

This chapter investigates subgroup discovery as a task of constraint-based mining of local patterns, aimed at describing groups of individuals whose distribution with respect to a property of interest is unusual. The chapter provides a novel interpretation of relevancy constraints and their use for feature filtering, introduces relevancy-based mechanisms for handling unknown values in the examples, and discusses relevancy as an approach to avoiding overfitting in subgroup discovery. The proposed approach to constraint-based subgroup mining, using the SD algorithm, was successfully applied to gene expression data analysis in functional genomics.
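To make the idea of relevancy-based feature filtering concrete, the following is a minimal sketch, not the chapter's implementation. It assumes a coverage-based notion of relative relevancy that the abstract itself does not spell out: a feature is treated as irrelevant if some other feature covers at least the same positive examples while covering no additional negative examples. The function name `filter_irrelevant`, the dictionary layout, and the toy data are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def filter_irrelevant(
    tp: Dict[str, FrozenSet[int]],
    fp: Dict[str, FrozenSet[int]],
) -> List[str]:
    """Return the features that are not dominated by any other feature.

    tp[f] holds the ids of positive examples covered by feature f,
    fp[f] the ids of negative examples covered by f.
    Assumed dominance rule: g dominates f when tp[f] <= tp[g]
    and fp[g] <= fp[f] (subset comparisons on sets).
    """
    names = sorted(tp)
    kept = []
    for f in names:
        dominated = False
        for g in names:
            if g == f:
                continue
            covers = tp[f] <= tp[g] and fp[g] <= fp[f]
            identical = tp[f] == tp[g] and fp[f] == fp[g]
            # Features with identical coverage dominate each other;
            # keep only the alphabetically first one of such a group.
            if covers and (not identical or g < f):
                dominated = True
                break
        if not dominated:
            kept.append(f)
    return kept


# Toy data: examples 1-3 are positive, 4-5 are negative.
tp = {
    "a": frozenset({1, 2}),
    "b": frozenset({1, 2, 3}),
    "c": frozenset({3}),
    "d": frozenset({1, 2, 3}),
}
fp = {
    "a": frozenset({4}),
    "b": frozenset({4}),
    "c": frozenset(),
    "d": frozenset({4}),
}

relevant = filter_irrelevant(tp, fp)
print(relevant)
```

In this toy data, feature `a` is dropped because `b` covers more positives with the same negatives, and `d` is dropped as a duplicate of `b`. Feature `c` survives because it covers no negatives at all, so no other feature dominates it.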
