Inverted Heuristics in Subgroup Discovery

In rule learning, rules are typically induced in two phases: rule refinement and rule selection. It was recently argued that using a separate heuristic for each phase, in particular the so-called inverted heuristic in the refinement phase, produces longer rules with comparable classification accuracy. In this paper, we test the utility of inverted heuristics in the context of subgroup discovery. For this purpose, we developed DoubleBeam, a subgroup discovery algorithm that allows different heuristics to be combined for rule refinement and rule selection. The algorithm was evaluated experimentally on 20 UCI datasets using 10-fold double-loop cross-validation. The results suggest that a variant of the DoubleBeam algorithm with a specific combination of refinement and selection heuristics generates longer rules without compromising rule quality. However, the DoubleBeam algorithm using inverted heuristics does not outperform the standard CN2-SD and SD algorithms.
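The two-phase search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' DoubleBeam implementation. Rules are conjunctions of attribute=value tests for a binary target. The refinement beam is ranked by an inverted precision that we assume scores a refinement by the share of negatives among the examples it removes from its parent rule, (N' - n) / ((P' + N') - (p + n)), where P' and N' are the positives and negatives covered by the parent and p and n those covered by the refinement. The selection beam is ranked by weighted relative accuracy (WRAcc), the heuristic used in CN2-SD. The function names and this particular pairing of heuristics are hypothetical choices for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Example = Dict[str, object]
Condition = Tuple[str, object]  # one attribute=value test
Rule = FrozenSet[Condition]     # a conjunction of tests; the empty rule covers everything
RefineH = Callable[[int, int, int, int, int, int], float]
SelectH = Callable[[int, int, int, int], float]


def counts(rule: Rule, data: List[Example], target: str) -> Tuple[int, int]:
    """Return (p, n): positives and negatives covered by `rule`."""
    p = n = 0
    for ex in data:
        if all(ex[a] == v for a, v in rule):
            if ex[target]:
                p += 1
            else:
                n += 1
    return p, n


def inverted_precision(p: int, n: int, pp: int, pn: int, P: int, N: int) -> float:
    """Assumed inverted precision: share of negatives among the examples the
    refinement removes from its parent (pp, pn = parent counts; P, N unused,
    kept only so all refinement heuristics share one signature)."""
    removed = (pp + pn) - (p + n)
    return (pn - n) / removed if removed > 0 else 0.0


def wracc(p: int, n: int, P: int, N: int) -> float:
    """Weighted relative accuracy: coverage * (rule precision - default precision)."""
    cov = p + n
    if cov == 0:
        return 0.0
    return cov / (P + N) * (p / cov - P / (P + N))


def double_beam(data: List[Example], target: str, refine_h: RefineH,
                select_h: SelectH, beam_width: int = 3,
                max_len: int = 3) -> List[Tuple[float, Rule]]:
    """Hypothetical two-beam search: refine_h decides which rules are extended,
    select_h decides which rules are reported. Returns [(score, rule), ...]."""
    P, N = counts(frozenset(), data, target)
    attrs = sorted(a for a in data[0] if a != target)
    conditions = sorted({(a, ex[a]) for ex in data for a in attrs}, key=repr)
    refinement_beam = [(frozenset(), P, N)]  # entries are (rule, p, n)
    selection_beam: List[Tuple[float, Rule]] = []
    seen = set()
    for _ in range(max_len):
        candidates = []
        for rule, pp, pn in refinement_beam:
            used = {a for a, _ in rule}
            for cond in conditions:
                new_rule = rule | {cond}
                if cond[0] in used or new_rule in seen:
                    continue
                seen.add(new_rule)
                p, n = counts(new_rule, data, target)
                if p == 0:  # a subgroup should cover some target-class examples
                    continue
                candidates.append((refine_h(p, n, pp, pn, P, N), new_rule, p, n))
                selection_beam.append((select_h(p, n, P, N), new_rule))
        if not candidates:
            break
        # Each beam is ranked by its own heuristic (stable sort, best first).
        candidates.sort(key=lambda c: c[0], reverse=True)
        refinement_beam = [(r, p, n) for _, r, p, n in candidates[:beam_width]]
        selection_beam.sort(key=lambda s: s[0], reverse=True)
        selection_beam = selection_beam[:beam_width]
    return selection_beam


if __name__ == "__main__":
    toy = [
        {"outlook": "sunny", "windy": False, "play": True},
        {"outlook": "sunny", "windy": True, "play": False},
        {"outlook": "rain", "windy": False, "play": True},
        {"outlook": "rain", "windy": True, "play": False},
        {"outlook": "overcast", "windy": False, "play": True},
        {"outlook": "overcast", "windy": True, "play": True},
    ]
    for score, rule in double_beam(toy, "play", inverted_precision, wracc):
        print(round(score, 3), sorted(rule, key=repr))
```

Because the two beams are ranked independently, a conjunction with a low WRAcc can still be kept in the refinement beam and extended, which gives longer rules a chance to enter the selection beam. Passing a refinement heuristic that ignores the parent counts and simply returns WRAcc would turn the sketch into something close to an ordinary single-heuristic beam search.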
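The phrase "10-fold double-loop cross-validation" is read here as nested cross-validation: an outer 10-fold loop estimates quality on held-out folds, while an inner loop over each outer training part chooses settings such as the heuristic combination or the beam width. The sketch below assumes that reading; `nested_cv`, the `fit_and_score` callback and the inner fold count are hypothetical, and the paper's actual protocol may differ.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # an example
S = TypeVar("S")  # a candidate setting, e.g. (refinement heuristic, selection heuristic, beam width)


def kfold(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(data: Sequence[T], fold: List[int]) -> Tuple[List[T], List[T]]:
    """Return (train, test), where test holds the examples indexed by `fold`."""
    held_out = set(fold)
    train = [x for i, x in enumerate(data) if i not in held_out]
    return train, [data[i] for i in fold]


def nested_cv(
    data: Sequence[T],
    settings: Sequence[S],
    fit_and_score: Callable[[List[T], List[T], S], float],
    outer_k: int = 10,
    inner_k: int = 5,  # assumption: the inner fold count is not stated in the text
    seed: int = 0,
) -> List[Tuple[S, float]]:
    """Outer loop estimates performance; the inner loop picks a setting using
    only the outer training part, so the outer test fold never affects the choice."""
    rng = random.Random(seed)
    results = []
    for outer_fold in kfold(len(data), outer_k, rng):
        train, test = split(data, outer_fold)
        inner_folds = kfold(len(train), inner_k, rng)

        def inner_score(setting: S) -> float:
            # Mean score of `setting` over the inner folds of the outer training part.
            return mean(fit_and_score(*split(train, f), setting) for f in inner_folds)

        best = max(settings, key=inner_score)
        results.append((best, fit_and_score(train, test, best)))
    return results
```

Each outer fold may select a different setting. Because the choice is made without looking at the outer test fold, the averaged outer scores are not optimistically biased by the tuning step.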
