A New Method to Evaluate Subgroup Discovery Algorithms

A subgroup discovery algorithm is usually considered better than another if the average quality of its mined subgroups is higher with respect to some predefined quality measures. This practice has two drawbacks: it ignores redundancy among the mined patterns, and it can hide important differences between algorithms that return subgroup sets with the same average value. In this paper, we propose a new method to evaluate and compare subgroup discovery algorithms. The method first removes redundancy using a novel procedure based on the examples covered by the patterns and the statistical redundancy between them. New similarity and quality methods are then used to compare the algorithms based on their ability to detect the patterns and on the quality of the mined patterns, respectively. The experimental results reveal interesting differences that would go unnoticed under the traditional approach.
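To illustrate why averaging alone can mislead, the following minimal Python sketch compares two hypothetical subgroup sets with identical average quality. It is not the procedure proposed in the paper: the greedy filter, the Jaccard overlap on covered examples, the 0.7 threshold, and all data are assumptions chosen for illustration, and the statistical redundancy component is not modeled.

```python
from statistics import mean

def jaccard(a, b):
    """Jaccard overlap between two sets of covered example ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def drop_redundant(subgroups, threshold=0.7):
    """Greedy filter (an assumption, not the paper's procedure): visit
    subgroups in descending quality order and keep one only if its cover
    overlaps every already kept subgroup by less than `threshold`."""
    kept = []
    for sg in sorted(subgroups, key=lambda s: s["quality"], reverse=True):
        if all(jaccard(sg["cover"], k["cover"]) < threshold for k in kept):
            kept.append(sg)
    return kept

# Hypothetical output of two algorithms; qualities and covers are made up.
algo_a = [
    {"name": "a1", "quality": 0.10, "cover": frozenset({1, 2, 3, 4})},
    {"name": "a2", "quality": 0.10, "cover": frozenset({1, 2, 3, 4, 5})},
    {"name": "a3", "quality": 0.10, "cover": frozenset({1, 2, 3})},
]
algo_b = [
    {"name": "b1", "quality": 0.10, "cover": frozenset({1, 2, 3, 4})},
    {"name": "b2", "quality": 0.10, "cover": frozenset({6, 7, 8})},
    {"name": "b3", "quality": 0.10, "cover": frozenset({9, 10, 11, 12})},
]

for label, result in (("A", algo_a), ("B", algo_b)):
    kept = drop_redundant(result)
    avg = mean(s["quality"] for s in result)
    print(label, "average quality:", avg, "non-redundant subgroups:", len(kept))
```

Both sets have the same average quality, yet after filtering, the first retains a single subgroup while the second retains three that cover disjoint examples. The traditional averaged comparison cannot distinguish these two cases.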
