Evaluation Measures for Multi-class Subgroup Discovery

Subgroup discovery aims to find subsets of a population whose class distribution differs significantly from the overall distribution. It has predominantly been investigated in a two-class context. This paper investigates multi-class subgroup discovery methods. We consider six evaluation measures for multi-class subgroups, four of them new, and study their theoretical properties. We extend the two-class subgroup discovery algorithm CN2-SD to incorporate the new evaluation measures and a new weighting scheme inspired by AdaBoost. We demonstrate the usefulness of multi-class subgroup discovery experimentally, using the discovered subgroups as features for a decision tree learner. Not only is the number of leaves of the decision tree reduced by a factor of 8 to 16 on average, but significant improvements in accuracy and AUC are achieved with particular evaluation measures and settings. Similar performance improvements are observed with naive Bayes.
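To make the kind of measure studied here concrete, the sketch below implements one natural multi-class generalisation of weighted relative accuracy (WRAcc): the coverage-weighted average of the absolute per-class deviations between the subgroup's class distribution and the overall distribution. The function name and the exact averaging scheme are illustrative assumptions, not necessarily the definitions used in the paper.

```python
import numpy as np

def multiclass_wracc(y, covered):
    """Illustrative multi-class weighted relative accuracy.

    Two-class WRAcc is p(S) * (p(+|S) - p(+)). One natural multi-class
    generalisation averages the absolute per-class deviation between the
    class distribution inside the subgroup and the overall distribution,
    weighted by subgroup coverage. (Assumed form for illustration only.)

    y       : array of class labels for all examples
    covered : boolean mask, True where the subgroup covers the example
    """
    classes = np.unique(y)
    coverage = covered.mean()                    # p(S)
    if coverage == 0.0:
        return 0.0
    score = 0.0
    for c in classes:
        p_c = (y == c).mean()                    # p(c) in the population
        p_c_given_s = (y[covered] == c).mean()   # p(c|S) in the subgroup
        score += abs(p_c_given_s - p_c)
    return coverage * score / len(classes)

# A subgroup concentrated on one class scores higher than a subgroup
# whose class distribution mirrors the population's.
y = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])
pure = np.array([True, True, True] + [False] * 7)
print(multiclass_wracc(y, pure))  # ~0.14
```

The AdaBoost-inspired weighting scheme can be sketched in the same spirit: in CN2-SD's weighted covering, examples covered by a newly learned rule have their weights reduced so that subsequent rules focus on examples not yet well covered. A minimal sketch of an exponential, AdaBoost-style down-weighting follows; the exact update rule and the alpha parameter are assumptions, not the paper's definition.

```python
import numpy as np

def reweight_adaboost_style(weights, covered, correct, alpha=0.5):
    """Down-weight examples correctly covered by the new rule, in the
    style of AdaBoost's exponential update, then renormalise.
    (Assumed update rule for illustration only.)

    weights : float array of current example weights
    covered : boolean mask of examples covered by the new rule
    correct : boolean mask of examples the rule classifies correctly
    """
    w = weights.copy()
    w[covered & correct] *= np.exp(-alpha)  # focus later rules elsewhere
    return w / w.sum()
```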
