First-Order Multi-class Subgroup Discovery

Subgroup discovery is concerned with finding subsets of a population whose class distribution differs significantly from that of the overall population. Previously, subgroup discovery has been investigated predominantly in the propositional logic framework. This paper investigates multi-class subgroup discovery in an inductive logic programming setting, where subgroups are defined by conjunctions in first-order logic. We present a new weighted covering algorithm, inspired by the Aleph first-order rule learner, that uses seed examples to learn diverse, representative and highly predictive subgroups that capture interesting patterns across multiple classes. In our experiments, the approach shows considerable and statistically significant improvements in predictive power, in terms of both accuracy and AUC, and reduces theory construction time by considering fewer hypotheses.
