On the Use of Genetic Programming for Mining Comprehensible Rules in Subgroup Discovery

This paper proposes a novel grammar-guided genetic programming algorithm for subgroup discovery. The algorithm, called comprehensible grammar-based algorithm for subgroup discovery (CGBA-SD), combines the requirement of discovering comprehensible rules with the ability to mine expressive and flexible solutions through the use of a context-free grammar. Each rule is represented as a derivation tree that encodes a solution expressed in the language defined by the grammar. The algorithm includes mechanisms that adapt the diversity of the population by self-adapting the probabilities of recombination and mutation. We compare the approach with existing evolutionary and classic subgroup discovery algorithms. CGBA-SD appears to be a very promising algorithm: it discovers comprehensible subgroups and behaves better than the other algorithms as measured by complexity, interest, and precision. The results were validated by means of a series of nonparametric tests.
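To illustrate what representing a rule as a derivation tree can look like, the following is a minimal sketch and not the CGBA-SD implementation. The toy grammar, the attribute names, and the helper functions `derive`, `leaves`, and `to_rule` are all assumptions made for this example; the paper's actual grammar and genetic operators are not given in the abstract.

```python
import random

# Hypothetical toy context-free grammar (an assumption for illustration only).
# Keys are non-terminals; any symbol that is not a key is a terminal.
GRAMMAR = {
    "Rule": [["IF", "Conditions", "THEN", "Target"]],
    "Conditions": [["Condition"], ["Condition", "AND", "Conditions"]],
    "Condition": [["outlook = sunny"], ["humidity = high"], ["windy = false"]],
    "Target": [["play = no"], ["play = yes"]],
}


def derive(symbol, rng, depth=0, max_depth=4):
    """Expand a symbol into a derivation tree: (non_terminal, children) or a terminal string."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbols are the leaves of the tree
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Force the shortest production so the recursion always terminates.
        productions = [min(productions, key=len)]
    production = rng.choice(productions)
    children = [derive(s, rng, depth + 1, max_depth) for s in production]
    return (symbol, children)


def leaves(tree):
    """Collect the terminal symbols of a derivation tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [leaf for child in children for leaf in leaves(child)]


def to_rule(tree):
    """Render the derivation tree as the rule sentence it encodes."""
    return " ".join(leaves(tree))


if __name__ == "__main__":
    rng = random.Random(0)
    tree = derive("Rule", rng)
    print(to_rule(tree))
```

Because every individual is derived from the grammar, each one is a syntactically valid rule by construction, and a depth bound such as the hypothetical `max_depth` above is one simple way to limit the number of conditions and keep rules short.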
