Automatic Tuning of Rule-Based Evolutionary Machine Learning via Problem Structure Identification

The success of any machine learning technique depends on the correct setting of its parameters, and when it comes to large-scale datasets, hand-tuning these parameters becomes impractical. However, very large datasets can be pre-processed to distil information that helps in appropriately setting various system parameters. In turn, this makes sophisticated machine learning methods easier for end-users to apply. Thus, by modelling the performance of machine learning algorithms as a function of the structure inherent in very large datasets, one could, in principle, detect "hotspots" in the parameter space and thereby auto-tune machine learning algorithms for better dataset-specific performance. In this work, we present a parameter-setting mechanism for a rule-based evolutionary machine learning system that is capable of finding an adequate parameter value for a wide variety of synthetic classification problems with binary attributes, with and without added noise. Moreover, in the final validation stage, our automated mechanism is able to reduce the computational time of preliminary experiments by up to 71% for a challenging real-world bioinformatics dataset.
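The core idea of modelling performance as a function of dataset structure to locate parameter "hotspots" can be illustrated with a minimal sketch. It assumes a single hypothetical numeric structure feature per dataset, a grid of candidate values for one parameter, and made-up accuracy scores. The learned mapping is a plain least-squares line. The names `hotspot`, `fit_tuner` and `clamp`, and all numbers, are illustrative assumptions and do not describe the authors' actual mechanism.

```python
"""Hypothetical sketch: learn a mapping from a dataset-structure feature to a
well-performing parameter value, then use it to suggest a value for a new dataset.
All names and data here are illustrative assumptions, not the paper's method."""


def hotspot(grid, scores):
    """Return the grid value with the best score (the parameter 'hotspot')."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return grid[best_index]


def fit_tuner(xs, ys):
    """Fit ys ~ slope * xs + intercept by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda feature: slope * feature + intercept


def clamp(value, low, high):
    """Keep a predicted value inside the explored parameter range."""
    return max(low, min(high, value))


# Made-up training data: candidate parameter values and, for each training
# problem, its structure feature and the accuracy obtained with each value.
grid = [0.1, 0.2, 0.3, 0.4, 0.5]
training_problems = [
    (2.0, [0.80, 0.91, 0.85, 0.70, 0.60]),
    (4.0, [0.60, 0.75, 0.88, 0.82, 0.70]),
    (6.0, [0.50, 0.62, 0.70, 0.86, 0.80]),
]

features = [feature for feature, _ in training_problems]
best = [hotspot(grid, scores) for _, scores in training_problems]
tuner = fit_tuner(features, best)

# Suggest a parameter value for an unseen dataset with structure feature 5.0,
# without running a full grid of preliminary experiments on it.
predicted = clamp(tuner(5.0), grid[0], grid[-1])
print(f"suggested parameter for feature 5.0: {predicted:.2f}")  # prints 0.35
```

The saving in such a scheme comes from replacing a per-dataset parameter sweep with a cheap structure measurement plus a prediction. A real system would need richer structure descriptors and a model validated across many problems.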
