A Novel Evolutionary Algorithm for Automated Machine Learning Focusing on Classifier Ensembles

Automated Machine Learning (Auto-ML) is an emerging area of ML that automatically selects the best ML algorithm and its best hyper-parameter settings for a given input dataset, by searching a large space of candidate algorithms and settings. In this work we propose a new Evolutionary Algorithm (EA) for the Auto-ML task of automatically selecting the best ensemble of classifiers and their hyper-parameter settings for an input dataset. The proposed EA was compared against a version of the well-known Auto-WEKA method adapted to search the same space of algorithms and hyper-parameter settings as the EA. In experiments with 15 classification datasets, the EA generally obtained significantly lower classification error rates than the adapted Auto-WEKA version.
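To make the task concrete, the following is a minimal sketch of an evolutionary search over classifier ensembles, in the spirit of the approach the abstract describes. It is an illustrative toy, not the paper's method: the base classifiers (a pure-Python k-NN and a decision stump), their hyper-parameter grids, and the GA operators (uniform crossover, gene mutation, elitism) are all assumptions chosen for brevity. Each individual encodes an ensemble as a list of (algorithm, hyper-parameter) genes, and fitness is the majority-vote accuracy on a held-out set.

```python
# Toy evolutionary search over classifier ensembles (illustrative sketch only;
# the classifiers, search space, and GA operators are assumptions, not the paper's).
import random

random.seed(0)

# --- Synthetic 2-D dataset: class 1 if x + y > 1, with 10% label noise ---
def make_data(n):
    data = []
    for _ in range(n):
        x, y = random.random(), random.random()
        label = 1 if x + y > 1.0 else 0
        if random.random() < 0.1:
            label = 1 - label
        data.append(((x, y), label))
    return data

train, test = make_data(120), make_data(60)

# --- Two simple base classifiers, each with one hyper-parameter ---
def knn_predict(k, point):
    """k-nearest-neighbour majority vote over the training set."""
    neigh = sorted(train, key=lambda d: (d[0][0] - point[0]) ** 2
                                      + (d[0][1] - point[1]) ** 2)[:k]
    return 1 if sum(lbl for _, lbl in neigh) * 2 > k else 0

def stump_predict(threshold, point):
    """Decision stump on the sum of the two features."""
    return 1 if point[0] + point[1] > threshold else 0

# An individual encodes an ensemble: a list of (algorithm, hyper-parameter) genes.
GENES = [("knn", k) for k in (1, 3, 5, 7)] + \
        [("stump", t) for t in (0.6, 0.8, 1.0, 1.2)]

def predict(ensemble, point):
    votes = sum(knn_predict(h, point) if algo == "knn" else stump_predict(h, point)
                for algo, h in ensemble)
    return 1 if votes * 2 > len(ensemble) else 0

def fitness(ensemble):
    """Hold-out accuracy of the majority-vote ensemble."""
    return sum(predict(ensemble, p) == lbl for p, lbl in test) / len(test)

# --- Minimal generational GA: elitism, uniform crossover, gene mutation ---
def evolve(pop_size=20, ensemble_size=3, generations=15):
    pop = [[random.choice(GENES) for _ in range(ensemble_size)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        nxt = scored[:2]                                # elitism: keep the best two
        while len(nxt) < pop_size:
            a, b = random.sample(scored[:10], 2)        # truncation-style selection
            child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if random.random() < 0.3:                   # mutate one gene
                child[random.randrange(ensemble_size)] = random.choice(GENES)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve()
print("best ensemble:", best, "accuracy:", fitness(best))
```

A real system of this kind would replace the toy classifiers with a library of learners (as Auto-WEKA does over WEKA's algorithms), evaluate fitness by cross-validation rather than a single hold-out split, and search far larger hyper-parameter spaces; the GA skeleton, however, is the same.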
