Fitness Landscape Analysis of Automated Machine Learning Search Spaces

Automated Machine Learning (AutoML) aims to automate the creation of complete Machine Learning (ML) pipelines for any dataset without requiring deep user expertise in ML. Several AutoML methods have been proposed, but none clearly stands out. Furthermore, few studies have examined the characteristics of the fitness landscape of AutoML search spaces. Such an analysis may help explain the performance of different optimization methods for AutoML and indicate how to improve them. This paper adapts classic fitness landscape analysis measures to the context of AutoML. This is a challenging task, as AutoML search spaces include discrete, continuous, categorical and conditional hyperparameters. We propose an ML pipeline representation, a neighborhood definition and a distance metric between pipelines, and use them to evaluate the fitness distance correlation (FDC) and the neutrality ratio for a given AutoML search space. The FDC results are counter-intuitive and call for a more in-depth analysis across a range of search spaces. The neutrality results, in turn, show a strong positive correlation between the mean neutrality ratio and the fitness value.
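As a rough illustration of the two measures named above, the sketch below computes FDC as the Pearson correlation between sampled fitness values and their distances to a reference pipeline, and the neutrality ratio as the fraction of a pipeline's neighbors whose fitness equals its own. The function names, the `tolerance` parameter and the sample values are hypothetical; the paper's pipeline representation, neighborhood definition and distance metric are not reproduced here, so distances and neighbor fitnesses are assumed to be precomputed.

```python
from math import sqrt


def fitness_distance_correlation(fitness, distance):
    """Pearson correlation between fitness values and distances to a reference solution."""
    n = len(fitness)
    if n != len(distance) or n < 2:
        raise ValueError("need two equally sized samples with at least two points")
    mean_f = sum(fitness) / n
    mean_d = sum(distance) / n
    # Covariance and standard deviations all use the same 1/n normalization.
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fitness, distance)) / n
    std_f = sqrt(sum((f - mean_f) ** 2 for f in fitness) / n)
    std_d = sqrt(sum((d - mean_d) ** 2 for d in distance) / n)
    if std_f == 0 or std_d == 0:
        return float("nan")  # correlation is undefined for a constant sample
    return cov / (std_f * std_d)


def neutrality_ratio(solution_fitness, neighbor_fitnesses, tolerance=1e-9):
    """Fraction of neighbors whose fitness matches the solution's, within a tolerance."""
    if not neighbor_fitnesses:
        return 0.0  # assumption: a solution without neighbors has no neutral neighbors
    neutral = sum(
        1 for f in neighbor_fitnesses if abs(f - solution_fitness) <= tolerance
    )
    return neutral / len(neighbor_fitnesses)


if __name__ == "__main__":
    # Hypothetical sample: fitness drops linearly as distance to the reference grows.
    sample_fitness = [0.9, 0.8, 0.7, 0.6]
    sample_distance = [0, 1, 2, 3]
    print(fitness_distance_correlation(sample_fitness, sample_distance))
    # Hypothetical neighborhood: two of the four neighbors share the fitness 0.8.
    print(neutrality_ratio(0.8, [0.8, 0.8, 0.7, 0.9]))
```

When fitness is maximized and distances are measured to the best known solution, a strongly negative FDC suggests that fitness tends to rise as candidates approach that solution, while values near zero suggest little relationship between the two.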
