Survey on Automated Machine Learning

Machine learning has become a vital part of many aspects of our daily life. However, building well-performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build machine learning applications automatically, without extensive knowledge of statistics and machine learning. In this survey, we summarize recent developments in academia and industry regarding AutoML. First, we introduce a holistic problem formulation. Next, we present approaches for solving the various subproblems of AutoML. Finally, we provide an extensive empirical evaluation of the presented approaches on synthetic and real data.
