Taking Human out of Learning Applications: A Survey on Automated Machine Learning

Machine learning techniques have become deeply rooted in our everyday life. However, because pursuing good learning performance is knowledge- and labor-intensive, human experts are heavily involved in every aspect of machine learning. To make machine learning techniques easier to apply and to reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic of both industrial and academic interest. In this paper, we provide an up-to-date survey on AutoML. First, we introduce and define the AutoML problem, drawing inspiration from the fields of both automation and machine learning. Then, we propose a general AutoML framework that not only covers most existing approaches to date but can also guide the design of new methods. Subsequently, we categorize and review existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underlying their successful applications. We hope this survey can serve not only as an insightful guideline for AutoML beginners but also as an inspiration for future research.
