Random Search for Hyper-Parameter Optimization

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
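
As a concrete illustration of the baseline described above, the following minimal sketch draws independent trials from a prior over two hyper-parameters and keeps the configuration with the best validation score. It is not the paper's experimental code: the validation_error function and the particular priors are hypothetical stand-ins for training a network and measuring its validation error; only the search loop itself reflects the random-search idea.

# Minimal sketch of random search over a hyper-parameter space.
# The objective below is a synthetic stand-in for "train a model with these
# hyper-parameters and return its validation error"; it is NOT the paper's setup.
import math
import random

def validation_error(learning_rate, n_hidden):
    # Hypothetical surrogate: error depends strongly on the learning rate and
    # only weakly on the number of hidden units, mimicking the finding that
    # only a few hyper-parameters matter much for a given data set.
    return (math.log10(learning_rate) + 2.0) ** 2 + 0.01 * abs(n_hidden - 256) / 256.0

def sample_configuration(rng):
    # Draw each hyper-parameter independently from its prior:
    # learning rate log-uniform on [1e-4, 1e0], hidden units uniform on [16, 1024].
    return {
        "learning_rate": 10 ** rng.uniform(-4, 0),
        "n_hidden": rng.randint(16, 1024),
    }

def random_search(budget, seed=0):
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    for _ in range(budget):  # each trial is independent of the others
        config = sample_configuration(rng)
        error = validation_error(**config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error

if __name__ == "__main__":
    config, error = random_search(budget=64)
    print("best configuration:", config)
    print("validation error: %.4f" % error)

The efficiency argument in the abstract can be read directly off this sketch: with the same budget of 64 trials, an 8-by-8 grid would try only 8 distinct values of the learning rate, whereas random search tries 64 distinct values along every axis, so the axis that actually matters is explored far more densely.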
