Google Vizier: A Service for Black-Box Optimization

Any sufficiently complex system acts as a black box when it becomes easier to experiment with than to understand. Hence, black-box optimization has become increasingly important as systems have become more complex. In this paper we describe Google Vizier, a Google-internal service for performing black-box optimization that has become the de facto parameter tuning engine at Google. Google Vizier is used to optimize many of our machine learning models and other systems, and also provides core capabilities to Google’s Cloud Machine Learning HyperTune subsystem. We discuss our requirements, infrastructure design, underlying algorithms, and advanced features such as transfer learning and automated early stopping that the service provides.
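To make "black-box optimization as a service" concrete, here is a minimal sketch of the suggest/evaluate/report loop such a service mediates. The Study and Trial terminology mirrors the paper, but the interface and the random-search stand-in below are hypothetical illustrations, not Vizier's actual API or algorithm.

```python
# Minimal sketch of a black-box optimization loop (hypothetical interface,
# not the Vizier API). A client repeatedly asks the optimizer for a trial,
# evaluates the black box, and reports the result back.
import random


def objective(params):
    # The "black box": in practice an expensive evaluation, e.g. training a
    # model with these hyperparameters and returning a validation metric.
    x, y = params["x"], params["y"]
    return -(x - 0.3) ** 2 - (y - 0.7) ** 2


class RandomSearchStudy:
    """Stand-in optimizer: suggests uniformly random points, tracks the best."""

    def __init__(self, bounds):
        self.bounds = bounds  # dict: parameter name -> (low, high)
        self.trials = []      # list of (params, objective value)

    def suggest(self):
        return {name: random.uniform(lo, hi)
                for name, (lo, hi) in self.bounds.items()}

    def report(self, params, value):
        self.trials.append((params, value))

    def best(self):
        return max(self.trials, key=lambda t: t[1])


if __name__ == "__main__":
    study = RandomSearchStudy({"x": (0.0, 1.0), "y": (0.0, 1.0)})
    for _ in range(50):
        params = study.suggest()    # service proposes a trial
        value = objective(params)   # client evaluates the black box
        study.report(params, value) # result is fed back to the service
    print("best trial:", study.best())
```

In the service described by the paper, the suggestion step would be backed by more sophisticated strategies (e.g. Bayesian optimization, transfer learning across studies, automated early stopping) rather than random search; the loop structure is what the example is meant to convey.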
