Bayesian Optimization in a Billion Dimensions via Random Embeddings

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified scaling it to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple, has important invariance properties, and applies to domains with both categorical and continuous variables. We present a thorough theoretical analysis of REMBO. Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.
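
The random embedding idea can be illustrated with a minimal sketch. It assumes a Gaussian random matrix A of size D x d, a box-constrained search space [-1, 1]^D, and clipping to project embedded points back into that box. To stay self-contained it substitutes plain random search for the Bayesian optimization step, so it is not the REMBO algorithm itself. The toy objective, the `active` coordinates, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_embedded_objective(f, A, lo=-1.0, hi=1.0):
    """Wrap f so it is evaluated at the clipped embedding of a low-dimensional point y."""
    def g(y):
        # Map y in R^d to R^D through A, then clip into the box [lo, hi]^D.
        x = np.clip(A @ y, lo, hi)
        return f(x)
    return g


def random_search(g, d, bound, n_iter, rng):
    """Stand-in for the Bayesian optimization loop (assumption: uniform random search)."""
    best_y, best_val = None, np.inf
    for _ in range(n_iter):
        y = rng.uniform(-bound, bound, size=d)
        val = g(y)
        if val < best_val:
            best_y, best_val = y, val
    return best_y, best_val


# Hypothetical toy problem: D nominal dimensions, only two of which matter.
D, d = 10_000, 2
rng = np.random.default_rng(0)
active = np.array([3, 7])        # hypothetical "important" coordinates
target = np.array([0.3, -0.5])   # hypothetical optimum on those coordinates


def f(x):
    # Depends on x only through the active coordinates (low intrinsic dimension).
    return float(np.sum((x[active] - target) ** 2))


A = rng.standard_normal((D, d))  # random embedding matrix with i.i.d. N(0, 1) entries
g = make_embedded_objective(f, A)
best_y, best_val = random_search(g, d, bound=np.sqrt(d), n_iter=2000, rng=rng)
print("best value found in the embedded space:", best_val)
```

The search touches only the d-dimensional variable y, while f is always called on a full D-dimensional point. In this sketch the search box half-width `np.sqrt(d)` is an assumed choice, not a value taken from the abstract.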
