A Framework for Bayesian Optimization in Embedded Subspaces

We present a theoretically founded approach to high-dimensional Bayesian optimization (BO) based on low-dimensional subspace embeddings. We prove that the error in the Gaussian process (GP) model is tightly bounded when passing from the original high-dimensional search domain to the low-dimensional embedding. This implies that the optimization process in the low-dimensional embedding proceeds essentially as if it were run directly on an unknown active subspace of low dimensionality. The argument applies to a large class of algorithms and GP models, including non-stationary kernels. Moreover, we provide an efficient implementation based on hashing and demonstrate empirically that this subspace embedding achieves considerably better results than previously proposed methods for high-dimensional BO based on Gaussian matrix projections and structure learning.
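To make the hashing-based embedding concrete, here is a minimal sketch of a count-sketch-style lift from the low-dimensional optimization domain to the high-dimensional search space: each high-dimensional coordinate is assigned one low-dimensional coordinate via a random hash and a random sign. The function names and interface below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def hashing_embedding(D, d, seed=None):
    """Count-sketch-style embedding from R^d (low) into R^D (high).

    Each high-dimensional index i gets a random target h(i) in {0, ..., d-1}
    and a random sign s(i); lifting a low-dimensional point y then sets
    x[i] = s(i) * y[h(i)].  Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    h = rng.integers(0, d, size=D)        # hash: high index -> low index
    s = rng.choice([-1.0, 1.0], size=D)   # independent random signs

    def lift(y):
        y = np.asarray(y, dtype=float)
        return s * y[h]                   # point in R^D to evaluate f on

    return lift

# BO would run its acquisition loop over y in R^d, while the expensive
# objective f is evaluated at lift(y) in R^D.
lift = hashing_embedding(D=10, d=2, seed=0)
x = lift([0.5, -1.0])
```

Because the lift is a sparse sign matrix applied implicitly, it costs O(D) time and memory per evaluation, in contrast to the dense D-by-d Gaussian projections used in earlier random-embedding BO methods.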
