Bayesian Functional Optimization

Bayesian optimization (BayesOpt) is a derivative-free approach for sequentially optimizing stochastic black-box functions. Standard BayesOpt, which has shown many successes in machine learning applications, assumes a finite-dimensional domain that is often a parametric space. This parameter space is defined by the features used in the function approximation, which are often selected manually, so the performance of BayesOpt inevitably depends on the quality of the chosen features. This paper proposes a new Bayesian optimization framework that optimizes directly over a space of functions. The resulting framework, Bayesian Functional Optimization (BFO), not only extends the application domain of BayesOpt to functional optimization problems but also relaxes the dependence of its performance on the chosen parameter space. We model the domain of functions as a reproducing kernel Hilbert space (RKHS) and use the notion of Gaussian processes on a real separable Hilbert space. As a result, we can define the traditional improvement-based (PI and EI) and optimistic (UCB) acquisition functions as functionals. We propose to optimize these acquisition functionals using analytic functional gradients, which we prove are themselves functions in an RKHS. We evaluate BFO on three typical functional optimization tasks: i) a synthetic functional optimization problem, ii) optimizing activation functions for a multi-layer perceptron neural network, and iii) a reinforcement learning task whose policies are modeled in an RKHS.
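
A minimal, self-contained sketch of the BFO loop under simplifying assumptions (not the paper's implementation): candidate functions are finite kernel expansions f(x) = Σᵢ αᵢ k(x, cᵢ) on fixed centers, so each function is represented by its coefficient vector; the GP over functions uses a squared-exponential covariance on the RKHS distance ‖f − g‖_H; and the UCB acquisition functional is ascended with finite-difference gradients as a crude stand-in for the analytic functional gradients derived in the paper. All names (`centers`, `gp_kernel`, `make_ucb`) are illustrative.

```python
import numpy as np

centers = np.linspace(-1.0, 1.0, 8)              # fixed expansion centers c_i

def k(x, y, ell=0.3):                            # scalar RBF kernel of the RKHS
    return np.exp(-0.5 * ((x - y) / ell) ** 2)

Kc = k(centers[:, None], centers[None, :])       # Gram matrix K[i, j] = k(c_i, c_j)

def gp_kernel(a, b, s=1.0):
    d = a - b                                    # coefficient difference of f_a, f_b
    return np.exp(-0.5 * (d @ Kc @ d) / s**2)    # SE covariance on ||f_a - f_b||_H

def objective(alpha):
    # black-box functional F(f): fit a target curve, with an RKHS-norm penalty
    xs = np.linspace(-1.0, 1.0, 50)
    f = k(xs[:, None], centers[None, :]) @ alpha
    return -np.mean((f - np.sin(3 * xs)) ** 2) - 0.01 * (alpha @ Kc @ alpha)

def make_ucb(A, y, beta=2.0, noise=1e-4):
    # precompute the GP posterior over the functions evaluated so far
    K = np.array([[gp_kernel(p, q) for q in A] for p in A])
    Ki = np.linalg.inv(K + noise * np.eye(len(A)))
    yv = np.asarray(y)

    def ucb(alpha):                              # optimistic acquisition functional
        kx = np.array([gp_kernel(alpha, p) for p in A])
        mu = kx @ Ki @ yv
        var = max(gp_kernel(alpha, alpha) - kx @ Ki @ kx, 1e-12)
        return mu + beta * np.sqrt(var)
    return ucb

rng = np.random.default_rng(0)
A = [rng.normal(size=len(centers)) for _ in range(3)]   # initial random designs
y = [objective(a) for a in A]

for t in range(15):                              # outer BFO loop
    acq = make_ucb(A, y)
    a = A[int(np.argmax(y))].copy()              # warm-start at the incumbent
    for _ in range(40):
        # finite-difference ascent on UCB; the paper instead uses analytic
        # functional gradients, which are themselves functions in the RKHS
        g = np.array([(acq(a + 1e-4 * e) - acq(a - 1e-4 * e)) / 2e-4
                      for e in np.eye(len(centers))])
        a = a + 0.05 * g
    A.append(a)
    y.append(objective(a))

print("best objective value found:", max(y))
```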
