Joint Optimization and Variable Selection of High-dimensional Gaussian Processes

Maximizing high-dimensional, nonconvex functions through noisy observations is a notoriously hard problem, but one that arises in many applications. In this paper, we tackle this challenge by modeling the unknown function as a sample from a high-dimensional Gaussian process (GP) distribution. Assuming that the unknown function depends on only a few relevant variables, we show that it is possible to perform joint variable selection and GP optimization. We provide strong performance guarantees for our algorithm, bounding the sample complexity of variable selection and establishing cumulative regret bounds. We further provide empirical evidence of the effectiveness of our algorithm on several benchmark optimization problems.
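The paper itself defines the joint selection-and-optimization algorithm and its guarantees; as a rough, hypothetical illustration of the setting only, the sketch below runs a generic GP-UCB loop restricted to an already-identified candidate set of relevant variables, with the irrelevant coordinates held fixed. The RBF kernel, the UCB acquisition rule, and all names here (gp_ucb, relevant_dims, the pinned value 0.5) are illustrative assumptions, not the paper's method.

import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    # Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2)).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * lengthscale**2))

def gp_posterior(X, y, Xstar, noise=1e-2):
    # GP posterior mean and variance at Xstar given noisy data (X, y).
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) = 1
    return mu, np.maximum(var, 1e-12)

def gp_ucb(f, relevant_dims, dim, T=30, beta=2.0, n_cand=500, seed=0):
    # Maximize f over [0, 1]^dim, sampling only the selected dimensions;
    # irrelevant coordinates are pinned to 0.5 (an arbitrary choice).
    rng = np.random.default_rng(seed)
    X, y = [], []
    x0 = np.full(dim, 0.5)
    x0[relevant_dims] = rng.random(len(relevant_dims))
    X.append(x0)
    y.append(f(x0) + 0.01 * rng.standard_normal())
    for _ in range(T - 1):
        cand = np.full((n_cand, dim), 0.5)
        cand[:, relevant_dims] = rng.random((n_cand, len(relevant_dims)))
        mu, var = gp_posterior(np.array(X), np.array(y), cand)
        x_next = cand[np.argmax(mu + np.sqrt(beta * var))]  # UCB rule
        X.append(x_next)
        y.append(f(x_next) + 0.01 * rng.standard_normal())
    best = int(np.argmax(y))
    return X[best], y[best]

# Toy objective in 10 dimensions that depends only on dimensions 0 and 3.
f = lambda x: -((x[0] - 0.3)**2 + (x[3] - 0.7)**2)
x_best, y_best = gp_ucb(f, relevant_dims=[0, 3], dim=10)
print(x_best[[0, 3]], y_best)

In the paper's setting the relevant set is not known in advance; the sequential variable-selection step that identifies it, and the sample-complexity and regret analysis attached to it, are precisely what this sketch omits.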
