Nonmyopic active learning of Gaussian processes: an exploration-exploitation approach

When monitoring spatial phenomena, such as the ecological condition of a river, deciding where to make observations is a challenging task. In these settings, a fundamental question is when an active learning (or sequential design) strategy, in which locations are selected based on previous measurements, will perform significantly better than sensing at an a priori specified set of locations. For Gaussian processes (GPs), which often accurately model spatial phenomena, we present an analysis and efficient algorithms that address this question. Central to our analysis is a theoretical bound that quantifies the performance difference between active and a priori design strategies. We consider GPs with unknown kernel parameters and present a nonmyopic approach for trading off exploration, i.e., decreasing uncertainty about the model parameters, and exploitation, i.e., near-optimally selecting observations when the parameters are (approximately) known. We discuss several exploration strategies, and present logarithmic sample complexity bounds for the exploration phase. We then extend our algorithm to handle nonstationary GPs, exploiting local structure in the model. We also present extensive empirical evaluation on several real-world problems.
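To make the exploitation phase concrete, the following is a minimal sketch of sequential observation selection for a GP with known kernel parameters, where each new location is greedily chosen to maximize the predictive variance. This is an illustrative toy, not the authors' algorithm: the squared-exponential kernel, the lengthscale, the noise level, and the `active_select` helper are all assumptions for the example.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.5, variance=1.0):
    # Squared-exponential (RBF) kernel between two sets of locations.
    d = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d / lengthscale**2)

def posterior_variance(X_obs, X_cand, noise=1e-4):
    # GP predictive variance at candidate locations given observed locations.
    # Note: for fixed kernel parameters the variance does not depend on the
    # measured values, only on where we measured.
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    k = rbf(X_obs, X_cand)
    v = np.linalg.solve(K, k)
    return np.clip(np.diag(rbf(X_cand, X_cand)) - np.sum(k * v, axis=0), 0, None)

def active_select(candidates, n_obs):
    # Greedily pick the candidate with the highest current predictive variance.
    chosen = [0]  # seed with an arbitrary first location
    for _ in range(n_obs - 1):
        var = posterior_variance(candidates[chosen], candidates)
        var[chosen] = -np.inf  # never re-select an observed location
        chosen.append(int(np.argmax(var)))
    return chosen

grid = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
picks = active_select(grid, 5)
```

Because the predictive variance of a GP with known parameters depends only on the observation locations, this greedy rule can be run entirely in advance, which illustrates the abstract's point that the gap between active and a priori design is driven by uncertainty about the kernel parameters rather than by the observed values themselves.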
