Near-Optimal Active Learning of Multi-Output Gaussian Processes

This paper addresses the problem of active learning of a multi-output Gaussian process (MOGP) model representing multiple types of coexisting correlated environmental phenomena. In contrast to existing works, our active learning problem involves selecting not just the most informative sampling locations to be observed but also the types of measurements at each selected location, so as to minimize the predictive uncertainty (i.e., posterior joint entropy) of a target phenomenon of interest given a sampling budget. Unfortunately, optimizing such an entropy criterion scales poorly in the number of candidate sampling locations and the number of selected observations. To resolve this issue, we first exploit a structure common to sparse MOGP models to derive a novel active learning criterion. Then, we exploit a relaxed form of the submodularity property of our new criterion to devise a polynomial-time approximation algorithm that guarantees a constant-factor approximation of the criterion value achieved by the optimal set of selected observations. Empirical evaluation on real-world datasets shows that our proposed approach outperforms existing algorithms for active learning of MOGP and single-output GP models.
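The selection problem described above can be illustrated with a minimal sketch. It is not the criterion or the algorithm proposed in the paper: it assumes an exact (non-sparse) MOGP with an intrinsic coregionalization covariance, `B[type_a, type_b] * k(x_a, x_b)`, and greedily picks (location, measurement type) pairs that most reduce the posterior joint entropy of the target phenomenon. All function names, the kernel, and the parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D location arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def icm_cov(xa, ta, xb, tb, B):
    """Assumed MOGP covariance between (location, type) pairs:
    B[type_a, type_b] * k(x_a, x_b)."""
    return B[np.ix_(ta, tb)] * rbf(xa, xb)

def posterior_entropy(sel, X, T, tgt_x, tgt_t, B, noise=1e-2, jitter=1e-6):
    """Joint Gaussian entropy of the target outputs given the noisy
    observations indexed by sel (indices into the candidate arrays X, T)."""
    cov = icm_cov(tgt_x, tgt_t, tgt_x, tgt_t, B)
    if sel:
        xs, ts = X[sel], T[sel]
        K_ss = icm_cov(xs, ts, xs, ts, B) + noise * np.eye(len(sel))
        K_ts = icm_cov(tgt_x, tgt_t, xs, ts, B)
        cov = cov - K_ts @ np.linalg.solve(K_ss, K_ts.T)
    n = len(tgt_x)
    # Small jitter keeps the log-determinant numerically stable.
    _, logdet = np.linalg.slogdet(cov + jitter * np.eye(n))
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def greedy_select(X, T, tgt_x, tgt_t, B, budget):
    """Greedily add the candidate that yields the lowest posterior entropy."""
    selected = []
    for _ in range(budget):
        rest = [i for i in range(len(X)) if i not in selected]
        best = min(rest, key=lambda i: posterior_entropy(
            selected + [i], X, T, tgt_x, tgt_t, B))
        selected.append(best)
    return selected

# Toy setup: 10 locations, type 0 is the target, type 1 is a correlated
# auxiliary measurement. Every (location, type) pair is a candidate.
grid = np.linspace(0.0, 1.0, 10)
X = np.tile(grid, 2)
T = np.repeat(np.array([0, 1]), len(grid))
B = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed cross-type correlation
tgt_x, tgt_t = grid, np.zeros(len(grid), dtype=int)

chosen = greedy_select(X, T, tgt_x, tgt_t, B, budget=4)
print([(float(X[i]), int(T[i])) for i in chosen])
```

Each greedy step here re-evaluates the entropy for every remaining candidate, so the cost grows quickly with the number of candidates and the budget. That is the kind of scaling issue the abstract refers to and that the paper's sparse-MOGP criterion is meant to avoid.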
