Safe Exploration for Active Learning with Gaussian Processes

In this paper, we consider the problem of safe exploration in the active learning context. Safe exploration is especially important when sampling data from technical and industrial systems, e.g. combustion engines and gas turbines, where critical and unsafe measurements must be avoided. The objective is to learn data-based regression models of such technical systems from a limited budget of measured, i.e. labelled, points while ensuring that critical regions of the considered systems are avoided during measurement. We propose an approach for learning such models and exploring new data regions based on Gaussian processes (GPs). In particular, we employ a problem-specific GP classifier to identify safe and unsafe regions, while using a differential entropy criterion to explore relevant data regions. We provide a theoretical analysis of the proposed algorithm, including an upper bound on the probability of failure. To demonstrate the efficiency and robustness of our safe exploration scheme in the active learning setting, we test the approach on a policy exploration task for the inverted pendulum hold-up problem.
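The selection step described above can be illustrated with a minimal sketch, assuming a squared-exponential kernel with fixed, hand-picked hyperparameters and a standard Laplace-approximation GP classifier with a logistic likelihood as a stand-in for the paper's problem-specific classifier. The names `gpc_laplace_prob`, `gp_latent_variance`, `select_next` and the safety threshold `p_min` are hypothetical. Because the differential entropy of a Gaussian predictive distribution, 0.5 log(2πe σ²), grows monotonically with σ², the sketch picks the candidate with the largest regression-GP variance among those whose estimated probability of being safe is at least `p_min`.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.2, variance=10.0):
    """Squared-exponential kernel matrix between the rows of A and B (hypothetical hyperparameters)."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gpc_laplace_prob(X, y, Xc, n_iter=100, tol=1e-8):
    """P(safe | x) at candidates Xc; labels y are +1 (safe) / -1 (unsafe).

    Stand-in classifier: Laplace approximation with a logistic likelihood
    (Rasmussen & Williams, Alg. 3.1/3.2), predictive probability approximated
    by sigmoid(kappa * mean). Not the paper's problem-specific classifier.
    """
    n = len(X)
    K = rbf_kernel(X, X)
    t = (y + 1) / 2.0                      # targets mapped to {0, 1}
    f = np.zeros(n)
    for _ in range(n_iter):                # Newton iterations for the posterior mode
        p = sigmoid(f)
        grad = t - p                       # d log p(y|f) / df
        sW = np.sqrt(p * (1.0 - p))        # square root of the diagonal W
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = sW**2 * f + grad
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f_new = K @ a
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    # Latent predictive mean and variance at the candidates
    p = sigmoid(f)
    sW = np.sqrt(p * (1.0 - p))
    L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
    Ks = rbf_kernel(X, Xc)
    mean = Ks.T @ (t - p)
    v = np.linalg.solve(L, sW[:, None] * Ks)
    var = np.diag(rbf_kernel(Xc, Xc)) - np.sum(v**2, axis=0)
    kappa = 1.0 / np.sqrt(1.0 + np.pi * np.maximum(var, 0.0) / 8.0)
    return sigmoid(kappa * mean)


def gp_latent_variance(X, Xc, noise=1e-2):
    """Posterior variance of the regression GP's latent f at Xc (independent of the outputs)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    v = np.linalg.solve(np.linalg.cholesky(K), rbf_kernel(X, Xc))
    return np.diag(rbf_kernel(Xc, Xc)) - np.sum(v**2, axis=0)


def select_next(X, y_safe, Xc, p_min=0.6):
    """Max-entropy candidate among those with P(safe) >= p_min (index is None if none qualify)."""
    p_safe = gpc_laplace_prob(X, y_safe, Xc)
    var = np.maximum(gp_latent_variance(X, Xc), 1e-12)
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * var)  # differential entropy of N(mu, var)
    entropy[p_safe < p_min] = -np.inf                  # exclude candidates deemed unsafe
    if not np.any(np.isfinite(entropy)):
        return None, p_safe, entropy
    return int(np.argmax(entropy)), p_safe, entropy


# Toy 1-D example (hypothetical data): safe near x = 0, unsafe at x = 1.
X = np.array([[0.0], [0.1], [0.2], [0.3], [1.0]])
y_safe = np.array([1, 1, 1, 1, -1])
Xc = np.linspace(0.0, 1.0, 101)[:, None]
idx, p_safe, H = select_next(X, y_safe, Xc)
print("next query x =", Xc[idx, 0], "estimated P(safe) =", p_safe[idx])
```

In this toy setting the chosen point tends to sit at the edge of the region the classifier considers safe, where the regression GP is most uncertain; far from the data the classifier's probability falls towards 0.5, so any `p_min` above 0.5 keeps the query away from unexplored territory. The actual threshold rule and its link to the failure-probability bound are defined in the paper, not by this sketch.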
