Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning

© 2018 35th International Conference on Machine Learning, ICML 2018. All rights reserved. Bayesian neural networks with latent variables are scalable and flexible probabilistic models: they account for uncertainty in the estimation of the network weights and, by making use of latent variables, can capture complex noise patterns in the data. Using these models, we show how to perform and utilize a decomposition of uncertainty into aleatoric and epistemic components for decision-making purposes. This allows us to successfully identify informative points for active learning of functions with heteroscedastic and bimodal noise. Using the decomposition, we further define a novel risk-sensitive criterion for reinforcement learning to identify policies that balance expected cost, model-bias and noise aversion.
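The sketch below illustrates one common way such a decomposition can be computed in practice; it is not the authors' code. It assumes a trained Bayesian neural network from which we can draw weight samples and, for fixed weights, output samples, and it uses the law of total variance: Var[y] = E_w[Var[y | w]] (aleatoric) + Var_w(E[y | w]) (epistemic). The functions `predict_samples` and `weight_samples` are hypothetical stand-ins.

```python
import numpy as np

def decompose_uncertainty(predict_samples, weight_samples, x, n_noise=100):
    """Monte Carlo split of predictive variance at input x.

    predict_samples(w, x, n) -> array of n sampled outputs y under fixed
    weights w; weight_samples is an iterable of posterior weight draws.
    Both are assumed to come from a trained Bayesian neural network.
    """
    means, variances = [], []
    for w in weight_samples:
        y = predict_samples(w, x, n_noise)  # draws from p(y | x, w)
        means.append(np.mean(y))
        variances.append(np.var(y))
    aleatoric = np.mean(variances)  # average noise variance under fixed weights
    epistemic = np.var(means)       # spread of mean predictions across weight draws
    return aleatoric, epistemic
```

Active-learning or risk-sensitive criteria of the kind described in the abstract would then query points or penalize policies based on these two terms separately, rather than on the total predictive variance alone.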
