Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

We present an algorithm for model-based reinforcement learning that combines Bayesian neural networks (BNNs) with random roll-outs and stochastic optimization for policy learning. The BNNs are trained by minimizing $\alpha$-divergences, allowing us to capture complicated statistical patterns in the transition dynamics, e.g., multimodality and heteroscedasticity, which are usually missed by other common modeling approaches. We illustrate the performance of our method by solving a challenging benchmark on which model-based approaches usually fail, and by obtaining promising results in a real-world scenario: the control of a gas turbine.
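
For reference, the $\alpha$-divergence family (in the parameterization of Minka, 2005) interpolates between the two Kullback-Leibler divergences:

$$D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha(1-\alpha)} \int \Big( \alpha\, p(x) + (1-\alpha)\, q(x) - p(x)^{\alpha}\, q(x)^{1-\alpha} \Big)\, dx,$$

which recovers $\mathrm{KL}(p \,\|\, q)$ as $\alpha \to 1$ and $\mathrm{KL}(q \,\|\, p)$ as $\alpha \to 0$; intermediate values trade off the mass-covering and mode-seeking behavior of the fitted approximate posterior.

The roll-out-based policy search can be illustrated with a short, self-contained sketch. Everything below is an assumption for illustration, not the paper's implementation: the toy linear dynamics, the small ensemble of sampled weights standing in for a BNN posterior trained by $\alpha$-divergence minimization, and a simple evolution-strategies-style update standing in for stochastic gradient descent through the roll-outs.

```python
# Minimal sketch of BNN roll-out policy search.
# NOTE: the toy dynamics, the ensemble "posterior", and the ES-style update
# are illustrative assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a BNN posterior over dynamics: each sample w = (a, b)
# parameterizes a stochastic linear transition s' = a*s + b*u + noise.
posterior_samples = [(0.9 + 0.05 * rng.standard_normal(),
                      1.0 + 0.05 * rng.standard_normal())
                     for _ in range(10)]

def rollout_cost(theta, horizon=20, sigma=0.1):
    """Cost of the linear policy u = theta * s, averaged over posterior
    samples (model uncertainty) and noisy simulated roll-outs."""
    total = 0.0
    for a, b in posterior_samples:
        s = 1.0                                    # fixed initial state
        for _ in range(horizon):
            u = theta * s                          # policy action
            s = a * s + b * u + sigma * rng.standard_normal()
            total += s**2 + 0.01 * u**2            # state + action penalty
    return total / len(posterior_samples)

# Stochastic optimization of the policy parameter via antithetic
# finite-difference (evolution-strategies-style) gradient estimates.
theta, lr, n_dirs, eps = 0.0, 1e-3, 8, 0.05
for _ in range(200):
    grad = 0.0
    for _ in range(n_dirs):
        d = rng.standard_normal()
        grad += d * (rollout_cost(theta + eps * d)
                     - rollout_cost(theta - eps * d)) / (2 * eps)
    theta -= lr * grad / n_dirs

print(f"learned feedback gain: theta = {theta:.3f}")
```

Because each roll-out draws a fresh dynamics sample and fresh transition noise, the resulting controller is averaged over both model uncertainty and process noise, which is what lets roll-out-based methods of this kind hedge against model bias rather than exploit a single deterministic model.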
