Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

We present an algorithm for model-based reinforcement learning that combines Bayesian neural networks (BNNs) with random roll-outs and stochastic optimization for policy learning. The BNNs are trained by minimizing $\alpha$-divergences, allowing us to capture complicated statistical patterns in the transition dynamics, e.g., multimodality and heteroscedasticity, which are usually missed by other common modeling approaches. We illustrate the performance of our method by solving a challenging benchmark on which model-based approaches usually fail, and by obtaining promising results in a real-world scenario: the control of a gas turbine.
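
For reference, the $\alpha$-divergence family (in the parameterization of Minka, 2005) interpolates between the two Kullback-Leibler divergences:

$$D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha(1-\alpha)} \int \Big( \alpha\, p(x) + (1-\alpha)\, q(x) - p(x)^{\alpha}\, q(x)^{1-\alpha} \Big)\, dx,$$

which recovers $\mathrm{KL}(p \,\|\, q)$ as $\alpha \to 1$ and $\mathrm{KL}(q \,\|\, p)$ as $\alpha \to 0$; intermediate values trade off the mass-covering and mode-seeking behavior of the fitted approximate posterior.

The roll-out-based policy search can be illustrated with a short, self-contained sketch. Everything below is an assumption for illustration, not the paper's implementation: the toy linear dynamics, the small ensemble of sampled weights standing in for a BNN posterior trained by $\alpha$-divergence minimization, and a simple evolution-strategies-style update standing in for stochastic gradient descent through the roll-outs.

```python
# Minimal sketch of BNN roll-out policy search.
# NOTE: the toy dynamics, the ensemble "posterior", and the ES-style update
# are illustrative assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a BNN posterior over dynamics: each sample w = (a, b)
# parameterizes a stochastic linear transition s' = a*s + b*u + noise.
posterior_samples = [(0.9 + 0.05 * rng.standard_normal(),
                      1.0 + 0.05 * rng.standard_normal())
                     for _ in range(10)]

def rollout_cost(theta, horizon=20, sigma=0.1):
    """Cost of the linear policy u = theta * s, averaged over posterior
    samples (model uncertainty) and noisy simulated roll-outs."""
    total = 0.0
    for a, b in posterior_samples:
        s = 1.0                                    # fixed initial state
        for _ in range(horizon):
            u = theta * s                          # policy action
            s = a * s + b * u + sigma * rng.standard_normal()
            total += s**2 + 0.01 * u**2            # state + action penalty
    return total / len(posterior_samples)

# Stochastic optimization of the policy parameter via antithetic
# finite-difference (evolution-strategies-style) gradient estimates.
theta, lr, n_dirs, eps = 0.0, 1e-3, 8, 0.05
for _ in range(200):
    grad = 0.0
    for _ in range(n_dirs):
        d = rng.standard_normal()
        grad += d * (rollout_cost(theta + eps * d)
                     - rollout_cost(theta - eps * d)) / (2 * eps)
    theta -= lr * grad / n_dirs

print(f"learned feedback gain: theta = {theta:.3f}")
```

Because each roll-out draws a fresh dynamics sample and fresh transition noise, the resulting controller is averaged over both model uncertainty and process noise, which is what lets roll-out-based methods of this kind hedge against model bias rather than exploit a single deterministic model.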
