Paper information - Direct policy search reinforcement learning based on particle filtering

Direct policy search reinforcement learning based on particle filtering

We reveal a link between particle filtering methods and direct policy search reinforcement learning, and propose a novel reinforcement learning algorithm based heavily on ideas borrowed from particle filters. A major advantage of the proposed algorithm is its ability to perform global search in policy space and thus find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance to a global random sampling method and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm.
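To make the idea in the abstract concrete, here is a minimal sketch of treating policy parameters as particles: weight each by its reward, resample in proportion to weight, and perturb with shrinking noise. This is not the authors' algorithm. The reward function, the search range [0, 1], and every name and parameter value below are assumptions chosen only for illustration.

```python
import math
import random


def reward(theta):
    """Hypothetical 1-D reward: local optimum near 0.2, global optimum near 0.8."""
    local_peak = 0.6 * math.exp(-((theta - 0.2) ** 2) / 0.005)
    global_peak = 1.0 * math.exp(-((theta - 0.8) ** 2) / 0.005)
    return local_peak + global_peak


def particle_policy_search(reward_fn, n_particles=200, n_iters=40,
                           noise=0.1, decay=0.9, seed=0):
    """Illustrative particle-style search over one policy parameter in [0, 1]."""
    rng = random.Random(seed)
    # Each particle is one candidate policy parameter, drawn uniformly at first.
    particles = [rng.random() for _ in range(n_particles)]
    best = max(particles, key=reward_fn)
    for _ in range(n_iters):
        # Weight every particle by its non-negative reward.
        weights = [reward_fn(p) + 1e-12 for p in particles]
        # Resample with replacement, in proportion to weight.
        chosen = rng.choices(particles, weights=weights, k=n_particles)
        # Perturb with Gaussian exploration noise and clip to the search range.
        particles = [min(1.0, max(0.0, p + rng.gauss(0.0, noise)))
                     for p in chosen]
        # Keep the best parameter evaluated so far.
        best = max([best] + particles, key=reward_fn)
        # Shrink the noise so the search gradually moves from global to local.
        noise *= decay
    return best


best_theta = particle_policy_search(reward)
```

Because the initial particles cover the whole range, the search is not tied to the basin of a single starting point, which is the property the abstract refers to as global search in policy space.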

Darwin G. Caldwell | Petar Kormushev
