Direct policy search reinforcement learning based on particle filtering

We reveal a link between particle filtering methods and direct policy search reinforcement learning, and propose a novel reinforcement learning algorithm based heavily on ideas borrowed from particle filters. A major advantage of the proposed algorithm is its ability to perform global search in policy space and thus find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance to a global random sampling method and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm.
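To make the idea concrete, the following is a minimal sketch of how particle-filter-style resampling can drive a global search over policy parameters. It is not the algorithm proposed in the paper, whose details are not given here. It assumes a one-dimensional policy parameter, a made-up return function with one local and one global optimum, proportional-to-return resampling, and decaying Gaussian exploration noise. The names `reward`, `particle_policy_search`, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def reward(theta):
    """Hypothetical 1-D return: local optimum near -1, global optimum near 2."""
    return (0.6 * np.exp(-(theta + 1.0) ** 2 / 0.1)
            + 1.0 * np.exp(-(theta - 2.0) ** 2 / 0.1))


def particle_policy_search(reward_fn, low=-4.0, high=4.0, n_particles=200,
                           n_iters=30, noise0=0.5, decay=0.9, seed=0):
    """Illustrative search loop (assumed design, not the paper's method)."""
    rng = np.random.default_rng(seed)
    # Each particle is one candidate policy parameter; start by covering
    # the whole search range uniformly so no optimum is excluded a priori.
    particles = rng.uniform(low, high, size=n_particles)
    noise = noise0
    for _ in range(n_iters):
        # Weight each particle by its return (a small constant avoids
        # an all-zero weight vector), then normalise to a distribution.
        weights = reward_fn(particles) + 1e-12
        weights /= weights.sum()
        # Resample with replacement in proportion to the weights, then
        # perturb the survivors with Gaussian exploration noise.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx] + rng.normal(0.0, noise, size=n_particles)
        particles = np.clip(particles, low, high)
        # Shrink the noise so the search gradually turns from exploration
        # to refinement around the high-return regions.
        noise *= decay
    # Report the best particle of the final population.
    return particles[np.argmax(reward_fn(particles))]


best = particle_policy_search(reward)
print(f"best policy parameter found: {best:.3f}")
```

Because the initial population covers the entire parameter range and resampling moves probability mass toward higher-return regions, this kind of search is less prone to stalling at the local optimum than a purely local update started from a single point. With finitely many particles this is a tendency rather than a guarantee.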
