Simultaneous discovery of multiple alternative optimal policies by reinforcement learning

Conventional reinforcement learning algorithms for direct policy search are limited to finding a single optimal policy. The cause is their local-search nature: they can converge only to a single local optimum in policy space, and the result depends heavily on the policy initialization. In this paper, we propose a novel reinforcement learning algorithm for direct policy search that can find multiple alternative optimal policies simultaneously. The algorithm is based on particle filtering and performs a global search in policy space, which eliminates the dependency on the policy initialization and enables it to find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance to a global random sampling method and to a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm.
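To make the idea of a particle-based global search in policy space concrete, the following is a minimal sketch and not the algorithm proposed in the paper, whose details the abstract does not give. It assumes a one-dimensional policy parameter, a hypothetical reward function with two equally good optima, reward-proportional resampling, and Gaussian perturbation with a decaying scale. All names and parameter values (`reward`, `particle_search`, `noise0`, `decay`) are illustrative assumptions.

```python
import math
import random


def reward(theta):
    """Hypothetical 1-D reward with two equally good optima near -2 and +2."""
    return math.exp(-(theta + 2.0) ** 2) + math.exp(-(theta - 2.0) ** 2)


def particle_search(reward_fn, low, high, n_particles=400, n_iters=20,
                    noise0=1.0, decay=0.9, seed=0):
    """Illustrative particle-style search over a scalar policy parameter.

    Assumption: this is a generic resample-and-perturb scheme, not the
    authors' method.
    """
    rng = random.Random(seed)
    # Global initialization: spread particles uniformly over the whole range,
    # so no single initial policy determines the outcome.
    particles = [rng.uniform(low, high) for _ in range(n_particles)]
    noise = noise0
    for _ in range(n_iters):
        # Weight each particle by the reward of the policy it represents.
        weights = [reward_fn(p) for p in particles]
        # Resample with replacement, in proportion to the weights.
        parents = rng.choices(particles, weights=weights, k=n_particles)
        # Perturb the survivors to keep exploring, clipped to the search range.
        particles = [min(high, max(low, p + rng.gauss(0.0, noise)))
                     for p in parents]
        # Shrink the exploration noise so the particles settle on the optima.
        noise *= decay
    return particles


particles = particle_search(reward, low=-5.0, high=5.0)
near_left = sum(1 for p in particles if abs(p + 2.0) < 0.75)
near_right = sum(1 for p in particles if abs(p - 2.0) < 0.75)
print("particles near -2:", near_left, "| particles near +2:", near_right)
```

Because the particles start spread across the whole parameter range and are resampled as a population, clusters can persist around both optima at once, whereas a single local search would settle on only one of them. In this sketch nothing prevents one cluster from eventually dying out through resampling noise, so a longer run or a smaller population may lose an optimum.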
