Bayesian Nonparametric Multi-Optima Policy Search in Reinforcement Learning

Skills can often be performed in many different ways. To give robots human-like adaptation capabilities, it is of great interest to learn several ways of achieving the same skill in parallel, since possible changes in the environment or in the robot can make some solutions infeasible. In that case, knowing multiple solutions can avoid relearning the task. This paper addresses the problem within the framework of Reinforcement Learning as the automatic determination of multiple optimal parameterized policies. To this end, a model handling a variable number of policies is built using a Bayesian nonparametric approach. The algorithm is first compared with single-policy algorithms on known benchmarks, and is then applied to a typical robotic problem presenting multiple solutions.
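As an illustration of how a Bayesian nonparametric prior can leave the number of policies open, the sketch below samples assignments of rollouts to policies from a Chinese restaurant process, the sequential form of a Dirichlet process prior. It is a minimal sketch under stated assumptions, not the algorithm of this paper: the function names `crp_assignments` and `expected_num_policies`, the concentration parameter `alpha`, and the idea of assigning individual rollouts to policy labels are hypothetical choices made for illustration only.

```python
import random
from typing import List


def crp_assignments(n_items: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample policy labels for n_items rollouts from a Chinese restaurant process.

    Hypothetical illustration: each label stands for one candidate policy, and
    alpha > 0 controls how readily a new policy is opened.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = number of rollouts assigned to policy k
    labels: List[int] = []
    for _ in range(n_items):
        # Existing policy k is chosen with weight counts[k]; a new policy with
        # weight alpha. random.choices normalizes, giving the CRP probabilities.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new policy
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def expected_num_policies(n_items: int, alpha: float) -> float:
    """Expected number of distinct policies after n_items draws from the CRP."""
    return sum(alpha / (alpha + i) for i in range(n_items))


if __name__ == "__main__":
    labels = crp_assignments(100, alpha=1.0, seed=42)
    print("distinct policies sampled:", len(set(labels)))
    print("expected under the prior:", expected_num_policies(100, 1.0))
```

A larger `alpha` opens new policies more readily, while the expected number of distinct policies grows only roughly logarithmically with the number of rollouts. This is what lets such a prior avoid fixing the number of solutions in advance.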
