Sparse Latent Space Policy Search

Computational agents often need to learn policies that involve many control variables; a robot, for example, must control several joints simultaneously. Learning a policy with many parameters, however, usually requires a large number of training samples. We introduce a reinforcement learning method for sample-efficient policy search that exploits correlations between control variables, which are particularly common in motor skill learning tasks. The method uses variational inference to estimate policy parameters while simultaneously uncovering a low-dimensional latent space of controls. Prior knowledge about the task and the structure of the learning agent can be provided by specifying groups of potentially correlated parameters. This information is then used to impose sparsity constraints on the mapping between the high-dimensional control space and the lower-dimensional latent space. In experiments with a simulated bi-manual manipulator, the new approach effectively identifies synergies between joints, performs efficient low-dimensional policy search, and outperforms state-of-the-art policy search methods.
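
The following is a minimal, illustrative sketch of the general idea, not the paper's actual algorithm or code: an episodic policy search that samples controls through a latent factor model and applies a simple group-wise shrinkage step to the latent-to-control mapping as a stand-in for the structured sparsity prior. The toy reaching reward, the dimensions, the soft-max reward weighting, and the proximal shrinkage step are all assumptions made for this example.

```python
# Hedged sketch: reward-weighted variational EM in a latent factor model
# over controls, with group-wise shrinkage of the latent-to-control map W.
# All names, dimensions, and the toy reward are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_controls = 8                                # e.g., 4 joints per arm
k = 2                                         # latent dimensionality
groups = [np.arange(0, 4), np.arange(4, 8)]   # potentially correlated groups
target = rng.normal(size=n_controls)          # toy task: reach target controls

def reward(theta):
    # Toy reward: negative squared distance to the target control vector.
    return -np.sum((theta - target) ** 2)

mu = np.zeros(n_controls)                     # mean of the control policy
W = 0.1 * rng.normal(size=(n_controls, k))    # latent-to-control mapping
sigma2, n_rollouts, beta, lam = 0.05, 50, 1.0, 0.02

for it in range(60):
    # Roll out: sample latents z and controls theta = mu + W z + noise.
    Z = rng.normal(size=(n_rollouts, k))
    Theta = mu + Z @ W.T + np.sqrt(sigma2) * rng.normal(size=(n_rollouts, n_controls))
    R = np.array([reward(t) for t in Theta])
    d = np.exp(beta * (R - R.max()))          # soft-max reward weights
    d /= d.sum()

    # E-step: Gaussian posterior over each latent z_i given theta_i.
    S = np.linalg.inv(np.eye(k) + W.T @ W / sigma2)   # shared posterior covariance
    M = ((Theta - mu) @ W / sigma2) @ S               # posterior means, one row per rollout

    # M-step: reward-weighted updates of mu and W (weighted factor analysis).
    mu = d @ (Theta - M @ W.T)
    A = (Theta - mu).T @ (M * d[:, None])             # (n_controls, k)
    B = (M * d[:, None]).T @ M + S                    # (k, k); weights d sum to one
    W = A @ np.linalg.inv(B)

    # Sparsity step: shrink whole groups of rows of W toward zero, a simple
    # proximal surrogate for a structured group-sparsity prior on the mapping.
    for g in groups:
        norm = np.linalg.norm(W[g])
        W[g] *= max(0.0, 1.0 - lam / (norm + 1e-12))

print("final reward at policy mean:", reward(mu))
```

In this sketch the reward-weighted E/M updates pull the policy mean toward high-reward controls while the latent map W absorbs correlated variation across the specified joint groups; groups whose rows are shrunk to zero are effectively decoupled from the latent space, which is the role the sparsity constraints play in the abstract above.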
