Tensor-Based Knowledge Transfer Across Skill Categories for Robot Control

Advances in hardware and in learning for control are enabling robots to perform increasingly dextrous and dynamic control tasks. These skills typically require a prohibitive amount of exploration for reinforcement learning, and so are commonly acquired by imitation learning from manual demonstration. The costly, non-scalable nature of manual demonstration has motivated work on skill generalisation, e.g., through contextual policies and options. Despite good results, existing work along these lines is limited to generalising across variants of a single skill, such as throwing an object to different locations. In this paper we go significantly further and investigate generalisation across qualitatively different classes of control skills. In particular, we introduce a class of neural network controllers that can realise four distinct skill classes: reaching, object throwing, casting, and ball-in-cup. By factorising the weights of the neural network, we extract transferable latent skills that dramatically accelerate learning in cross-task transfer. With a suitable curriculum, this allows us to learn challenging dextrous control tasks such as ball-in-cup from scratch with pure reinforcement learning.
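The idea of factorising network weights into transferable latent skills can be illustrated with a minimal sketch. This is not the controller architecture or factorisation used in the paper; it assumes, purely for illustration, that each task has one weight matrix of the same shape, and it uses a truncated SVD of the stacked weights as a stand-in for a tensor factorisation. The names `factorise_task_weights`, `compose_weights`, and `num_latent` are hypothetical.

```python
import numpy as np


def factorise_task_weights(weights, num_latent):
    """Split stacked per-task weights into a shared basis and per-task codes.

    weights: array of shape (num_tasks, d_in, d_out), one matrix per task.
    Returns (basis, codes) with basis of shape (num_latent, d_in, d_out)
    and codes of shape (num_tasks, num_latent).
    """
    num_tasks, d_in, d_out = weights.shape
    # Flatten each task's weight matrix into one row.
    flat = weights.reshape(num_tasks, d_in * d_out)
    # Truncated SVD: flat is approximated by codes @ basis_flat.
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    codes = u[:, :num_latent] * s[:num_latent]
    basis = vt[:num_latent].reshape(num_latent, d_in, d_out)
    return basis, codes


def compose_weights(basis, code):
    """Rebuild one task's weight matrix as a linear mix of latent components."""
    return np.tensordot(code, basis, axes=1)


# Synthetic example: 4 tasks whose weights share 2 latent components.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(2, 3, 5))
true_codes = rng.normal(size=(4, 2))
weights = np.tensordot(true_codes, true_basis, axes=1)  # shape (4, 3, 5)

basis, codes = factorise_task_weights(weights, num_latent=2)
reconstructed = np.stack([compose_weights(basis, c) for c in codes])
```

Under these assumptions, a new task would reuse the shared `basis` and only need its own short code vector, which is one way to read the claim that latent skills speed up cross-task transfer.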
