Combining Model-Based $Q$-Learning With Structural Knowledge Transfer for Robot Skill Learning

Learning skills autonomously is a particularly important ability for an autonomous robot. A promising approach is reinforcement learning (RL), in which agents learn policies through interaction with their environments. One challenge in RL is the tradeoff between exploration and exploitation. Moreover, learning multiple tasks poses a further challenge for robots. In this paper, to enhance the performance of RL, a novel learning framework integrating RL with knowledge transfer is proposed. It comprises three basic components: 1) probabilistic policy reuse; 2) dynamics model learning; and 3) model-based <inline-formula> <tex-math notation="LaTeX">${Q}$ </tex-math></inline-formula>-learning. In this framework, prelearned skills are leveraged for policy reuse and dynamics learning. In model-based <inline-formula> <tex-math notation="LaTeX">${Q}$ </tex-math></inline-formula>-learning, Gaussian process regression is used to approximate the <inline-formula> <tex-math notation="LaTeX">${Q}$ </tex-math></inline-formula>-value function so that the method suits continuous robot control. The prior knowledge retrieved through knowledge transfer is integrated into the model-based <inline-formula> <tex-math notation="LaTeX">${Q}$ </tex-math></inline-formula>-learning to reduce the required learning time. Finally, a human-robot handover experiment is performed to evaluate the learning performance of the framework. Experimental results show that, owing to the prior knowledge obtained from knowledge transfer, less exploration is needed to obtain a high expected reward.
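To illustrate the core idea of approximating a Q-value function with Gaussian process regression, the following is a minimal sketch, not the paper's implementation: a toy 1-D task with two discrete actions, where a scikit-learn `GaussianProcessRegressor` is refit on bootstrapped Q targets in a fitted-Q-iteration style. The dynamics, reward, kernel choice, and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def step(s, a):
    # Toy dynamics and sparse reward (illustrative, not from the paper):
    # action 1 moves right, action 0 moves left; reward near s = 1.
    s2 = np.clip(s + (0.1 if a == 1 else -0.1), 0.0, 1.0)
    r = 1.0 if s2 > 0.9 else 0.0
    return s2, r

# Collect random exploratory transitions (s, a, r, s').
transitions = []
for _ in range(200):
    s = rng.random()
    a = int(rng.integers(2))
    s2, r = step(s, a)
    transitions.append((s, a, r, s2))

gamma = 0.9
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)

# Inputs are (state, action) pairs; the GP posterior mean serves as Q(s, a).
X = np.array([[s, a] for s, a, _, _ in transitions])
R = np.array([r for _, _, r, _ in transitions])
S2 = np.array([[s2] for _, _, _, s2 in transitions])

for it in range(10):
    if it == 0:
        y = R  # first pass: targets are immediate rewards
    else:
        # Bootstrapped targets: r + gamma * max_a' Q(s', a')
        q_next = np.stack([
            gp.predict(np.hstack([S2, np.full((len(S2), 1), a)]))
            for a in (0, 1)
        ])
        y = R + gamma * q_next.max(axis=0)
    gp.fit(X, y)
```

After fitting, `gp.predict([[s, a]])` returns the estimated Q-value, and the GP's predictive variance could additionally guide exploration. Prior knowledge, as in the proposed framework, would enter here as transitions or value estimates transferred from prelearned skills rather than purely random exploration.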
