Demonstration-Guided Deep Reinforcement Learning of Control Policies for Dexterous Human-Robot Interaction

In this paper, we propose a method for training control policies for human-robot interactions, such as handshakes or hand claps, via deep reinforcement learning. The policy controls a humanoid Shadow Dexterous Hand attached to a robot arm. We propose a parameterizable multi-objective reward function that allows a variety of interactions to be learned without changing the reward structure. The parameters of the reward function are estimated directly from motion capture data of human-human interactions, so that the resulting policies are perceived as natural and human-like by observers. We evaluate our method on three substantially different hand interactions: handshake, hand clap, and finger touch. We provide a detailed analysis of the proposed reward function and the resulting policies, and we conduct a large-scale user study, which indicates that our policies produce natural-looking motions.
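To make the idea of a parameterizable multi-objective reward concrete, the following is a minimal sketch and not the reward used in the paper. It assumes that each objective compares one scalar feature of the current state against a target value, that the targets are obtained by averaging the same feature over demonstration frames, and that the per-objective terms are combined as a weighted sum. The feature names (`palm_distance`, `finger_closure`), the Gaussian-shaped term, the weights, and the scales are all hypothetical choices made for illustration.

```python
import math
from statistics import mean


def estimate_targets(demo_frames):
    """Estimate one target per feature by averaging over demonstration frames.

    Hypothetical estimator: the paper fits its reward parameters to motion
    capture data, but the exact procedure is not given in the abstract.
    """
    names = demo_frames[0].keys()
    return {name: mean(frame[name] for frame in demo_frames) for name in names}


def reward(features, targets, weights, scales):
    """Weighted sum of per-objective terms; each term lies in (0, 1]."""
    total = 0.0
    for name, weight in weights.items():
        error = features[name] - targets[name]
        # Each term equals 1 at the target and decays as the error grows.
        total += weight * math.exp(-((error / scales[name]) ** 2))
    return total


# Made-up demonstration frames standing in for motion capture features.
demo_frames = [
    {"palm_distance": 0.02, "finger_closure": 0.80},
    {"palm_distance": 0.04, "finger_closure": 0.70},
]
targets = estimate_targets(demo_frames)

# Switching to a different interaction would change only these parameters,
# not the structure of reward().
weights = {"palm_distance": 0.6, "finger_closure": 0.4}
scales = {"palm_distance": 0.05, "finger_closure": 0.5}

at_target = reward(targets, targets, weights, scales)
off_target = reward(
    {"palm_distance": 0.30, "finger_closure": 0.0}, targets, weights, scales
)
print(at_target, off_target)
```

Under these assumptions, a state that matches the demonstration targets receives the maximum reward (the sum of the weights), and states far from the targets receive less. A different interaction would be expressed by re-estimating `targets` and adjusting `weights` and `scales`, leaving `reward` unchanged.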
