An Actor-Critic Approach for Legible Robot Motion Planner

In human-robot collaboration, it is crucial for the robot to make its intentions clear and predictable to its human partners. Inspired by the mutual learning and adaptation of human partners, we propose an actor-critic approach to legible robot motion planning. The approach comprises two neural networks and a legibility evaluator: 1) a policy network trained with deep reinforcement learning (DRL); 2) a recurrent neural network (RNN)-based sequence-to-sequence (Seq2Seq) model that serves as a motion predictor; and 3) a legibility evaluator that maps motion to a legibility reward. Through a series of human-subject experiments, we demonstrate that, with a simple hand-crafted reward function and no real human data, our method improves collaborative performance over both a baseline method and a non-prediction method.
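The abstract does not specify how the legibility evaluator maps motion to reward, so the following is only a minimal sketch of one common formulation (in the spirit of Dragan et al.'s legibility measure): at each waypoint, score every candidate goal by remaining distance, convert the scores into a softmax belief over goals, and reward the average probability assigned to the robot's true goal. The function name, the distance-based scoring, and the `beta` temperature are all illustrative assumptions, not the paper's actual evaluator.

```python
import math


def legibility_reward(trajectory, goals, true_goal, beta=1.0):
    """Hypothetical legibility evaluator: reward motion from which an
    observer can quickly infer the robot's true goal.

    trajectory -- list of (x, y) waypoints
    goals      -- list of (x, y) candidate goal positions
    true_goal  -- index into `goals` of the intended goal
    beta       -- softmax temperature (assumed hyperparameter)
    """
    probs = []
    for point in trajectory:
        # Use negative remaining distance as a simple per-goal score.
        scores = [-beta * math.dist(point, g) for g in goals]
        # Numerically stable softmax over candidate goals.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        # Belief the observer would place on the true goal here.
        probs.append(exps[true_goal] / sum(exps))
    # Average belief along the path, in (0, 1]; higher = more legible.
    return sum(probs) / len(probs)
```

Under this sketch, a path that exaggerates away from competing goals early on earns a higher reward than a direct but ambiguous path, which is exactly the signal an actor-critic learner could combine with a task reward.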
