Task Refinement for Autonomous Robots Using Complementary Corrective Human Feedback

A robot can perform a given task through a policy that maps its sensed state to appropriate actions. We assume that a hand-coded controller can achieve such a mapping only for the basic cases of the task, and that refining the controller becomes increasingly tedious and error prone as the complexity of the task grows. In this paper, we present a new learning-from-demonstration approach that improves the robot's performance by using corrective human feedback as a complement to an existing hand-coded algorithm. The human teacher observes the robot as it performs the task with the hand-coded algorithm and takes over control to correct the behavior whenever the robot selects a wrong action. Corrections are captured as new state-action pairs, and during autonomous execution the demonstrated corrections replace the default controller output whenever the robot's current state is judged similar to a previously corrected state in the correction database. We apply the proposed approach to a complex ball-dribbling task performed against stationary defender robots in a robot soccer scenario, using physical Aldebaran Nao humanoid robots. Our experimental results show an improvement in the robot's performance when the default hand-coded controller is augmented with corrective human demonstration.
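The override mechanism described above can be sketched as a thin wrapper around the default controller. This is a minimal illustrative sketch, not the paper's implementation: the class name `CorrectivePolicy`, the Euclidean distance metric, and the fixed similarity threshold are all assumptions made for the example.

```python
import math

class CorrectivePolicy:
    """Wraps a hand-coded controller with a database of corrective
    human demonstrations. When the current state is close enough to a
    previously corrected state, the demonstrated action overrides the
    default controller output; otherwise the hand-coded policy runs."""

    def __init__(self, default_controller, threshold=0.5):
        self.default_controller = default_controller  # callable: state -> action
        self.threshold = threshold   # similarity radius (assumed Euclidean)
        self.corrections = []        # list of (state, action) pairs

    def add_correction(self, state, action):
        # Record a teacher takeover as a new state-action pair.
        self.corrections.append((tuple(state), action))

    def act(self, state):
        # Find the nearest previously corrected state.
        best_dist, best_action = float("inf"), None
        for s, a in self.corrections:
            d = math.dist(state, s)
            if d < best_dist:
                best_dist, best_action = d, a
        # Use the demonstrated correction if it is similar enough.
        if best_action is not None and best_dist <= self.threshold:
            return best_action
        # Otherwise fall back to the hand-coded controller.
        return self.default_controller(state)
```

For example, a policy seeded with one correction would return `"turn_left"` near the corrected state `(1.0, 2.0)` and the default action elsewhere:

```python
policy = CorrectivePolicy(lambda s: "dribble_forward", threshold=0.5)
policy.add_correction((1.0, 2.0), "turn_left")
policy.act((1.1, 2.0))  # within threshold -> "turn_left"
policy.act((5.0, 5.0))  # far away -> "dribble_forward"
```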
