Multi-resolution Corrective Demonstration for Efficient Task Execution and Refinement

Computationally efficient task execution is essential for autonomous mobile robots with limited on-board computational resources. Most robot control approaches assume a fixed state and action representation and use a single algorithm to map states to actions. However, not all situations in a given task require equally complex algorithms or equally detailed state and action representations. This work is motivated by the desire to reduce the computational footprint of performing a task by letting the robot run simpler algorithms whenever possible and resort to a more complex algorithm only when needed. We contribute the Multi-Resolution Task Execution (MRTE) algorithm, which uses human feedback to learn a mapping from a given state to an appropriate detail resolution, consisting of a state and action representation and an algorithm that maps states to actions at that resolution. The robot learns a policy from human demonstration to switch between detail resolutions as needed while favoring lower detail resolutions to reduce the computational cost of task execution. We then present the Model Plus Correction (M+C) algorithm, which improves the performance of an algorithm using corrective human feedback without modifying the algorithm itself. Finally, we introduce the Multi-Resolution Model Plus Correction (MRM+C) algorithm as a combination of MRTE and M+C. MRM+C learns from human demonstration how to select an appropriate detail resolution to operate at in a given state. Furthermore, it allows the teacher to provide corrective demonstration at different detail resolutions to improve overall task execution performance. We provide formal definitions of the MRTE, M+C, and MRM+C algorithms and show how they relate to the general robot control problem and the Learning from Demonstration (LfD) approach. We present experimental results demonstrating the effectiveness of the proposed methods on a goal-directed humanoid obstacle avoidance task.
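
The control loop implied by the abstract can be sketched as follows (Python, illustrative only): a selector learned from demonstration picks the cheapest adequate detail resolution, the controller at that resolution computes a baseline action, and a correction learned from the teacher's corrective demonstrations is applied on top of it. All class and method names here are assumptions made for illustration and do not reproduce the authors' implementation.

    # Hypothetical sketch of an MRM+C-style control loop, assuming a
    # resolution selector and per-resolution controllers learned elsewhere.
    class MultiResolutionController:
        def __init__(self, resolution_selector, controllers, correction_model=None):
            # controllers: ordered from lowest (cheapest) to highest detail resolution
            self.resolution_selector = resolution_selector  # learned from demonstration
            self.controllers = controllers
            self.correction_model = correction_model        # learned from corrective feedback

        def act(self, raw_observation):
            # 1. Pick the cheapest detail resolution the learned selector deems sufficient.
            level = self.resolution_selector.select(raw_observation)
            controller = self.controllers[level]

            # 2. Map the observation to that resolution's state representation and
            #    compute the baseline action with the corresponding algorithm.
            state = controller.extract_state(raw_observation)
            action = controller.policy(state)

            # 3. M+C step: apply the learned corrective term, if any, on top of the
            #    unmodified baseline action at this resolution.
            if self.correction_model is not None:
                action = self.correction_model.apply(state, action, level)
            return action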
