Confidence-based robot policy learning from demonstration

The problem of learning a policy, a task representation mapping world states to actions, lies at the heart of many robotic applications. One approach to acquiring a task policy is learning from demonstration, an interactive technique in which a robot learns a policy from example state-to-action mappings provided by a human teacher. This thesis introduces Confidence-Based Autonomy, a mixed-initiative, single-robot demonstration learning algorithm that enables the robot and the teacher to jointly control the learning process and the selection of demonstration training data. The robot identifies the need for, and requests, demonstrations for specific parts of the state space based on confidence thresholds characterizing the uncertainty of the learned policy. The robot's demonstration requests are complemented by the teacher's ability to provide supplementary corrective demonstrations in error cases. An additional algorithmic component represents choices between multiple equally applicable actions explicitly within the robot's policy through the creation of option classes. Building on the single-robot Confidence-Based Autonomy algorithm, this thesis also introduces a task- and platform-independent framework for demonstration learning with multiple robots. Within this framework, we formalize three approaches to teaching emergent collaborative behavior based on different information-sharing strategies. We provide detailed evaluations of all algorithms in multiple simulated and robotic domains, and present a case-study analysis of the scalability of the presented techniques using up to seven robots.

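To make the confidence-threshold mechanism concrete, the sketch below shows a minimal single-robot control loop in the spirit of Confidence-Based Autonomy: the robot acts autonomously when its policy's confidence exceeds a threshold and requests a demonstration otherwise, and the teacher can supply corrective demonstrations after observed errors. This is an illustrative sketch, not the thesis's implementation: the k-nearest-neighbor classifier (standing in for the Gaussian mixture model confidence estimates used in the original work), the single fixed threshold, and all class and method names are assumptions made for exposition.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # stand-in policy model (assumption)


class ConfidenceBasedAgent:
    """Hypothetical sketch of a confidence-based demonstration learning loop."""

    def __init__(self, confidence_threshold=0.8, n_neighbors=3):
        self.policy = KNeighborsClassifier(n_neighbors=n_neighbors)
        self.threshold = confidence_threshold
        self.n_neighbors = n_neighbors
        self.states, self.actions = [], []

    def add_demonstration(self, state, action):
        # Store a teacher-provided state-action pair and retrain the policy.
        self.states.append(state)
        self.actions.append(action)
        if len(self.states) >= self.n_neighbors:
            self.policy.fit(np.array(self.states), np.array(self.actions))

    def act_or_request(self, state):
        # Execute autonomously when confident; otherwise request a demonstration.
        if len(self.states) < self.n_neighbors:
            return "request_demonstration", None
        probs = self.policy.predict_proba(np.array([state]))[0]
        if probs.max() < self.threshold:
            # Low confidence: ask the teacher to label this state.
            return "request_demonstration", None
        return "execute", self.policy.classes_[probs.argmax()]

    def corrective_demonstration(self, state, correct_action):
        # Teacher-initiated correction after observing an execution error.
        self.add_demonstration(state, correct_action)
```

In a full system, the demonstration requests would be routed to the human teacher through the robot's interface, and the confidence thresholds would be adjusted from the data rather than fixed by hand, as in the thesis's multi-thresholded formulation.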