Confidence-based policy learning from demonstration using Gaussian mixture models

We contribute an approach to interactive policy learning from expert demonstration that allows an agent to actively request, and effectively represent, demonstration examples. To address the inherent uncertainty of human demonstration, we represent the policy as a set of Gaussian mixture models (GMMs), where each model, composed of multiple Gaussian components, corresponds to a single action. Incrementally received demonstration examples serve as training data for the GMM set. We then introduce our confident execution approach, which focuses learning on relevant parts of the domain by enabling the agent to identify when it needs a demonstration and to request one for specific parts of the state space. The agent selects between demonstration and autonomous execution based on a statistical analysis of the uncertainty of the learned Gaussian mixture set. As it gains proficiency at its task and confidence in its actions, the agent operates with increasing autonomy, eliminating unnecessary demonstrations of already acquired behavior and reducing both training time and the expert's demonstration workload. We validate our approach with experiments in simulated and real robot domains.
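The selection step described above can be sketched as follows: each action is modeled by a Gaussian mixture, a query state is scored under every action's mixture, and the agent requests a demonstration whenever even its best score falls below a confidence threshold. This is a minimal illustrative sketch, not the paper's implementation; the function names, the fixed log-likelihood threshold, and the toy two-action models are assumptions introduced for the example.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at x (plain, for illustration)."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.inv(cov) @ diff)

def gmm_loglik(x, weights, means, covs):
    """Log-likelihood of x under a Gaussian mixture, via log-sum-exp for stability."""
    comps = [np.log(w) + gaussian_logpdf(x, m, c)
             for w, m, c in zip(weights, means, covs)]
    top = max(comps)
    return top + np.log(sum(np.exp(c - top) for c in comps))

def confident_execution(state, action_models, threshold):
    """Pick the most likely action for this state, or request a demonstration
    when the learned mixture set is too uncertain about it."""
    scores = {a: gmm_loglik(state, *params) for a, params in action_models.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return ("request_demonstration", scores[best])
    return (best, scores[best])

# Toy model set: two actions, each a one-component mixture (weights, means, covs).
models = {
    "left":  ([1.0], [np.array([0.0, 0.0])], [np.eye(2)]),
    "right": ([1.0], [np.array([5.0, 5.0])], [np.eye(2)]),
}

confident_execution(np.array([0.1, -0.1]), models, -5.0)   # near "left": acts autonomously
confident_execution(np.array([10.0, 10.0]), models, -5.0)  # far from both: asks the expert
```

As training examples arrive incrementally, the per-action mixtures would be refit (e.g. via EM), so states that once triggered demonstration requests gradually fall above the threshold and are handled autonomously.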
