Robot Learning from Failed Demonstrations

Robot Learning from Demonstration (RLfD) seeks to enable lay users to encode desired robot behaviors as autonomous controllers. Current work uses a human's demonstration of the target task to initialize the robot's policy, and then improves performance either through practice (with a known reward function) or through additional human interaction. In this article, we focus on the initialization step and consider what can be learned when humans do not provide successful examples. We develop probabilistic approaches that avoid reproducing observed failures while leveraging the variance across multiple attempts to drive exploration. Our experiments indicate that failed demonstrations do contain information that can be used to discover successful ways to accomplish a task. However, in higher dimensions, additional information from the user will most likely be necessary to enable efficient failure-based learning.
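
To make the core idea concrete, here is a minimal sketch, assuming a scikit-learn GaussianMixture over hypothetical failed-attempt parameter vectors; it is an illustration of the general technique, not the article's own algorithm. It fits a mixture to the failures, samples exploration candidates from an inflated copy of that model, and rejects any candidate the failure model still scores as likely. All names and data (failed_demos, sample_exploratory, the inflation factor) are illustrative assumptions.

```python
# Sketch: explore near, but not on, observed failures.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in data: each row is one failed attempt, encoded
# as a fixed-length policy parameter vector (e.g., motion-primitive weights).
failed_demos = rng.normal(size=(20, 4))

# Fit a mixture to the failures; its covariances capture the spread
# across the user's attempts, which is reused to scale exploration.
failure_model = GaussianMixture(n_components=2, random_state=0).fit(failed_demos)

def sample_exploratory(model, demos, n_samples=200, inflate=2.0):
    """Draw candidate policies near, but not on, the observed failures."""
    # pick mixture components in proportion to their weights
    comps = rng.choice(model.n_components, size=n_samples, p=model.weights_)
    # sample each candidate with an inflated covariance so exploration
    # reaches beyond the demonstrated region
    candidates = np.array([
        rng.multivariate_normal(model.means_[k], inflate * model.covariances_[k])
        for k in comps
    ])
    # reject candidates the failure model scores as likely: executing
    # them would essentially reproduce an observed failure
    threshold = np.percentile(model.score_samples(demos), 10)
    return candidates[model.score_samples(candidates) < threshold]

candidates = sample_exploratory(failure_model, failed_demos)
print(len(candidates), "exploration candidates survive the failure filter")
```

Inflating the covariance reuses the variability across the user's own attempts to set the exploration range, while the rejection step implements "avoid reproducing observed failures"; surviving candidates would then be evaluated on the robot.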
