Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner

We present an active learning architecture that enables a robot to actively learn which data collection strategy is most efficient for acquiring motor skills to achieve multiple outcomes, and to generalise over its experience to achieve new outcomes. The robot explores its environment both via interactive learning and via goal-babbling. It simultaneously learns when, whom and what to actively imitate from several available teachers, and when not to use social guidance but rather active, goal-oriented self-exploration. This is formalised in the framework of life-long strategic learning. The proposed architecture, called Socially Guided Intrinsic Motivation with Active Choice of Teacher and Strategy (SGIM-ACTS), relies on hierarchical active decisions of what and how to learn, driven by the empirical evaluation of learning progress for each learning strategy. We illustrate this with an experiment in which a simulated robot learns to control its arm to achieve two different kinds of outcomes. At each learning episode it has to choose actively and hierarchically: 1) what to learn: which outcome is the most interesting to select as a goal for goal-directed exploration; 2) how to learn: which data collection strategy to use among self-exploration, mimicry and emulation; 3) once it has decided when and what to imitate, by choosing mimicry or emulation, whom to imitate from a set of different teachers. We show that SGIM-ACTS learns significantly more efficiently than with any single learning strategy, and coherently selects the best strategy with respect to the chosen outcome, taking advantage of the available teachers (with different levels of skill). A minimal illustrative sketch of progress-driven strategy selection is given after this abstract.
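The core mechanism described above, choosing a strategy (and teacher) according to empirically measured learning progress, can be pictured as a bandit-like selection rule. The Python sketch below is purely illustrative and is not the SGIM-ACTS implementation: the class name StrategicLearner, the run_episode stand-in, the simulated error-decay rates and the epsilon-greedy rule are all assumptions introduced here to show how a learner might prefer whichever strategy has recently reduced its goal-reaching error the fastest.

```python
import random
from collections import defaultdict

# Hypothetical strategy labels and error-decay rates, used only to simulate
# teachers/strategies of different quality. Not taken from the paper.
RATES = {
    "self-exploration": 0.010,
    "mimicry:teacher_A": 0.020,
    "emulation:teacher_B": 0.005,
}


class StrategicLearner:
    """Toy learner that picks a strategy by recent empirical learning progress."""

    def __init__(self, strategies, epsilon=0.2, window=5):
        self.strategies = strategies      # e.g. self-exploration, mimicry/emulation per teacher
        self.epsilon = epsilon            # small chance of trying a strategy at random
        self.window = window              # number of recent episodes used to estimate progress
        self.errors = defaultdict(list)   # strategy -> history of goal-reaching errors

    def progress(self, strategy):
        """Empirical learning progress: drop in mean error over the recent window."""
        hist = self.errors[strategy]
        if len(hist) < 2 * self.window:
            return float("inf")           # unexplored strategies look maximally promising
        older = sum(hist[-2 * self.window:-self.window]) / self.window
        recent = sum(hist[-self.window:]) / self.window
        return older - recent             # positive when the error is shrinking

    def choose(self):
        """Epsilon-greedy choice of the strategy with the highest estimated progress."""
        if random.random() < self.epsilon:
            return random.choice(self.strategies)
        return max(self.strategies, key=self.progress)

    def update(self, strategy, error):
        """Record the error obtained after one learning episode with this strategy."""
        self.errors[strategy].append(error)


def run_episode(strategy, uses):
    """Stand-in for one learning episode: error decays at a strategy-specific rate."""
    return max(0.0, 1.0 - RATES[strategy] * uses) + random.uniform(0.0, 0.05)


if __name__ == "__main__":
    learner = StrategicLearner(list(RATES))
    counts = defaultdict(int)
    for _ in range(300):
        s = learner.choose()
        learner.update(s, run_episode(s, counts[s]))
        counts[s] += 1
    print(dict(counts))  # the fastest-progressing strategy should dominate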
