Imitation learning with hierarchical actions

Imitation is a powerful mechanism for rapidly learning new skills through observation of a mentor. Developmental studies indicate that children often perform goal-based imitation rather than mimicking a mentor's actual action trajectories. Furthermore, imitation, like human behavior in general, appears to be based on a hierarchy of actions, with higher-level actions composed of sequences of lower-level actions. In this paper, we propose a new model for goal-based imitation that exploits action hierarchies for fast learning of new skills. As in human imitation, learning relies only on sample trajectories of mentor states; unlike apprenticeship learning or inverse reinforcement learning, the model does not require the mentor's actions to be given. We present results from a large-scale grid-world task modeled after a puzzle-box task used in developmental studies to investigate hierarchical imitation in children. We show that the proposed model rapidly learns to combine a given set of hierarchical actions to achieve the subgoals necessary to reach a desired goal state. Our results demonstrate that hierarchical imitation can yield a significant speed-up in learning, especially in large state spaces, compared to learning without a mentor or without an action hierarchy.
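
To make the setting concrete, the sketch below is a minimal, hypothetical illustration of state-only, goal-based imitation in a small grid world; it is not the paper's model. It assumes subgoals can be read off as the intermediate states shared by all mentor demonstrations, trains one tabular Q-learning "option" per inferred subgoal, and then chains those options to reach the mentor's goal. The grid size, the subgoal heuristic, the learner, and all function names are illustrative assumptions.

```python
import numpy as np

# Minimal, hypothetical sketch (not the paper's algorithm) of goal-based imitation
# from state-only mentor demonstrations in a small grid world.

GRID = 5                                        # assumed 5x5 grid world
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # primitive moves: up, down, left, right

def step(state, a):
    """Apply a primitive move, clipping to the grid boundary."""
    r, c = state
    dr, dc = ACTIONS[a]
    return (min(max(r + dr, 0), GRID - 1), min(max(c + dc, 0), GRID - 1))

def extract_subgoals(demos):
    """Assumed subgoal heuristic: states visited in every mentor demonstration,
    kept in the order of the first demonstration (a stand-in for subgoal inference)."""
    common = set(demos[0]).intersection(*map(set, demos[1:]))
    return [s for s in demos[0] if s in common]

def learn_option(goal, episodes=300, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Learn one tabular 'option': a policy that reaches `goal` from any start state."""
    Q = np.zeros((GRID, GRID, len(ACTIONS)))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = (int(rng.integers(GRID)), int(rng.integers(GRID)))
        for _ in range(50):
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(Q[s]))
            s2 = step(s, a)
            r = 1.0 if s2 == goal else -0.01
            Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
            s = s2
            if s == goal:
                break
    return Q

# The mentor supplies state-only trajectories (no actions), here passing through
# (2, 2) and (3, 3) on the way to the goal (4, 4).
demos = [[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)],
         [(0, 0), (0, 1), (1, 2), (2, 2), (3, 3), (4, 4)]]
subgoals = extract_subgoals(demos)[1:]          # drop the shared start state
options = [learn_option(g) for g in subgoals]   # one learned option per inferred subgoal

# Execution: chain the options, reproducing the mentor's subgoal sequence
# rather than its exact state trajectory.
s = (0, 0)
for goal, Q in zip(subgoals, options):
    for _ in range(2 * GRID * GRID):            # safety cap on option execution
        if s == goal:
            break
        s = step(s, int(np.argmax(Q[s])))
print("reached", s)                             # expected: (4, 4)
```

The point of the sketch is that the learner reproduces the mentor's subgoal sequence rather than the mentor's exact state trajectory, which is the sense of goal-based, hierarchical imitation described in the abstract.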
