A Probabilistic Framework for Model-Based Imitation Learning

Aaron P. Shon, David B. Grimes, Chris L. Baker, and Rajesh P. N. Rao
{aaron, grimes, clbaker, rao}@cs.washington.edu
CSE Department, Box 352350, University of Washington, Seattle WA 98195 USA

Abstract

Humans and animals use imitation as a mechanism for acquiring knowledge. Recently, several algorithms and models have been proposed for imitation learning in robots and humans. However, few proposals offer a framework for imitation learning in a stochastic environment where the imitator must learn and act under real-time performance constraints. We present a probabilistic framework for imitation learning in stochastic environments with unreliable sensors. We develop Bayesian algorithms, based on Meltzoff and Moore's AIM hypothesis for infant imitation, that implement the core of an imitation learning framework, and sketch basic proposals for the other components. Our algorithms are computationally efficient, allowing real-time learning and imitation in an active stereo vision robotic head. We present results of both software simulations and our algorithms running on the head, demonstrating the validity of our approach.

Imitation learning in animals and machines

Imitation is a common mechanism for transferring knowledge from a skilled agent (the instructor) to an unskilled agent (or observer) using direct demonstration rather than manipulating symbols. Various forms of imitation have been studied in apes [Visalberghi and Fragaszy, 1990, Byrne and Russon, 2003], in children (including infants only minutes old) [Meltzoff and Moore, 1977, Meltzoff and Moore, 1997], and in an increasingly diverse selection of machines [Fong et al., 2002, Lungarella and Metta, 2003]. The attraction for machine learning is obvious: a machine with the ability to imitate has a drastically lower cost of reprogramming than one which requires programming by an expert. Imitative robots also offer testbeds for cognitive researchers to test computational theories, and provide modifiable agents for contingent interaction with humans in psychological experiments.

Few previous efforts have presented biologically plausible frameworks for imitation learning. Bayesian imitation learning has been proposed to accelerate Markov decision process (MDP) learning for reinforcement learning agents [Price, 2003]; however, this framework chiefly addresses the problem of learning a forward model of the environment [Jordan and Rumelhart, 1992] via imitation (see below), and the correspondence with cognitive findings in humans is unclear. Other frameworks have been proposed for imitation learning in machines [Breazeal, 1999, Scassellati, 1999, Billard and Mataric, 2000], but most of these are not designed around a coherent probabilistic formalism such as Bayesian inference. Probabilistic methods, and Bayesian inference in particular, are attractive because they handle noisy, incomplete data, can be tuned to handle realistically large problem sizes, and provide a unifying mathematical framework for reasoning and learning. Our approach is unique in combining a biologically inspired approach to imitation with a Bayesian framework for goal-directed learning. Unlike many imitation systems, which implement only software simulations, this paper demonstrates the value of our framework through both simulation results and a real-time robotic implementation.
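Since the framework rests on Bayesian inference over noisy sensor data, the sketch below shows a minimal discrete Bayes filter of the kind surveyed in [Fox et al., 2003]. It is our illustrative assumption, not the implementation described in this paper: the state, action, and observation spaces are assumed discrete, and the transition model (trans) and sensor model (sensor) are hypothetical placeholders.

```python
import numpy as np

# Minimal discrete Bayes filter sketch: maintain a belief over hidden
# states given noisy observations. Illustrative only; all models here
# are assumed, not taken from the paper.

def bayes_filter_step(belief, action, obs, trans, sensor):
    """One predict/update cycle.

    belief : (S,)      prior P(s)
    trans  : (A, S, S) transition model, trans[a, s, s2] = P(s2 | s, a)
    sensor : (S, O)    sensor model, sensor[s, o] = P(o | s)
    """
    predicted = belief @ trans[action]      # predict: sum_s P(s2|s,a) P(s)
    posterior = predicted * sensor[:, obs]  # update: reweight by P(o|s2)
    return posterior / posterior.sum()      # renormalize to a distribution

# Toy example: two states, one action, noisy binary sensor.
trans = np.array([[[0.9, 0.1],
                   [0.2, 0.8]]])
sensor = np.array([[0.8, 0.2],
                   [0.3, 0.7]])
belief = np.array([0.5, 0.5])
belief = bayes_filter_step(belief, action=0, obs=1, trans=trans, sensor=sensor)
```

The predict step propagates the belief through the assumed dynamics; the update step reweights it by the likelihood of the current observation. That reweighting is what lets a Bayesian system absorb noisy or incomplete sensor readings gracefully.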
Components of an imitation learning system

The observer must surmount a number of problems in attempting to replicate the behavior of the instructor. Although described elsewhere [Schaal et al., 2003, Rao and Meltzoff, 2003], we briefly reformulate them as follows:

1. State identification: Ability to classify high-dimensional sensor data into a lower-dimensional, relevant state robust to sensor noise. State identification should differentiate between the internal state of the observer (proprioceptive feedback, etc.) and the state of the environment, including the states of other agents, particularly the instructor.

2. Action identification: Ability to classify sequences of states in time.

3. State mapping: Transformation from the egocentric coordinate system of the instructor to the egocentric coordinate system of the observer.

4. Model learning: Learning forward and inverse models [Blakemore et al., 1998] to facilitate interaction with the environment (a toy sketch follows this list).

5. Policy learning: Learning action choices that maximize a reward function, as observed from the actions selected by the instructor in each given state.

6. Sequence learning and segmentation: Ability to memorize sequences of key states needed to complete
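As a concrete, deliberately simplified illustration of items 4 and 5, the sketch below learns a tabular forward model P(s2 | s, a) from demonstrated (state, action, next state) triples and derives a greedy inverse model by selecting the action most likely to produce a desired next state. This assumes discrete states and actions and is not the paper's actual learning procedure; the function names and demonstration data are hypothetical.

```python
import numpy as np

def learn_forward_model(demos, n_states, n_actions):
    """Estimate P(s2 | s, a) from (s, a, s2) triples, with Laplace
    smoothing so unseen transitions keep nonzero probability."""
    counts = np.ones((n_actions, n_states, n_states))
    for s, a, s2 in demos:
        counts[a, s, s2] += 1
    return counts / counts.sum(axis=2, keepdims=True)

def inverse_model_action(forward, s, s_goal):
    """Greedy inverse model: the action judged most likely to reach s_goal."""
    return int(np.argmax(forward[:, s, s_goal]))

# Hypothetical demonstrations: the instructor reaches state 1 via action 1.
demos = [(0, 1, 1), (0, 1, 1), (1, 0, 1)]
forward = learn_forward_model(demos, n_states=2, n_actions=2)
print(inverse_model_action(forward, s=0, s_goal=1))  # -> 1
```

In a full system the count table could be replaced by any conditional density model; the point is only that a learned forward model can be inverted to propose actions, which is the bridge from model learning (item 4) to policy learning (item 5).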

References

[1] T. Fong, I. R. Nourbakhsh, and K. Dautenhahn. A survey of socially interactive robots. Robotics and Autonomous Systems, 2003.

[2] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 2000.

[3] S. Ullman. The Correspondence Problem. 1979.

[4] R. J. Herrnstein. Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 1961.

[5] A. N. Meltzoff and W. Prinz, editors. The Imitative Mind: Development, Evolution, and Brain Bases. 2002.

[6] A. N. Meltzoff et al. The importance of eyes: how infants interpret adult looking behavior. Developmental Psychology, 2002.

[7] S. Ragir. "Language" and intelligence in monkeys and apes: comparative developmental perspectives. International Journal of Primatology, 1991.

[8] A. A. Efros, A. C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003.

[9] J. B. Tenenbaum. Bayesian modeling of human concept learning. In NIPS, 1998.

[10] C. L. Nehaniv and K. Dautenhahn. The correspondence problem. 2002.

[11] Z. Ghahramani et al. Unsupervised learning of sensory-motor primitives. In Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2003.

[12] A. N. Meltzoff and M. K. Moore. Explaining facial imitation: a theoretical model. Early Development & Parenting, 1997.

[13] M. Lungarella and G. Metta. Beyond Gazing, Pointing, and Reaching: A Survey of Developmental Robotics. 2003.

[14] M. I. Jordan and D. E. Rumelhart. Forward models: supervised learning with a distal teacher. Cognitive Science, 1992.

[15] A. N. Meltzoff and M. K. Moore. Imitation of facial and manual gestures by human neonates. Science, 1977.

[16] S. Schaal, A. Ijspeert, and A. Billard. Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society of London B, 2003.

[17] A. Billard and M. J. Matarić. A biologically inspired robotic model for learning by imitation. In AGENTS '00, 2000.

[18] L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 1996.

[19] K. R. Gibson et al. "Language" and intelligence in monkeys and apes: comparative developmental perspectives on ape "language". 1990.

[20] A. N. Meltzoff. Elements of a developmental theory of imitation. 2002.

[21] D. Fox et al. Bayesian filtering for location estimation. IEEE Pervasive Computing, 2003.

[22] R. Byrne et al. Priming primates: human and otherwise. Behavioral and Brain Sciences, 1998.

[23] S.-J. Blakemore, S. J. Goodbody, and D. M. Wolpert. Predicting the consequences of our own actions: the role of sensorimotor context estimation. The Journal of Neuroscience, 1998.

[24] B. Scassellati. Imitation and mechanisms of joint attention: a developmental structure for building social skills on a humanoid robot. 1999.

[25] Y. Wu et al. Wide-range, person- and illumination-insensitive head orientation estimation. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.

[26] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000.