Visual learning by imitation with motor representations

We propose a general architecture for visual imitation at both the action (mimicking) level and the program (gesture) level. Action-level imitation involves two modules. The viewpoint transformation (VPT) performs a "rotation" that aligns the demonstrator's body with that of the learner. The visuo-motor map (VMM) then maps this visual information to motor data. For program-level (gesture) imitation, an additional module lets the system recognize observed gestures and generate its own interpretation of them, so that it can produce similar gestures or goals at a later stage. Besides its holistic treatment of the problem, our approach differs from traditional work in i) the use of motor information for gesture recognition; ii) the use of context (e.g., object affordances) to focus the attention of the recognition system and reduce ambiguities; and iii) the use of iconic image representations of the hand, as opposed to fitting kinematic models to the video sequence. This approach is motivated by the discovery of visuomotor neurons in area F5 of the macaque brain, which suggests that gesture recognition and imitation are performed in motor terms (mirror neurons) and rely on object affordances (canonical neurons) to handle ambiguous actions. Our results show that this approach can outperform more conventional (e.g., purely visual) methods.
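The pipeline described above (VPT, then VMM, then recognition in motor terms aided by object context) can be illustrated with a minimal sketch. The abstract gives no equations, so everything here is an assumption rather than the authors' method: the VPT is modeled as a rotation of π radians about the vertical axis (demonstrator facing the learner), the VMM as a nearest-neighbour lookup over (visual, motor) pairs that the learner could collect by watching its own movements, and recognition as Bayes' rule combining a motor-space likelihood with an object-affordance prior P(gesture | object). The names `viewpoint_transform`, `VisuoMotorMap`, `recognize`, the grip-aperture feature, and all numbers are hypothetical.

```python
import math

# Hypothetical sketch: the abstract gives no formulas, so every function,
# parameter and number below is an illustrative assumption.


def viewpoint_transform(points, angle=math.pi):
    """Rotate 3D points (x, y, z) about the vertical (y) axis.

    Assumption: demonstrator and learner face each other, so a rotation of
    `angle` = pi radians maps the demonstrator's frame onto the learner's.
    """
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


class VisuoMotorMap:
    """Nearest-neighbour lookup from visual features to motor commands.

    Assumption: (visual, motor) pairs are gathered while the learner
    observes its own movements; the paper's actual VMM may differ.
    """

    def __init__(self):
        self.samples = []

    def add(self, visual, motor):
        self.samples.append((tuple(visual), tuple(motor)))

    def __call__(self, visual):
        # Return the motor command whose stored visual feature is closest.
        _, motor = min(self.samples, key=lambda s: math.dist(s[0], visual))
        return motor


def gaussian(mean, std):
    """Return a 1D Gaussian likelihood function."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))
    return pdf


def recognize(motor_feature, likelihoods, affordance_prior, obj):
    """Most probable gesture and its posterior, given the object in view.

    posterior(g) is proportional to likelihood_g(motor_feature) * P(g | obj).
    """
    prior = affordance_prior[obj]
    scores = {g: lik(motor_feature) * prior.get(g, 0.0)
              for g, lik in likelihoods.items()}
    total = sum(scores.values())
    posterior = {g: s / total for g, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Action level: align a demonstrated hand position, then look up a motor command.
vmm = VisuoMotorMap()
vmm.add((-0.3, 1.0, -0.5), (2.5,))  # hypothetical motor value: grip aperture (cm)
vmm.add((0.3, 1.0, -0.5), (6.0,))
hand = viewpoint_transform([(0.3, 1.0, 0.5)])[0]
motor = vmm(hand)  # -> (2.5,)

# Program level: the aperture 2.5 is equally likely under both grips,
# so the object-affordance prior decides.
likelihoods = {"precision_grip": gaussian(2.0, 1.0),
               "power_grasp": gaussian(3.0, 1.0)}
priors = {"pen": {"precision_grip": 0.9, "power_grasp": 0.1},
          "mug": {"precision_grip": 0.2, "power_grasp": 0.8}}
best_pen, _ = recognize(motor[0], likelihoods, priors, "pen")  # -> "precision_grip"
best_mug, _ = recognize(motor[0], likelihoods, priors, "mug")  # -> "power_grasp"
```

Because the motor evidence alone is ambiguous in this toy case, the same observed movement is labeled differently depending on the object in view, a small-scale analogue of using affordances to disambiguate actions.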
