Action recognition and understanding through motor primitives

In robotics, recognition of human activity has been used extensively for robot task learning through imitation and demonstration. However, little work has addressed the modeling and recognition of activities that involve object manipulation and grasping. In this work, we deal with single arm/hand actions that are very similar to each other in terms of arm/hand motion. The approach is based on the hypothesis that actions can be represented as sequences of motion primitives. On this basis, we investigate a set of five manipulation actions of varying complexity. To model the process, we use a combination of discriminative support vector machines and generative hidden Markov models. The experimental evaluation, performed with 10 people, investigates both the definition and structure of primitive motions and the validity of the modeling approach taken.
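As a rough illustration of the hypothesis that actions are sequences of motion primitives, the sketch below scores a sequence of primitive labels under one discrete hidden Markov model per action and picks the most likely action. It is a minimal sketch under stated assumptions, not the method evaluated in this work: the primitive names (reach, grasp, lift, push), the two toy actions, and all probabilities are hypothetical and hand-set, and the discriminative stage is assumed to have already mapped each frame to a primitive label (for example with a support vector machine).

```python
import math

# Hypothetical primitive labels, assumed to come from a per-frame classifier
# (e.g. an SVM): 0 = reach, 1 = grasp, 2 = lift, 3 = push.

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def classify(obs, models):
    """Return the name of the action model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Toy left-to-right models (start, transition, emission); values are made up.
MODELS = {
    "pick_up": (
        [1.0, 0.0, 0.0],
        [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
        [[0.85, 0.05, 0.05, 0.05],   # state 0 mostly emits reach
         [0.05, 0.85, 0.05, 0.05],   # state 1 mostly emits grasp
         [0.05, 0.05, 0.85, 0.05]],  # state 2 mostly emits lift
    ),
    "push": (
        [1.0, 0.0],
        [[0.6, 0.4], [0.0, 1.0]],
        [[0.85, 0.05, 0.05, 0.05],   # state 0 mostly emits reach
         [0.05, 0.05, 0.05, 0.85]],  # state 1 mostly emits push
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 1, 2, 2], MODELS))
    print(classify([0, 0, 3, 3, 3], MODELS))
```

The emission rows keep a small probability for every primitive so that an occasional mislabeled frame lowers a model's score instead of ruling it out entirely. In practice the model parameters would be estimated from demonstrations and not set by hand.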
