Bio-inspired Approach for the Recognition of Goal-Directed Hand Actions

The recognition of transitive, goal-directed actions requires balancing a detailed representation of the shapes of the effector and the goal object against robustness to image transformations. We present a biologically inspired architecture for the recognition of transitive actions from video sequences that integrates an appearance-based recognition approach with a simple neural mechanism for representing the effector-object relationship. A large degree of position invariance is obtained by combining nonlinear pooling with an explicit representation of the relative positions of object and effector using neural population codes. The approach was tested on real videos and successfully recognized grip types invariantly in unsegmented video sequences. In addition, the algorithm reproduces and predicts the behavior of action-selective neurons in parietal and prefrontal cortex.
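The two ingredients named in the abstract, nonlinear pooling and a population code for the relative position of object and effector, can be illustrated with a small sketch. This is not the authors' implementation: the use of max pooling, the Gaussian tuning curves, the peak-based position estimate, and all function names (`max_pool`, `peak_position`, `population_code`, `relative_position_code`, `make_map`) are assumptions chosen for illustration only.

```python
import math

def make_map(height, width, peak):
    """Hypothetical detector output: a 2D response map with one active location."""
    grid = [[0.0] * width for _ in range(height)]
    grid[peak[0]][peak[1]] = 1.0
    return grid

def max_pool(response_map):
    """Nonlinear (max) pooling over all positions.
    The result does not depend on where the strongest response occurs."""
    return max(max(row) for row in response_map)

def peak_position(response_map):
    """Return (row, col) of the strongest response in a 2D map."""
    best_value, best_pos = None, None
    for r, row in enumerate(response_map):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_pos = value, (r, c)
    return best_pos

def population_code(offset, preferred_offsets, sigma=1.0):
    """Encode a 2D offset as the activities of units with Gaussian tuning
    (assumed tuning shape), one unit per preferred offset."""
    dy, dx = offset
    return [
        math.exp(-((dy - py) ** 2 + (dx - px) ** 2) / (2.0 * sigma ** 2))
        for (py, px) in preferred_offsets
    ]

def relative_position_code(effector_map, object_map, preferred_offsets, sigma=1.0):
    """Population code for the object position relative to the effector.
    Only the difference of positions enters, so a common shift of both
    maps leaves the code unchanged."""
    er, ec = peak_position(effector_map)
    orow, ocol = peak_position(object_map)
    return population_code((orow - er, ocol - ec), preferred_offsets, sigma)

# Units tile a small range of relative offsets (assumed layout).
prefs = [(dy, dx) for dy in range(-2, 3) for dx in range(-2, 3)]

# Same hand-object configuration at two different image positions.
code_a = relative_position_code(make_map(8, 8, (2, 2)), make_map(8, 8, (3, 4)), prefs)
code_b = relative_position_code(make_map(8, 8, (5, 3)), make_map(8, 8, (6, 5)), prefs)
```

In this toy setting `code_a` and `code_b` are identical, because both scenes share the same effector-to-object offset. The pooled detector responses are also unchanged by the shift, which is the sense in which the combination gives position invariance while keeping the spatial relationship explicit.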
