Combining Densely Sampled Form and Motion for Human Action Recognition

We present a method for human action recognition from video, which exploits both form (local shape) and motion (local flow). Inspired by models of the human visual system, the two feature sets are processed independently in separate channels. The form channel extracts a dense local shape representation from every frame, while the motion channel extracts dense optic flow from the frame and its immediate predecessor. The same processing pipeline is applied in both channels: feature maps are pooled locally, down-sampled, and compared to a collection of learnt templates, yielding a vector of similarity scores. In a final step, the two score vectors are merged, and recognition is performed with a discriminative classifier. In an evaluation on two standard datasets our method outperforms the state-of-the-art, confirming that the combination of form and motion improves recognition.

[1]  Thomas Serre,et al.  A Biologically Inspired System for Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[2]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  T. Gawne,et al.  Responses of primate visual cortical neurons to stimuli presented by flash, saccade, blink, and external darkening. , 2002, Journal of neurophysiology.

[4]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[5]  T. Gawne,et al.  Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. , 2002, Journal of neurophysiology.

[6]  Mubarak Shah,et al.  Chaotic Invariants for Human Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8]  Antonino Casile,et al.  Critical features for the recognition of biological motion. , 2005, Journal of vision.

[9]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[10]  P. Goldman-Rakic,et al.  Preface: Cerebral Cortex Has Come of Age , 1991 .

[11]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[13]  Michael J. Black,et al.  Parameterized Modeling and Recognition of Activities , 1999, Comput. Vis. Image Underst..

[14]  J. Sullivan,et al.  Action Recognition by Shape Matching to Key Frames , 2002 .

[15]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[16]  D. J. Felleman,et al.  Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[17]  Liang Wang,et al.  Recognizing Human Activities from Silhouettes: Motion Subspace and Factorial Discriminative Graphical Model , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Fei-FeiLi,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2008 .

[19]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[20]  Ivan Laptev,et al.  Local Descriptors for Spatio-temporal Recognition , 2004, SCVMA.

[21]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[22]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[23]  Tomaso Poggio,et al.  Intracellular measurements of spatial integration and the MAX operation in complex cells of the cat primary visual cortex. , 2004, Journal of neurophysiology.

[24]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[25]  Juan Carlos Niebles,et al.  A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  T. Poggio,et al.  Cognitive neuroscience: Neural mechanisms for the recognition of biological movements , 2003, Nature Reviews Neuroscience.

[28]  Horst Bischof,et al.  A Duality Based Approach for Realtime TV-L1 Optical Flow , 2007, DAGM-Symposium.

[29]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[30]  J A Beintema,et al.  Perception of biological motion without local image motion , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[31]  D J Field,et al.  Relations between the statistics of natural images and the response properties of cortical cells. , 1987, Journal of the Optical Society of America. A, Optics and image science.

[32]  PoggioTomaso,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007 .