Computational Model of Primary Visual Cortex Combining Visual Attention for Action Recognition

Humans readily understand other people's actions through the visual system, whereas computers still cannot. This paper therefore proposes a new bio-inspired computational model for automatic action recognition. The model focuses on the dynamic properties of neurons and neural networks in the primary visual cortex (V1) and simulates the information processing in V1, which consists of visual perception, visual attention, and representation of human action. A family of three-dimensional spatiotemporal correlative Gabor filters models the dynamic properties of the classical receptive field of V1 simple cells tuned to different speeds and orientations in time, and is used to detect spatiotemporal information in video sequences. Based on the inhibitory effect of stimuli outside the classical receptive field, which is caused by lateral connections of spiking neuron networks in V1, we propose a surround suppressive operator to further process the spatiotemporal information. A visual attention model based on perceptual grouping is integrated into our model to filter and group different regions. Moreover, to represent human action, we consider a characteristic of the neural code: the mean motion map, based on analysis of spike trains generated by spiking neurons. Experimental evaluation on several publicly available action datasets and comparison with state-of-the-art approaches demonstrate the superior performance of the proposed model.
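
To make the first processing stage concrete, the following is a minimal sketch of a speed- and orientation-tuned spatiotemporal Gabor filter and a simple surround suppression step. It is an illustration under stated assumptions, not the model's actual implementation: the function names, the separable Gaussian temporal envelope, the quadrature-pair energy, the subtractive form of the suppression, and all parameter values are hypothetical choices that do not come from the paper.

```python
import math


def spatiotemporal_gabor(size=9, frames=7, theta=0.0, speed=1.0,
                         wavelength=4.0, sigma=2.0, gamma=0.5, tau=2.0,
                         phase=0.0):
    """Return kernel[t][y][x], a drifting Gabor tuned to orientation `theta`
    (radians) and `speed` (pixels per frame). All defaults are illustrative."""
    half = size // 2
    t_mid = (frames - 1) / 2
    kernel = []
    for t in range(frames):
        # Gaussian temporal envelope centred on the middle frame (assumption).
        temporal = math.exp(-((t - t_mid) ** 2) / (2 * tau ** 2))
        frame = []
        for y in range(-half, half + 1):
            row = []
            for x in range(-half, half + 1):
                # Rotate coordinates so the carrier drifts along direction theta.
                xr = x * math.cos(theta) + y * math.sin(theta)
                yr = -x * math.sin(theta) + y * math.cos(theta)
                envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2)
                                    / (2 * sigma ** 2))
                carrier = math.cos(2 * math.pi * (xr - speed * t) / wavelength
                                   + phase)
                row.append(envelope * temporal * carrier)
            frame.append(row)
        kernel.append(frame)
    return kernel


def drifting_grating(size, frames, speed, wavelength=4.0):
    """Synthetic test video[t][y][x]: a grating drifting along x."""
    half = size // 2
    return [[[math.cos(2 * math.pi * (x - speed * t) / wavelength)
              for x in range(-half, half + 1)]
             for _ in range(size)]
            for t in range(frames)]


def motion_energy(video, theta, speed):
    """Phase-invariant response at the patch centre from a quadrature pair."""
    frames, size = len(video), len(video[0])

    def response(phase):
        k = spatiotemporal_gabor(size=size, frames=frames, theta=theta,
                                 speed=speed, phase=phase)
        return sum(k[t][y][x] * video[t][y][x]
                   for t in range(frames)
                   for y in range(size)
                   for x in range(size))

    return math.hypot(response(0.0), response(-math.pi / 2))


def surround_suppress(center, surround_mean, alpha=1.0):
    """Subtract a weighted surround response and half-wave rectify.
    This subtractive form is an assumption, not the paper's operator."""
    return max(0.0, center - alpha * surround_mean)


if __name__ == "__main__":
    video = drifting_grating(size=9, frames=7, speed=1.0)
    preferred = motion_energy(video, theta=0.0, speed=1.0)
    opposite = motion_energy(video, theta=0.0, speed=-1.0)
    print("preferred direction energy:", preferred)
    print("opposite direction energy:", opposite)
    print("suppressed response:", surround_suppress(preferred, opposite))
```

In this sketch, a filter whose preferred speed matches the drift of the grating responds much more strongly than one tuned to the opposite direction, which is the selectivity that a bank of such filters would exploit across speeds and orientations.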
