Computational Model Based on Neural Network of Visual Cortex for Human Action Recognition

In this paper, we propose a bioinspired model for human action recognition that models the neural mechanisms of information processing in two visual cortical areas: the primary visual cortex (V1) and the middle temporal cortex (MT), which is dedicated to motion. The model, named V1-MT, consists of V1 and MT models (layers) corresponding to these cortical areas, each built with layered spiking neural networks (SNNs). Several neuron properties of V1 and MT, such as direction and speed selectivity, spatiotemporal inseparability, and center-surround suppression, are integrated into the SNNs. Based on speed and direction selectivity, the V1 and MT models contain multiple SNN channels, each of which processes the motion information in a sequence using neurons spatiotemporally tuned to one speed and several directions. To perform this processing in accordance with the spatiotemporal inseparability and center-surround suppression of neurons, we propose two operations: input signal perception with 3-D Gabor filters and surround inhibition with 3-D difference-of-Gaussians functions. Neurons are then modeled with our simplified integrate-and-fire model, which transforms motion information into spike trains. Next, we define a new feature vector, a mean motion map computed from the spike trains of all channels, to represent human actions. Finally, a support vector machine is trained to classify the actions represented by these feature vectors. We conducted extensive experiments on public action databases, and the results show that our model outperforms other bioinspired models and rivals state-of-the-art approaches.
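The processing chain summarized above (spatiotemporal filtering, spike generation, and a mean motion map) can be illustrated with a minimal sketch. It is not the implementation used in the paper. The function names, the parameter values, the leaky form of the integrate-and-fire update, and the definition of the map as a firing rate averaged over time and channels are all assumptions made for illustration. The surround inhibition step and the classifier are omitted.

```python
import math


def gabor_3d(x, y, t, speed, theta, wavelength=4.0, sigma=2.0, tau=2.0):
    """Value of a spatiotemporal (3-D) Gabor kernel at offset (x, y, t).

    The carrier drifts along direction `theta` at `speed`, so space and time
    are coupled (spatiotemporally inseparable). All parameters are
    hypothetical defaults, not values from the paper.
    """
    # Position along the preferred direction, shifted over time by the speed
    u = x * math.cos(theta) + y * math.sin(theta) - speed * t
    envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2)
                        - (t * t) / (2 * tau ** 2))
    return envelope * math.cos(2 * math.pi * u / wavelength)


def integrate_and_fire(drive, threshold=1.0, leak=0.9):
    """Turn a sequence of input values into a binary spike train.

    Assumed update rule: the membrane potential decays by `leak`, adds the
    half-wave rectified input, and resets to zero after each spike.
    """
    potential = 0.0
    spikes = []
    for value in drive:
        potential = leak * potential + max(value, 0.0)
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0
        else:
            spikes.append(0)
    return spikes


def mean_motion_map(channels):
    """Average firing rate per neuron over time and over all channels.

    `channels[c][n]` is the spike train (list of 0/1) of neuron n in
    channel c. This averaging scheme is an assumption of the sketch.
    """
    n_neurons = len(channels[0])
    result = []
    for n in range(n_neurons):
        rates = [sum(ch[n]) / len(ch[n]) for ch in channels]
        result.append(sum(rates) / len(rates))
    return result


# Two hypothetical channels, two neurons each, four time steps
channel_a = [integrate_and_fire([0.6, 0.6, 0.0, 2.0]),
             integrate_and_fire([0.0, 0.0, 0.0, 0.0])]
channel_b = [integrate_and_fire([1.5, 1.5, 1.5, 1.5]),
             integrate_and_fire([0.0, 0.0, 0.0, 0.0])]
feature_vector = mean_motion_map([channel_a, channel_b])
```

In this sketch, `feature_vector` plays the role of the feature vector that would be passed to a support vector machine. In a fuller version, the `drive` values would come from convolving the video with `gabor_3d` kernels, one kernel per speed and direction.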
