Enhanced HMAX model with feedforward feature learning for multiclass categorization

In recent years, interdisciplinary research between neuroscience and computer vision has promoted development in both fields. Many biologically inspired visual models have been proposed. Among them, the Hierarchical Max-pooling model (HMAX) is a feedforward model that mimics the structures and functions of the primate visual cortex from V1 to the posterior inferotemporal (PIT) layer, and generates a series of position- and scale-invariant features. However, it could be improved with attention modulation and memory processing, two important properties of the primate visual cortex. In this paper, drawing on recent biological research on the primate visual cortex, we enhance the HMAX model while still mimicking only the first 100–150 ms of visual cognition; the enhancement mainly concerns the unsupervised feedforward feature learning process. The main modifications are as follows: (1) To mimic the attention modulation mechanism of the V1 layer, a bottom-up saliency map is computed in the S1 layer of the HMAX model, which supports the initial feature extraction for memory processing; (2) To mimic the learning, clustering, and short-term to long-term memory conversion abilities of V2 and IT, an unsupervised iterative clustering method is used to learn clusters of multiscale middle-level patches, which are taken as long-term memory; (3) Inspired by the multiple feature encoding mode of the primate visual cortex, information including color, orientation, and spatial position is encoded progressively in different layers of the HMAX model. With a softmax layer added at the top of the model, multiclass categorization experiments can be conducted. Results on Caltech101 show that the enhanced model, with a smaller memory size, achieves higher accuracy than the original HMAX model, and also achieves better accuracy than other unsupervised feature learning methods on the multiclass categorization task.
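The feedforward pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the saliency map is computed, so the sketch assumes a simple stand-in (the normalized maximum orientation response), and all function names and parameter values (`gabor_kernel`, `s1_with_saliency`, `c1_max_pool`, filter size, pooling size) are hypothetical. It shows Gabor-based S1 responses weighted by that stand-in saliency map, C1-style local max pooling, and a softmax over class scores.

```python
import numpy as np

def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0, gamma=0.5):
    """Zero-mean, unit-norm Gabor filter (parameter values are illustrative)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    k = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    k = k * np.cos(2 * np.pi * xr / wavelength)
    k -= k.mean()
    return k / np.linalg.norm(k)

def filter2d(image, kernel):
    """'Valid' 2-D cross-correlation via sliding windows."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def s1_with_saliency(image, n_orient=4, size=7):
    """S1 orientation maps weighted by an ASSUMED bottom-up saliency map."""
    thetas = np.arange(n_orient) * np.pi / n_orient
    s1 = np.stack([np.abs(filter2d(image, gabor_kernel(size, t)))
                   for t in thetas])
    # Assumption: saliency = strongest orientation response, scaled to [0, 1].
    saliency = s1.max(axis=0)
    saliency = saliency / (saliency.max() + 1e-12)
    return s1 * saliency, saliency

def c1_max_pool(s1, pool=4):
    """C1-style max pooling over non-overlapping pool x pool neighborhoods."""
    n, h, w = s1.shape
    h2, w2 = h // pool, w // pool
    cropped = s1[:, :h2 * pool, :w2 * pool]
    return cropped.reshape(n, h2, pool, w2, pool).max(axis=(2, 4))

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))           # stand-in for a grayscale image
    s1, saliency = s1_with_saliency(image)
    c1 = c1_max_pool(s1)
    features = c1.ravel()
    weights = rng.normal(size=(features.size, 3))  # untrained, 3 toy classes
    print(s1.shape, c1.shape, softmax(features @ weights).shape)
```

The max pooling step is what gives HMAX-style features their tolerance to small shifts; the clustering of middle-level patches and the color and spatial encodings mentioned in the abstract are omitted from this sketch.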
