Extended Coding and Pooling in the HMAX Model

This paper presents an extension of the HMAX model, a neural network model for image classification. The HMAX model can be described as a four-level architecture whose first level consists of multiscale, multiorientation local filters. We make two main contributions to this model. First, we improve the way the local filters at the first level are integrated into more complex filters at the last level, which provides a flexible description of object regions and combines local information across multiple scales and orientations. These new filters are both discriminative and invariant, two key properties for visual classification. We evaluate their discriminative power and their level of invariance to geometrical transformations on a synthetic image set. Second, we introduce a multiresolution spatial pooling scheme. This pooling encodes both local and global spatial information to produce discriminative image signatures. Classification results are reported on three image data sets: Caltech-101, Caltech-256, and Fifteen Scenes. We show significant improvements over previous architectures that use a similar framework.
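
As a rough illustration of how a multiresolution spatial pooling can encode both global and local information, the sketch below max-pools a single 2D filter response map over grids of increasing resolution and concatenates the results into one signature. It is a generic spatial-pyramid-style sketch, not the pooling defined in the paper: the function name `multires_max_pool`, the choice of max pooling, and the grid levels `(1, 2, 4)` are all assumptions made for the example.

```python
from typing import List, Sequence


def multires_max_pool(
    fmap: Sequence[Sequence[float]],
    levels: Sequence[int] = (1, 2, 4),  # assumed grid sizes, not taken from the paper
) -> List[float]:
    """Max-pool a 2D response map over g x g grids for each g in `levels`.

    A 1x1 grid yields one global maximum; finer grids yield local maxima.
    All values are concatenated into a single signature vector.
    """
    h, w = len(fmap), len(fmap[0])
    signature: List[float] = []
    for g in levels:
        if g > h or g > w:
            raise ValueError("grid is finer than the response map")
        for i in range(g):
            # Row range covered by grid cell row i.
            r0, r1 = i * h // g, (i + 1) * h // g
            for j in range(g):
                # Column range covered by grid cell column j.
                c0, c1 = j * w // g, (j + 1) * w // g
                signature.append(
                    max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return signature


# Usage: a 4x4 response map pooled globally (1x1) and over a 2x2 grid.
responses = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(multires_max_pool(responses, levels=(1, 2)))
```

In this sketch the first entry of the signature summarizes the whole map, while the later entries record where in the image strong responses occur; in a full pipeline one such signature would be computed per filter and the results concatenated.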
