Learning Optimized Features for Hierarchical Models of Invariant Object Recognition

There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods remain a topic of intense research. We propose a feedforward model for recognition that shares components such as weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which were previously applied mostly to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimizing the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints, such as translation invariance, on the feature set. This reduces the dimensionality of the search space of optimal features and allows the basis representatives that achieve a sparse decomposition of the input to be determined more efficiently. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. The comparison shows that combining strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in a cluttered surround. We demonstrate that for both learning and recognition, a precise segmentation of the objects is not necessary.
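
The following minimal Python sketch is meant only to make three of the abstract's ingredients concrete: weight-shared (convolutional) application of a learned feature, the two alternative pooling nonlinearities (presynaptic maximum selection and sum-of-squares integration), and a generic sparse coding objective with an L1 sparsity penalty. The function names, the pooling block size, and the sparsity weight `lam` are illustrative assumptions and do not reproduce the model's actual parameters or learning procedure.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Weight-shared (convolutional) application of one learned feature."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def pool(feature_map, size=2, mode="max"):
    """Spatial pooling over non-overlapping blocks.

    mode="max"    -> presynaptic maximum selection
    mode="sq_sum" -> sum-of-squares integration
    """
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "sq_sum":
        return (blocks ** 2).sum(axis=(1, 3))
    raise ValueError(mode)

def sparse_coding_cost(patch, basis, coeffs, lam=0.1):
    """Generic sparse coding objective: squared reconstruction error of the
    input patch from the basis, plus an L1 penalty on the coefficients
    (lam is a hypothetical weighting)."""
    residual = patch - basis @ coeffs
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(coeffs))

# Toy usage: one learned feature applied with weight sharing, rectified,
# then pooled with either nonlinearity.
rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
feature = rng.standard_normal((5, 5))       # stands in for a learned cell
fmap = np.maximum(convolve2d_valid(image, feature), 0.0)
print(pool(fmap, 2, "max").shape, pool(fmap, 2, "sq_sum").shape)  # (6, 6) (6, 6)
```

The sketch only serves to distinguish the two pooling variants; in the comparison reported above, the combination of strong competitive nonlinearities with sparse coding is what performs best for segmentation-free recognition in clutter.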
