Robust Object Recognition with Cortex-Like Mechanisms

We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based as well as texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex

[1]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[2]  P. Schiller,et al.  Quantitative studies of single-cell properties in monkey striate cortex. III. Spatial frequency. , 1976, Journal of neurophysiology.

[3]  P. Schiller,et al.  Quantitative studies of single-cell properties in monkey striate cortex. II. Orientation specificity and ocular dominance. , 1976, Journal of neurophysiology.

[4]  D Marr,et al.  A computational theory of human stereo vision. , 1979, Proceedings of the Royal Society of London. Series B, Biological sciences.

[5]  T. Poggio,et al.  A computational theory of human stereo vision , 1979, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[6]  D Marr,et al.  Bandpass channels, zero-crossings, and early visual information processing. , 1979, Journal of the Optical Society of America.

[7]  A. Treisman,et al.  A feature-integration theory of attention , 1980, Cognitive Psychology.

[8]  R. L. Valois,et al.  The orientation and direction selectivity of cells in macaque visual cortex , 1982, Vision Research.

[9]  D. G. Albrecht,et al.  Spatial frequency selectivity of cells in macaque visual cortex , 1982, Vision Research.

[10]  J. Daugman Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. , 1985, Journal of the Optical Society of America. A, Optics and image science.

[11]  Tomaso A. Poggio,et al.  On Edge Detection , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  J. P. Jones,et al.  An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. , 1987, Journal of neurophysiology.

[13]  Wilson S. Geisler,et al.  Multichannel Texture Analysis Using Localized Spatial Filters , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  David I. Perrett,et al.  Neurophysiology of shape processing , 1993, Image Vis. Comput..

[15]  N. Logothetis,et al.  Shape representation in the inferior temporal cortex of monkeys , 1995, Current Biology.

[16]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[17]  Tony Lindeberg,et al.  Scale-space theory : A framework for handling image structures at multiple scales , 1996 .

[18]  Denis Fize,et al.  Speed of processing in the human visual system , 1996, Nature.

[19]  Bartlett W. Mel SEEMORE: Combining Color, Shape, and Texture Histogramming in a Neurally Inspired Approach to Visual Object Recognition , 1997, Neural Computation.

[20]  E. Rolls,et al.  INVARIANT FACE AND OBJECT RECOGNITION IN THE VISUAL SYSTEM , 1997, Progress in Neurobiology.

[21]  J. Wolfe,et al.  Preattentive Object Files: Shapeless Bundles of Basic Features , 1997, Vision Research.

[22]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[23]  Hartmut Neven,et al.  The Bochum/USC Face Recognition System And How it Fared in the FERET Phase III Test , 1998 .

[24]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[25]  Jitendra Malik,et al.  Blobworld: A System for Region-Based Image Indexing and Retrieval , 1999, VISUAL.

[26]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[27]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[28]  Edmund T. Rolls,et al.  A Model of Invariant Object Recognition in the Visual System: Learning Rules, Activation Functions, Lateral Inhibition, and Information-Based Performance Measures , 2000, Neural Computation.

[29]  Tomaso A. Poggio,et al.  Example-Based Object Detection in Images by Components , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[30]  Thomas Serre,et al.  Categorization by Learning and Combining Object Parts , 2001, NIPS.

[31]  Peter Meer,et al.  Synergism in low level vision , 2002, Object recognition supported by user interaction for service robots.

[32]  Thomas Serre,et al.  On the Role of Object-Specific Features for Real World Object Recognition in Biological Vision , 2002, Biologically Motivated Computer Vision.

[33]  Terence Sim,et al.  The CMU Pose, Illumination, and Expression (PIE) database , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[34]  Simon J. Thorpe,et al.  Ultra-Rapid Scene Categorization with a Wave of Spikes , 2002, Biologically Motivated Computer Vision.

[35]  Michel Vidal-Naquet,et al.  Visual features of intermediate complexity and their use in classification , 2002, Nature Neuroscience.

[36]  T. Gawne,et al.  Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. , 2002, Journal of neurophysiology.

[37]  B. Schiele,et al.  Interleaved Object Categorization and Segmentation , 2003, BMVC.

[38]  Tai Sing Lee,et al.  Hierarchical Bayesian inference in the visual cortex. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[39]  Y. Amit,et al.  An integrated network for invariant visual detection and recognition , 2003, Vision Research.

[40]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[41]  Heiko Wersing,et al.  Learning Optimized Features for Hierarchical Models of Invariant Object Recognition , 2003, Neural Computation.

[42]  Tomaso Poggio,et al.  Intracellular measurements of spatial integration and the MAX operation in complex cells of the cat primary visual cortex. , 2004, Journal of neurophysiology.

[43]  Thomas Serre,et al.  Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex , 2004 .

[44]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[45]  A. Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[46]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[47]  T. Sanger,et al.  Stereo disparity computation using Gabor filters , 1988, Biological Cybernetics.

[48]  Brian Leung,et al.  Component-based Car Detection in Street Scene Images , 2004 .

[49]  Tomaso Poggio,et al.  Generalization in vision and motor control , 2004, Nature.

[50]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[51]  Jitendra Malik,et al.  When is scene identification just texture recognition? , 2004, Vision Research.

[52]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[53]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[54]  Cordelia Schmid,et al.  A maximum entropy framework for part-based texture and object recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[55]  Lior Wolf,et al.  A Unified System For Object Detection, Texture Recognition, and Context Analysis Based on the Standard Model Feature Set , 2005, BMVC.

[56]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[57]  A. Treisman,et al.  Perception of objects in natural scenes: is it really attention free? , 2005, Journal of experimental psychology. Human perception and performance.

[58]  Jean Ponce,et al.  The Local Projective Shape of Smooth Surfaces and Their Outlines , 2005, International Journal of Computer Vision.

[59]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Tomaso Poggio,et al.  Fast Readout of Object Identity from Macaque Inferior Temporal Cortex , 2005, Science.

[61]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[62]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[63]  Alex Holub,et al.  Exploiting Unlabelled Data for Hybrid Object Classification , 2005 .

[64]  Thomas Serre,et al.  A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex , 2005 .

[65]  Peter Auer,et al.  Generic object recognition with boosting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[66]  Stanley M. Bileschi,et al.  Street Scenes: towards scene understanding in still images , 2006 .

[67]  Lior Wolf,et al.  Perception Strategies in Hierarchical Vision Systems , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[68]  Manuel J. Marín-Jiménez,et al.  Empirical Study of Multi-scale Filter Banks for Object Categorization , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[69]  Tomaso Poggio,et al.  Learning a dictionary of shape-components in visual cortex: comparison with neurons, humans and machines , 2006 .

[70]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[71]  C. Scott,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence , 2009 .

[72]  RussLL L. Ds Vnlos,et al.  SPATIAL FREQUENCY SELECTIVITY OF CELLS IN MACAQUE VISUAL CORTEX , 2022 .