Invariant Recognition of Objects by Vision

Invariance to various transformations is key to object recognition. Image-plane invariances, such as translation, rotation and scaling, can be computed independently of the specific object. By contrast, invariance to rotation in depth and invariance to changes in illumination both require implicit information about the 3D structure of the object or its material properties, and thus more than a single “training” image. Here, we interpret same-different perceptual tasks as classification problems. This perspective allows us to give a formal definition of the efficiency of invariance, a bias-free summary measure of the trade-off between selectivity and invariance. We believe that this definition is the most natural one and that it should be used in physiology, psychophysics and modeling. We characterize the efficiency of invariance in a class of feedforward architectures for visual recognition that mimic the hierarchical organization of the ventral stream, and we show that this class of models achieves perfect translation and scaling invariance for novel images. In this architecture, a new image is represented in terms of the weights of “templates” or “basis functions” at each level of the hierarchy. Such a representation inherits the invariance of the templates, which is built in by replicating the corresponding units across positions or scales. Simulations on real images characterize the type and number of templates needed for a representation that is sufficient to support the invariant recognition of novel objects. We conclude that the templates need not be visually similar to the test objects and that a very small number of them is sufficient for good recognition. This surprising empirical result has intriguing implications for how invariant recognition is learned during the development of a biological organism, such as a human infant.
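The two ideas in the abstract (a template-based representation that inherits invariance from replicated units, and a same-different task scored as a classification problem) can be illustrated with a toy sketch. This is not the authors' model: it assumes 1-D integer "images", circular shifts as the only transformation, max pooling over shifted template copies, and the area under the ROC curve as the bias-free summary measure. All function names (`shift`, `signature`, `auc`) are hypothetical.

```python
import random

def shift(signal, k):
    """Circularly shift a 1-D signal (a list) by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def signature(image, templates):
    """One pooled response per template.

    Each template is replicated at every position (all circular shifts),
    and the responses are max-pooled, so the result does not change when
    the image itself is shifted.
    """
    n = len(image)
    return [max(dot(image, shift(t, k)) for k in range(n)) for t in templates]

def auc(same_scores, diff_scores):
    """Area under the ROC curve for a same-different task.

    Equals the probability that a 'same' pair scores higher than a
    'different' pair (ties count one half), so it does not depend on
    any particular decision threshold.
    """
    wins = sum((s > d) + 0.5 * (s == d)
               for s in same_scores for d in diff_scores)
    return wins / (len(same_scores) * len(diff_scores))

def similarity(sig_a, sig_b):
    """Negative squared distance between two signatures."""
    return -sum((a - b) ** 2 for a, b in zip(sig_a, sig_b))

if __name__ == "__main__":
    rng = random.Random(0)
    n = 16
    # Assumption: random integer templates, unrelated to the test images.
    templates = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(5)]
    images = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(20)]

    # 'Same' pairs: an image and a shifted copy of it.
    same = [similarity(signature(im, templates),
                       signature(shift(im, rng.randrange(n)), templates))
            for im in images]
    # 'Different' pairs: two distinct images.
    diff = [similarity(signature(a, templates), signature(b, templates))
            for a, b in zip(images, images[1:])]

    print("same-different AUC:", auc(same, diff))
```

Integer-valued signals are used so that the invariance is exact rather than subject to floating-point rounding. In this toy setting the number of templates controls selectivity, since more templates make it less likely that two different images share a signature, while the pooling range controls invariance.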
