Invariant Recognition Shapes Neural Representations of Visual Input.

Recognizing the people, objects, and actions in the world around us is a crucial aspect of human perception that allows us to plan and act in our environment. Remarkably, our proficiency in recognizing semantic categories from visual input is unhindered by transformations that substantially alter their appearance (e.g., changes in lighting or position). The ability to generalize across these complex transformations is a hallmark of human visual intelligence, which has been the focus of wide-ranging investigation in systems and computational neuroscience. However, while the neural machinery of human visual perception has been thoroughly described, the computational principles dictating its functioning remain unknown. Here, we review recent results in brain imaging, neurophysiology, and computational neuroscience in support of the hypothesis that the ability to support the invariant recognition of semantic entities in the visual world shapes which neural representations of sensory input are computed by human visual cortex.

[1]  Doris Y. Tsao,et al.  Patches with Links: A Unified System for Processing Faces in the Macaque Temporal Lobe , 2008, Science.

[2]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[3]  Nikolaus Kriegeskorte,et al.  Frontiers in Systems Neuroscience Systems Neuroscience , 2022 .

[4]  D. Marr,et al.  Representation and recognition of the spatial organization of three-dimensional shapes , 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[5]  G. Kreiman,et al.  Timing, Timing, Timing: Fast Decoding of Object Information from Intracranial Field Potentials in Human Visual Cortex , 2009, Neuron.

[6]  Eero P. Simoncelli,et al.  Perceptual straightening of natural video trajectories , 2017 .

[7]  T. Poggio,et al.  A network that learns to recognize three-dimensional objects , 1990, Nature.

[8]  Stéphane Mallat,et al.  A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Michael I. Jordan,et al.  A more biologically plausible learning rule for neural networks. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Tomaso Poggio,et al.  Fast Readout of Object Identity from Macaque Inferior Temporal Cortex , 2005, Science.

[11]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[12]  H B Barlow,et al.  Single units and sensation: a neuron doctrine for perceptual psychology? , 1972, Perception.

[13]  Nikolaus Kriegeskorte,et al.  Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation , 2014, PLoS Comput. Biol..

[14]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[15]  Doris Y. Tsao,et al.  Mechanisms of face perception. , 2008, Annual review of neuroscience.

[16]  Doris Y. Tsao,et al.  Functional Compartmentalization and Viewpoint Generalization Within the Macaque Face-Processing System , 2010, Science.

[17]  Eero P. Simoncelli,et al.  How MT cells analyze the motion of visual patterns , 2006, Nature Neuroscience.

[18]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Nikolaus Kriegeskorte,et al.  Deep neural networks: a new framework for modelling biological vision and brain information processing , 2015, bioRxiv.

[20]  Terrence J. Sejnowski,et al.  The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[21]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[22]  Lorenzo Rosasco,et al.  Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review , 2016, International Journal of Automation and Computing.

[23]  Anitha Pasupathy,et al.  Transformation of shape information in the ventral pathway , 2007, Current Opinion in Neurobiology.

[24]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Joel Z. Leibo,et al.  View-Tolerant Face Recognition and Hebbian Learning Imply Mirror-Symmetric Neural Tuning to Head Orientation , 2016, Current Biology.

[26]  Joel Z. Leibo,et al.  Unsupervised learning of clutter-resistant visual representations from natural videos , 2014, ArXiv.

[27]  Lorenzo Rosasco,et al.  Word-level invariant representations from acoustic waveforms , 2014, INTERSPEECH.

[28]  Tomaso Poggio,et al.  Learning to discount transformations as the computational goal of visual cortex , 2011 .

[29]  C. Gross,et al.  Representation of visual stimuli in inferior temporal cortex. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[30]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[31]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[32]  David D. Cox,et al.  Untangling invariant object recognition , 2007, Trends in Cognitive Sciences.

[33]  William T. Freeman,et al.  Presented at: 2nd Annual IEEE International Conference on Image , 1995 .

[34]  N. Kanwisher,et al.  The Human Body , 2001 .

[35]  W. M. Keck,et al.  Highly Selective Receptive Fields in Mouse Visual Cortex , 2008, The Journal of Neuroscience.

[36]  Tomaso A. Poggio,et al.  Invariant recognition drives neural representations of action sequences , 2016, PLoS Comput. Biol..

[37]  Gregory Shakhnarovich,et al.  Colorization as a Proxy Task for Visual Understanding , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Tomaso Poggio,et al.  Incorporating prior information in machine learning by creating virtual examples , 1998, Proc. IEEE.

[39]  Ha Hong,et al.  Performance-optimized hierarchical models predict neural responses in higher visual cortex , 2014, Proceedings of the National Academy of Sciences.

[40]  Terrence J. Sejnowski,et al.  Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[41]  M. A. Repucci,et al.  Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex , 2002 .

[42]  David G. Lowe,et al.  University of British Columbia. , 1945, Canadian Medical Association journal.

[43]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[44]  D. Sheinberg,et al.  Temporal Cortex Neurons Encode Articulated Actions as Slow Sequences of Integrated Poses , 2010, The Journal of Neuroscience.

[45]  Stéphane Mallat,et al.  Classification with scattering operators , 2010, CVPR 2011.

[46]  Joel Z. Leibo,et al.  Invariant Recognition Predicts Tuning of Neurons in Sensory Cortex , 2017 .

[47]  D. J. Felleman,et al.  Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[48]  J. DiCarlo,et al.  Using goal-driven deep learning models to understand sensory cortex , 2016, Nature Neuroscience.

[49]  Thomas Serre,et al.  A feedforward architecture accounts for rapid categorization , 2007, Proceedings of the National Academy of Sciences.

[50]  Stefano Soatto,et al.  Visual Representations: Defining Properties and Deep Approximations , 2014, ICLR 2016.

[51]  Joel Z. Leibo,et al.  The dynamics of invariant object recognition in the human visual system. , 2014, Journal of neurophysiology.

[52]  Nicole C. Rust,et al.  Selectivity and Tolerance (“Invariance”) Both Increase as Visual Information Propagates from Cortical Area V4 to IT , 2010, The Journal of Neuroscience.

[53]  Tomaso Poggio,et al.  The invariance hypothesis implies domain-specific regions in visual cortex , 2014 .

[54]  N. Kanwisher,et al.  The fusiform face area: a cortical region specialized for the perception of faces , 2006, Philosophical Transactions of the Royal Society B: Biological Sciences.

[55]  T. Poggio,et al.  Memo No . 067 June 27 , 2017 Theory of Deep Learning III : Generalization Properties of SGD , 2017 .

[56]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Tomaso Poggio,et al.  A fast, invariant representation for human action in the visual system. , 2018, Journal of neurophysiology.

[58]  D. V. van Essen,et al.  Selectivity for polar, hyperbolic, and Cartesian gratings in macaque visual cortex. , 1993, Science.

[59]  G. Granlund In search of a general picture processing operator , 1978 .

[60]  T. Poggio,et al.  Cognitive neuroscience: Neural mechanisms for the recognition of biological movements , 2003, Nature Reviews Neuroscience.

[61]  Doris Y. Tsao,et al.  A Cortical Region Consisting Entirely of Face-Selective Cells , 2006, Science.

[62]  S. Thorpe,et al.  Speed of processing in the human visual system , 1996, Nature.

[63]  R. Desimone,et al.  Stimulus-selective properties of inferior temporal neurons in the macaque , 1984, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[64]  J. P. Jones,et al.  An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. , 1987, Journal of neurophysiology.

[65]  Thomas Serre,et al.  A quantitative theory of immediate visual recognition. , 2007, Progress in brain research.

[66]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[67]  Lorenzo Rosasco,et al.  Unsupervised learning of invariant representations , 2016, Theor. Comput. Sci..

[68]  David A. Tovar,et al.  Representational dynamics of object vision: the first 1000 ms. , 2013, Journal of vision.

[69]  Eero P. Simoncelli,et al.  Metamers of the ventral stream , 2011, Nature Neuroscience.

[70]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.