Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

The primate visual system achieves remarkable visual object recognition performance, even with brief presentations and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher-performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. A major difficulty in making such a comparison accurately has been the lack of a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and for computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental and computational limitations. As part of our methodology, we propose an extension of "kernel analysis" that measures generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to those of the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
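To make the idea of measuring generalization accuracy as a function of representational complexity concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian-kernel regularized least-squares decoder, takes the inverse of the regularization strength as a stand-in for complexity, and uses the closed-form leave-one-out residual for generalization error. The names `gaussian_kernel`, `loo_error_curve`, `sigma`, and `lambdas` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at zero against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def loo_error_curve(X, y, sigma, lambdas):
    """Mean leave-one-out squared error of kernel regularized least
    squares, one value per regularization strength in `lambdas`.

    Assumption for this sketch: smaller lambda means a more complex
    decoding function, so 1/lambda plays the role of complexity.
    """
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    K = gaussian_kernel(np.asarray(X, dtype=float), sigma)
    evals, evecs = np.linalg.eigh(K)
    evals = np.maximum(evals, 0.0)  # K is PSD up to round-off
    errors = []
    for lam in lambdas:
        # G_inv = (K + lam * I)^{-1}, built from the eigendecomposition.
        G_inv = (evecs / (evals + lam)) @ evecs.T
        coef = G_inv @ y
        # Closed-form leave-one-out residuals for regularized least squares.
        loo_resid = coef / np.diag(G_inv)[:, None]
        errors.append(np.mean(loo_resid ** 2))
    return np.array(errors)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([-1.0, 1.0], 30)
    # Toy "representation": two noisy clusters, one per label.
    features = labels[:, None] * 1.5 + rng.normal(size=(60, 5))
    lambdas = np.logspace(-3, 3, 13)
    curve = loo_error_curve(features, labels, sigma=3.0, lambdas=lambdas)
    for lam, err in zip(lambdas, curve):
        print(f"complexity 1/lambda = {1.0 / lam:10.3f}   LOO error = {err:.3f}")
```

Under these assumptions, a representation is better when its curve reaches low leave-one-out error at low complexity. Computing the same curve on neural population responses and on model features, with matched numbers of features and images, gives curves that can be compared directly.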
