Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance

To go beyond qualitative models of the biological substrate of object recognition, we ask: can a single ventral stream neuronal linking hypothesis quantitatively account for core object recognition performance over a broad range of tasks? We measured human performance in 64 object recognition tests using thousands of challenging images that explore shape similarity and identity-preserving object variation. We then used multielectrode arrays to measure neuronal population responses to those same images in visual areas V4 and inferior temporal (IT) cortex of monkeys and simulated V1 population responses. We tested leading candidate linking hypotheses and control hypotheses, each postulating how ventral stream neuronal responses underlie object recognition behavior. Specifically, for each hypothesis, we computed the predicted performance on the 64 tests and compared it with the measured pattern of human performance. All tested hypotheses based on low- and mid-level visually evoked activity (pixels, V1, and V4) were very poor predictors of the human behavioral pattern. However, simple learned weighted sums of distributed average IT firing rates accurately predicted the behavioral pattern. More elaborate linking hypotheses relying on IT trial-by-trial correlational structure, finer IT temporal codes, or ones that strictly respect the known spatial substructures of IT (“face patches”) did not improve predictive power. Although these results do not reject those more elaborate hypotheses, they suggest a simple, sufficient quantitative model: each object recognition task is learned from the spatially distributed mean firing rates (100 ms) of ∼60,000 IT neurons and is executed as a simple weighted sum of those firing rates.

SIGNIFICANCE STATEMENT We sought to go beyond qualitative models of visual object recognition and determine whether a single neuronal linking hypothesis can quantitatively account for core object recognition behavior. To achieve this, we designed a database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior.
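The linking hypothesis described above (learn a weighted sum of mean firing rates per task, predict performance on each task, then compare the predicted pattern with measured behavior) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the firing rates are synthetic, the weights are learned with a simple mean-difference rule rather than whatever classifier the study used, the comparison uses a plain Pearson correlation as a stand-in for the study's consistency measure, and the names `simulate_rates`, `learn_weighted_sum`, `task_accuracy`, and `placeholder_behavior` are hypothetical.

```python
import math
import random

def simulate_rates(class_means, n_images, rng, noise_sd=1.0):
    """Synthetic (centered) mean firing rates: one row per image, one value per neuron."""
    return [[rng.gauss(mu, noise_sd) for mu in class_means] for _ in range(n_images)]

def learn_weighted_sum(rates_a, rates_b):
    """Learn one weight per neuron as the difference of class-mean rates.
    The bias places the decision boundary midway between the two class means."""
    n_neurons = len(rates_a[0])
    mean_a = [sum(row[i] for row in rates_a) / len(rates_a) for i in range(n_neurons)]
    mean_b = [sum(row[i] for row in rates_b) / len(rates_b) for i in range(n_neurons)]
    weights = [a - b for a, b in zip(mean_a, mean_b)]
    bias = -0.5 * sum(w * (a + b) for w, a, b in zip(weights, mean_a, mean_b))
    return weights, bias

def weighted_sum(weights, bias, rates):
    """The decision variable: a weighted sum of firing rates plus a bias."""
    return sum(w * r for w, r in zip(weights, rates)) + bias

def task_accuracy(separation, rng, n_neurons=50, n_train=100, n_test=200):
    """Predicted performance on one synthetic two-object task (held-out images)."""
    tuning = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
    means_a = [separation * t for t in tuning]
    means_b = [-separation * t for t in tuning]
    weights, bias = learn_weighted_sum(
        simulate_rates(means_a, n_train, rng),
        simulate_rates(means_b, n_train, rng),
    )
    test_a = simulate_rates(means_a, n_test, rng)
    test_b = simulate_rates(means_b, n_test, rng)
    correct = sum(weighted_sum(weights, bias, r) > 0 for r in test_a)
    correct += sum(weighted_sum(weights, bias, r) <= 0 for r in test_b)
    return correct / (2 * n_test)

def pearson(xs, ys):
    """Pearson correlation between two equal-length patterns of performance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(0)
separations = [0.05, 0.1, 0.2, 0.5]  # harder to easier synthetic tasks
predicted = [task_accuracy(s, rng) for s in separations]

# Invented placeholder numbers standing in for measured behavior; NOT data from the study.
placeholder_behavior = [0.60, 0.70, 0.85, 0.98]
consistency = pearson(predicted, placeholder_behavior)
print(predicted, consistency)
```

In this sketch a hypothesis is judged by how well its predicted pattern of per-task performance tracks the behavioral pattern, not by raw accuracy on any single task. Swapping in a different population (for example, a simulated lower-level representation) only requires changing how the rates are generated.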
