Limited correspondence in visual representation between the human brain and convolutional neural networks

Convolutional neural networks (CNNs) have recently achieved very high object categorization performance. It has become increasingly common in human fMRI research to regard CNNs as working models of the human visual system. Here we reevaluate this approach by comparing fMRI responses from the human brain in three experiments with those from 14 different CNNs. Our visual stimuli included original and filtered versions of real-world object images and images of artificial objects. Replicating previous findings, we found a brain-CNN correspondence in a number of CNNs, with lower and higher levels of visual representations in the human brain better resembling those of lower and higher CNN layers, respectively. Moreover, the lower layers of some CNNs could fully capture the representational structure of human early visual areas for both the original and filtered real-world object images. Despite these successes, no CNN examined could fully capture the representational structure of higher human visual processing areas. The CNNs also failed to capture the representational structure of artificial object images at all levels of visual processing. The latter is particularly troublesome, as decades of vision research have demonstrated that the same algorithms used in the processing of natural images would support the processing of artificial visual stimuli in the primate brain. Similar results were obtained when a CNN was trained with stylized object images that emphasized shape representation. CNNs likely represent visual information in fundamentally different ways from the human brain. Current CNNs thus may not serve as sound working models of the human visual system.

Significance Statement

Recent CNNs have achieved very high object categorization performance, with some even exceeding human performance. It has become common practice in recent neuroscience research to regard CNNs as working models of the human visual system. Here we evaluate this approach by comparing fMRI responses from the human brain with those from 14 different CNNs. Despite CNNs’ ability to successfully perform visual object categorization like the human visual system, they appear to represent visual information in fundamentally different ways from the human brain. Current CNNs thus may not serve as sound working models of the human visual system. Given the currently dominant trend of incorporating CNN modeling in visual neuroscience research, our results question the validity of such an approach.
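
The abstract does not spell out how the "representational structure" of a brain region and a CNN layer is compared. One common way to make such a comparison concrete is representational similarity analysis: build a dissimilarity matrix over stimuli for each system, then rank-correlate the two matrices. The sketch below illustrates that general idea only and is not the authors' pipeline; the function names, array shapes, and random data are all hypothetical.

```python
import numpy as np


def rdm(patterns):
    """Representational dissimilarity matrix for an (n_stimuli, n_features) array.

    Dissimilarity is 1 - Pearson r between the response patterns of two stimuli.
    """
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix, as a vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def ordinal_ranks(values):
    """Ordinal ranks of a 1-D array (ties are not averaged; fine for continuous data)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def rdm_similarity(patterns_a, patterns_b):
    """Rank correlation between the RDMs of two systems shown the same stimuli."""
    ranks_a = ordinal_ranks(upper_triangle(rdm(patterns_a)))
    ranks_b = ordinal_ranks(upper_triangle(rdm(patterns_b)))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


# Hypothetical data: 8 stimuli, 50 voxels in a brain region, 200 units in a CNN layer.
rng = np.random.default_rng(0)
brain_patterns = rng.normal(size=(8, 50))
layer_patterns = rng.normal(size=(8, 200))

self_score = rdm_similarity(brain_patterns, brain_patterns)
cross_score = rdm_similarity(brain_patterns, layer_patterns)
print(f"brain vs. itself: {self_score:.3f}")
print(f"brain vs. unrelated layer: {cross_score:.3f}")
```

Because the comparison is made on stimulus-by-stimulus dissimilarities rather than on raw responses, the two systems can have different numbers of measurement channels (voxels versus units). Judging whether a layer "fully captures" a region would additionally need a reliability estimate of the brain data, which this sketch leaves out.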
