Shared Spatiotemporal Category Representations in Biological and Artificial Deep Neural Networks

Understanding the computational transformations that enable invariant visual categorization is a fundamental challenge in both systems and cognitive neuroscience. Recently developed deep convolutional neural networks (CNNs) perform visual categorization with accuracy that rivals that of humans, giving neuroscientists the opportunity to interrogate, in silico, the series of representational transformations that enable categorization. The goal of the current study is to assess the extent to which the sequential visual representations built by a CNN map onto those built in the human brain, as measured by high-density, time-resolved event-related potentials (ERPs). We found correspondence both over time and across the scalp: earlier ERP activity was best explained by early CNN layers at all electrodes, whereas later neural activity was best explained by the later, conceptual layers of the CNN. This effect was most pronounced at frontal and right occipital sites. Taken together, these results lead us to conclude that deep artificial neural networks trained to perform scene categorization traverse representational stages similar to those of the human brain. Examining these networks should therefore allow neuroscientists to better understand the transformations that enable invariant visual categorization.
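The abstract does not say how layer-to-ERP correspondence was computed. As an illustration only, the sketch below shows one common way to ask which CNN layer best explains neural activity at each time point: build a representational dissimilarity matrix (RDM) for each layer and for the ERP pattern at each time point, then correlate them. The function names, the array shapes, the correlation-distance RDM, and the Pearson comparison are all assumptions made for this sketch, not the authors' pipeline. It also pools all electrodes at each time point, so it does not reproduce the electrode-wise scalp analysis described above.

```python
import numpy as np


def rdm(patterns):
    """Return a correlation-distance RDM (1 - Pearson r) between rows.

    patterns: array of shape (n_images, n_features), one row per image.
    """
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Return the off-diagonal upper triangle of a square matrix as a vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def best_layer_over_time(layer_activations, erp):
    """Name the layer whose RDM correlates best with the ERP RDM at each time point.

    layer_activations: dict mapping a (hypothetical) layer name to an array
        of shape (n_images, n_units).
    erp: array of shape (n_images, n_electrodes, n_times).
    """
    # Precompute one vectorized RDM per layer.
    layer_vectors = {
        name: upper_triangle(rdm(acts)) for name, acts in layer_activations.items()
    }
    best = []
    for t in range(erp.shape[2]):
        # ERP pattern across all electrodes at time t, one row per image.
        erp_vector = upper_triangle(rdm(erp[:, :, t]))
        # Pearson correlation between the ERP RDM and each layer RDM.
        scores = {
            name: np.corrcoef(vec, erp_vector)[0, 1]
            for name, vec in layer_vectors.items()
        }
        best.append(max(scores, key=scores.get))
    return best
```

With real data, `layer_activations` would hold features extracted from the network for each stimulus and `erp` would hold trial-averaged responses to the same stimuli in the same order. A rank correlation is often preferred over Pearson when comparing RDMs; Pearson is used here only to keep the sketch dependency-free.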
