CNN-based Encoding and Decoding of Visual Object Recognition in Space and Time

Deep convolutional neural networks (CNNs) have been put forward as neurobiologically plausible models of the visual hierarchy. Using functional magnetic resonance imaging, CNN representations of visual stimuli have previously been shown to correspond to processing stages in the ventral and dorsal streams of the visual system. Whether this correspondence between models and brain signals also holds for activity acquired at high temporal resolution has been explored less thoroughly. Here, we addressed this question by combining CNN-based encoding models with magnetoencephalography (MEG). Human participants passively viewed 1000 images of objects while MEG signals were acquired. We modelled their source-reconstructed cortical activity at high temporal resolution with CNNs, and observed a feedforward sweep across the visual hierarchy between 75 and 200 ms after stimulus onset. This spatiotemporal cascade was captured by the network layer representations: the increasingly abstract stimulus representation in the hierarchical network model was reflected in successive parts of the visual cortex along the ventral stream. We further validated the accuracy of our encoding model by decoding stimulus identity in a held-out validation set of viewed objects, achieving state-of-the-art decoding accuracy.
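The encode-then-identify idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are simulated, the feature matrix stands in for CNN layer activations, ridge regression stands in for whatever regression the encoding model actually uses, and identification by highest Pearson correlation is one common choice rather than necessarily the procedure of this study. The names `fit_ridge` and `identify` are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def identify(Y_pred, Y_obs):
    """For each observed response, return the index of the candidate
    stimulus whose predicted response has the highest Pearson correlation."""
    # z-score each row so that a scaled dot product equals Pearson correlation
    zp = (Y_pred - Y_pred.mean(axis=1, keepdims=True)) / Y_pred.std(axis=1, keepdims=True)
    zo = (Y_obs - Y_obs.mean(axis=1, keepdims=True)) / Y_obs.std(axis=1, keepdims=True)
    corr = zo @ zp.T / Y_pred.shape[1]  # shape: (n_observed, n_candidates)
    return corr.argmax(axis=1)

# Simulated data (assumption): random features stand in for CNN activations,
# and responses are a noisy linear function of those features.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_src = 200, 20, 30, 50
W_true = rng.standard_normal((n_feat, n_src))
X_train = rng.standard_normal((n_train, n_feat))
X_test = rng.standard_normal((n_test, n_feat))
Y_train = X_train @ W_true + 0.5 * rng.standard_normal((n_train, n_src))
Y_test = X_test @ W_true + 0.5 * rng.standard_normal((n_test, n_src))

# Encoding: learn a mapping from stimulus features to source responses.
W = fit_ridge(X_train, Y_train, alpha=1.0)

# Decoding: match each held-out response to the best-predicted stimulus.
predicted = identify(X_test @ W, Y_test)
accuracy = np.mean(predicted == np.arange(n_test))
print(f"identification accuracy: {accuracy:.2f}")
```

In a time-resolved setting, such a model could be fitted separately per time point and cortical source, so that the layer giving the best prediction can be tracked across space and time; that extension is likewise only a sketch of the general approach.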
