Dynamics of scene representations in the human brain revealed by magnetoencephalography and deep neural networks

Abstract: Human scene recognition is a rapid, multistep process that evolves over time from the processing of single scene images to the processing of spatial layout. We used multivariate pattern analyses on magnetoencephalography (MEG) data to unravel the time course of this cortical process. Following an early signal for lower-level visual analysis of single scenes at ~100 ms, we found a marker of real-world scene size, i.e. spatial layout processing, at ~250 ms. This marker indexed neural representations that were robust to changes in unrelated scene properties and viewing conditions. For a quantitative model of how scene size representations may arise in the brain, we compared MEG data to a deep neural network model trained on scene classification. Representations of scene size emerged intrinsically in the model and captured the emerging neural representation of scene size. Together, our data provide a first description of an electrophysiological signal for layout processing in humans and suggest that deep neural networks are a promising framework for investigating how spatial layout representations emerge in the human brain.
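
The comparison of MEG data with a network model described above can be illustrated with a minimal sketch of time-resolved representational similarity analysis. This is an assumption about the general family of technique, not a reproduction of the authors' pipeline: the dissimilarity measure (1 minus Pearson correlation), the Spearman comparison, all function names, and the synthetic data are hypothetical choices made only for illustration.

```python
import numpy as np

def rdm(patterns):
    """Dissimilarity matrix (1 - Pearson r) between condition patterns.

    patterns: array of shape (n_conditions, n_features).
    """
    return 1.0 - np.corrcoef(patterns)

def upper(matrix):
    """Vectorize the upper triangle of a square matrix, excluding the diagonal."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def spearman(a, b):
    """Spearman correlation via ranks (no tie correction; fine for continuous data)."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(rank_a, rank_b)[0, 1]

def model_meg_similarity(meg, model_features):
    """Correlate the model dissimilarity matrix with the MEG one at each time point.

    meg: (n_conditions, n_sensors, n_times); model_features: (n_conditions, n_units).
    Returns an array of shape (n_times,).
    """
    model_vec = upper(rdm(model_features))
    n_times = meg.shape[2]
    return np.array(
        [spearman(upper(rdm(meg[:, :, t])), model_vec) for t in range(n_times)]
    )

# Synthetic demonstration (all values are made up for illustration).
rng = np.random.default_rng(0)
n_cond, n_sens, n_times, n_units = 12, 30, 20, 50

# Hypothetical model features: two groups of scenes (e.g. small vs. large)
# separated along one random direction, plus per-scene variation.
labels = np.repeat([-1.0, 1.0], n_cond // 2)
size_axis = rng.standard_normal(n_units)
model_features = labels[:, None] * size_axis + 0.5 * rng.standard_normal((n_cond, n_units))

# Sensor data: pure noise, except time points 10..14, where a random linear
# projection of the model features is added to the sensors.
meg = rng.standard_normal((n_cond, n_sens, n_times))
projection = rng.standard_normal((n_units, n_sens)) / np.sqrt(n_units)
meg[:, :, 10:15] += 3.0 * (model_features @ projection)[:, :, None]

similarity = model_meg_similarity(meg, model_features)
```

In this toy setup, `similarity` should be close to zero where the sensors carry only noise and clearly positive in the window where model structure was injected. With real recordings, the resulting time course would additionally need a statistical test, for example a permutation test.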
