Dynamics of Scene Representations in the Human Brain Revealed by MEG and Deep Neural Networks

Human scene recognition is a rapid, multistep process that evolves over time from the analysis of single scene images to the processing of spatial layout. We used multivariate pattern analysis of magnetoencephalography (MEG) data to unravel the time course of this cortical process. Following an early signal at ∼100 ms reflecting lower-level visual analysis of single scenes, we found a marker of real-world scene size, i.e., of spatial layout processing, at ∼250 ms. This marker indexed neural representations that were robust to changes in unrelated scene properties and viewing conditions. To obtain a quantitative explanation that captures the complexity of scene recognition, we compared the MEG data to a deep neural network model trained on scene classification. Representations of scene size emerged intrinsically in the model and resolved the emerging neural representation of scene size. Together, our data provide a first description of an electrophysiological signal for layout processing in humans and a novel quantitative model of how spatial layout representations may emerge in the human brain. The supplemental materials are available at: http://brainmodels.csail.mit.edu/scene-size
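Comparing time-resolved MEG patterns with a model is commonly done in the style of representational similarity analysis: build a dissimilarity matrix (RDM) over conditions at each time point, then correlate it with a model RDM. The sketch below illustrates that idea under stated assumptions and is not the authors' pipeline. It uses 1 - Pearson r as the dissimilarity measure (the abstract does not say which measure the study used), Spearman correlation for the comparison, a hypothetical binary "scene size" model RDM, and simulated data. All function names and parameters are hypothetical.

```python
import numpy as np


def rdm_from_patterns(patterns):
    """Condensed RDM: 1 - Pearson r for every pair of conditions.

    patterns: array of shape (n_conditions, n_features), e.g. MEG sensor
    values at one time point or DNN-layer activations for each image.
    """
    r = np.corrcoef(patterns)  # rows are treated as variables (conditions)
    upper = np.triu_indices(patterns.shape[0], k=1)
    return 1.0 - r[upper]


def average_ranks(x):
    """Rank data (1-based), giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def category_model_rdm(labels):
    """Hypothetical binary model: 0 for same label, 1 for different labels."""
    labels = np.asarray(labels)
    different = (labels[:, None] != labels[None, :]).astype(float)
    return different[np.triu_indices(len(labels), k=1)]


def time_resolved_rsa(meg, model_rdm):
    """Correlate the MEG RDM at every time point with one model RDM.

    meg: array of shape (n_conditions, n_sensors, n_times).
    """
    return np.array([
        spearman(rdm_from_patterns(meg[:, :, t]), model_rdm)
        for t in range(meg.shape[2])
    ])


def simulate_meg(size_labels, n_sensors=30, n_times=50, onset=25, seed=0):
    """Synthetic data: Gaussian noise, plus a size-specific sensor pattern
    added from `onset` onward (label 1: +pattern, label 0: -pattern)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(size_labels)
    meg = rng.normal(size=(len(labels), n_sensors, n_times))
    size_pattern = rng.normal(size=n_sensors)
    sign = np.where(labels == 1, 1.0, -1.0)
    meg[:, :, onset:] += 3.0 * sign[:, None, None] * size_pattern[None, :, None]
    return meg


size_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical: 0 = small, 1 = large
meg = simulate_meg(size_labels)
rsa = time_resolved_rsa(meg, category_model_rdm(size_labels))
print("mean rho before onset:", round(float(rsa[:25].mean()), 2))
print("mean rho after onset: ", round(float(rsa[25:].mean()), 2))
```

In this toy setting the correlation stays near zero until the simulated onset and then rises sharply; the onset index is arbitrary and is not meant to reproduce the ∼250 ms effect. A DNN layer could be compared in the same way by passing an (n_images, n_units) activation array to `rdm_from_patterns` and using the result as `model_rdm`.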