Unraveling Representations in Scene-selective Brain Regions Using Scene-Parsing Deep Neural Networks

Abstract

Visual scene perception is mediated by a set of cortical regions that respond preferentially to images of scenes, including the occipital place area (OPA) and the parahippocampal place area (PPA). However, the differential contribution of OPA and PPA to scene perception remains an open research question. In this study, we take a deep neural network (DNN)-based computational approach to investigate the differences in OPA and PPA function. As a first step, we search for a computational model that accurately predicts fMRI responses to scenes in OPA and PPA. We find that DNNs trained to predict scene components (e.g., wall, ceiling, floor) uniquely explain more variance in OPA and PPA than a DNN trained to predict scene category (e.g., bathroom, kitchen, office). This result is robust across several DNN architectures. On this basis, we then determine whether particular scene components predicted by DNNs differentially account for unique variance in OPA and PPA. We find that the variance in OPA responses uniquely explained by the navigation-related floor component is higher than the variance explained by the wall and ceiling components. In contrast, PPA responses are better explained by the combination of wall and floor, that is, scene components that together contain the structure and texture of the scene. This differential sensitivity to scene components suggests differential functions of OPA and PPA in scene processing. Moreover, our results highlight the potential of the proposed computational approach as a general tool for investigating the neural basis of human scene perception.
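The notion of "unique variance" used above can be illustrated with a small regression-based variance-partitioning sketch. The abstract does not specify the exact procedure, so this is only one common way to define it, stated as an assumption: the unique variance of a predictor is the drop in R² when that predictor is removed from a model containing all predictors. The function names (`r_squared`, `unique_variance`) and the synthetic "floor", "wall", and "ceiling" predictors are hypothetical and are not taken from the study.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def unique_variance(predictors, y):
    """Unique R^2 per predictor: full-model R^2 minus R^2 without it.

    predictors: dict mapping a name to a 1-D array (at least two entries).
    """
    names = list(predictors)
    full = r_squared(np.column_stack([predictors[n] for n in names]), y)
    unique = {}
    for n in names:
        others = [predictors[m] for m in names if m != n]
        unique[n] = full - r_squared(np.column_stack(others), y)
    return unique


# Synthetic illustration only (not data from the study): a simulated
# response driven mostly by the "floor" predictor.
rng = np.random.default_rng(0)
floor, wall, ceiling = rng.standard_normal((3, 200))
roi = 2.0 * floor + 0.5 * wall + 0.1 * rng.standard_normal(200)

u = unique_variance({"floor": floor, "wall": wall, "ceiling": ceiling}, roi)
for name, value in u.items():
    print(f"{name}: unique R^2 = {value:.3f}")
```

In this toy setup the "floor" predictor accounts for the largest share of unique variance by construction. With real data, the predictors would be model-derived features or dissimilarities and the target would be the measured response of a brain region.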
