Neural Taskonomy: Inferring the Similarity of Task-Derived Representations from Brain Activity

Convolutional neural networks (CNNs) trained for object recognition are widely used to account for visually driven neural responses in both human and non-human primate brains. However, because object classification is a general and complex task, it is often difficult to draw precise inferences about neural information processing from CNN representations, even though those representations are effective for predicting brain activity. To better characterize the visual features encoded in different regions of the human brain, we predicted brain responses to images using fine-grained representations drawn from 19 specific computer vision tasks. We constructed an individual encoding model for each task and applied it to BOLD5000, a large-scale dataset of fMRI scans collected while observers viewed over 5,000 naturalistic scene and object images. Because different encoding models predict activity in different brain regions, we were able to associate specific vision tasks with each region. For example, within scene-selective brain regions, features from 3D tasks such as 3D keypoint and 3D edge detection explain more variance than features from 2D tasks, a pattern that replicates across the whole brain. Using results across all 19 task representations, we constructed a "task graph" based on the spatial layout of the brain areas each task predicts well. We then compared this brain-derived task structure with the task structure derived from transfer-learning accuracy in order to assess the degree of shared information between the two task spaces. These computationally driven results, arising from state-of-the-art computer vision methods, begin to reveal the task-specific architecture of the human visual system.
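The voxelwise encoding approach described above can be sketched in a few lines: fit a regularized linear map from task-specific image features to measured voxel responses, then score each voxel by how well its held-out responses are predicted. The sketch below uses ridge regression on simulated data; the array shapes, the synthetic features, and the signal-plus-noise generative model are all illustrative stand-ins for the Taskonomy features and BOLD5000 responses used in the paper, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-ins: task-specific features for N images (in the paper,
# activations from a Taskonomy network) and BOLD responses for V voxels
# (in the paper, BOLD5000 data). Here both are simulated with a known
# linear relationship plus noise.
n_images, n_features, n_voxels = 500, 64, 100
features = rng.standard_normal((n_images, n_features))
true_weights = rng.standard_normal((n_features, n_voxels))
bold = features @ true_weights + 0.5 * rng.standard_normal((n_images, n_voxels))

X_train, X_test, y_train, y_test = train_test_split(
    features, bold, test_size=0.2, random_state=0)

# One ridge fit predicts all voxels jointly; with a shared regularization
# strength this is equivalent to fitting each voxel separately.
model = Ridge(alpha=1.0).fit(X_train, y_train)
pred = model.predict(X_test)

# Score each voxel by the correlation between predicted and held-out responses.
voxel_r = np.array([
    np.corrcoef(pred[:, v], y_test[:, v])[0, 1] for v in range(n_voxels)
])
print(voxel_r.mean())
```

Repeating this fit once per task representation yields a per-task map of well-predicted voxels, which is the raw material for the brain-derived task graph.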
