Different Goal-driven CNNs Affect Performance of Visual Encoding Models based on Deep Learning

A convolutional neural network (CNN) with outstanding performance in computer vision can be used to construct an encoding model that simulates human visual information processing. However, the training goal of the network may affect the performance of the encoding model. Most neural networks previously used to build encoding models were trained on a single task, image classification, whereas human visual perception performs multiple tasks simultaneously. Existing encoding models therefore do not fully capture the diversity and complexity of the human visual mechanism. In this paper, we first established feature extraction models based on a Fully Convolutional Network (FCN) and a Visual Geometry Group (VGG) network, which have similar network structures but different training goals, and employed Regularized Orthogonal Matching Pursuit (ROMP) to establish the response model, which predicts the stimulus-evoked responses measured by functional magnetic resonance imaging (fMRI). The results revealed that convolutional neural networks with almost the same network structure but trained on different visual tasks differed significantly in visual encoding performance. The VGG-based encoding model achieved higher performance in most voxels of the regions of interest (ROIs). We concluded that the classification task in computer vision fits the human visual mechanism better than the segmentation task.
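
The two-stage pipeline described above (CNN features, then a sparse linear response model per voxel) can be illustrated with a minimal sketch. It makes several assumptions: it uses plain orthogonal matching pursuit rather than the regularized variant (ROMP) named in the abstract, it replaces real CNN features and fMRI responses with synthetic data, and every name in it (`omp_fit`, `features`, `response`, `n_nonzero`) is hypothetical. Prediction accuracy is scored as the Pearson correlation between predicted and held-out responses, which is a common choice and not necessarily the metric used in the paper.

```python
import numpy as np


def omp_fit(features, response, n_nonzero):
    """Plain orthogonal matching pursuit (not ROMP): greedily pick up to
    n_nonzero feature columns and fit them by least squares."""
    n_features = features.shape[1]
    available = np.ones(n_features, dtype=bool)
    selected = []
    coef = np.zeros(0)
    residual = response.copy()
    for _ in range(n_nonzero):
        # Correlate every feature column with the current residual.
        correlations = np.abs(features.T @ residual)
        correlations[~available] = -1.0  # never pick a column twice
        best = int(np.argmax(correlations))
        selected.append(best)
        available[best] = False
        # Re-fit least squares on all columns selected so far.
        coef, *_ = np.linalg.lstsq(features[:, selected], response, rcond=None)
        residual = response - features[:, selected] @ coef
    weights = np.zeros(n_features)
    weights[selected] = coef
    return weights


# Synthetic stand-ins (assumption): 200 stimuli, 50 "CNN features", one voxel.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 50))
true_weights = np.zeros(50)
true_weights[[3, 17, 42]] = [2.0, -1.5, 1.0]
response = features @ true_weights + 0.01 * rng.standard_normal(200)

# Fit on the first 150 stimuli, evaluate on the remaining 50.
weights = omp_fit(features[:150], response[:150], n_nonzero=3)
predicted = features[150:] @ weights
score = np.corrcoef(predicted, response[150:])[0, 1]
print(f"held-out correlation: {score:.3f}")
```

In a comparison like the one the abstract describes, the same fitting procedure would be run once per voxel for each feature extractor (for example VGG and FCN features), and the held-out scores compared voxel by voxel.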
