A Comparative Analysis of Visual Encoding Models Based on Classification and Segmentation Task-Driven CNNs

Visual encoding models now commonly use convolutional neural networks (CNNs), which perform outstandingly in computer vision, to simulate human visual information processing. However, the prediction performance of an encoding model differs depending on the network it is built on and on the task that network was trained for. Here, the impact of the network task on encoding models is studied. Features of natural visual stimuli are extracted using a segmentation network (FCN32s) and a classification network (VGG16), which perform different visual tasks but have similar network structures. Then, for each of three feature sets, i.e., segmentation, classification, and fused features, the regularized orthogonal matching pursuit (ROMP) method is used to establish a linear mapping from features to the voxel responses measured by functional magnetic resonance imaging (fMRI). The results indicate that encoding models based on networks performing different tasks can predict stimulus-induced fMRI responses effectively, but with different accuracy. The prediction accuracy of the encoding model based on VGG16 is significantly better than that of the model based on FCN32s in most voxels, but similar to that of the model based on fused features. The comparative analysis demonstrates that the CNN performing the classification task is more similar to human visual processing than the CNN performing the segmentation task.
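The feature-to-voxel mapping step can be illustrated with a minimal sketch of ROMP, following the identify, regularize, and update steps of the published algorithm. This is not the authors' implementation: the function name `romp`, the parameter `sparsity`, the tolerance, and the synthetic data (random numbers standing in for CNN features and a voxel response) are all assumptions made for illustration.

```python
import numpy as np


def romp(Phi, y, sparsity, tol=1e-8):
    """Sparse least-squares fit of y ~ Phi @ w by regularized OMP.

    Phi: (n_samples, n_features) feature matrix, columns ideally unit-norm.
    y: (n_samples,) response of a single voxel.
    sparsity: number of candidate features examined per iteration.
    """
    y = np.asarray(y, dtype=float)
    support = np.array([], dtype=int)
    coef = np.zeros(Phi.shape[1])
    residual = y.copy()
    while support.size < 2 * sparsity and np.linalg.norm(residual) > tol:
        u = np.abs(Phi.T @ residual)
        # Identify: the `sparsity` largest nonzero correlations, descending.
        cand = np.argsort(u)[::-1][:sparsity]
        cand = cand[u[cand] > 0]
        if cand.size == 0:
            break
        vals = u[cand]
        # Regularize: among runs whose magnitudes differ by at most a
        # factor of 2, keep the run with the largest energy.
        best_energy, best_run = -1.0, None
        for start in range(vals.size):
            length = int(np.sum(vals[start:] >= vals[start] / 2))
            energy = float(np.sum(vals[start:start + length] ** 2))
            if energy > best_energy:
                best_energy, best_run = energy, cand[start:start + length]
        merged = np.union1d(support, best_run)
        if merged.size == support.size:
            break  # nothing new was selected; stop instead of looping forever
        support = merged
        # Update: least squares restricted to the selected features.
        sol = np.linalg.lstsq(Phi[:, support], y, rcond=None)[0]
        coef = np.zeros(Phi.shape[1])
        coef[support] = sol
        residual = y - Phi @ coef
    return coef


# Synthetic stand-ins (assumptions): 250 stimuli, 400 features, one voxel.
rng = np.random.default_rng(0)
features = rng.standard_normal((250, 400))
features /= np.linalg.norm(features[:200], axis=0)  # scale by training norms
true_w = np.zeros(400)
true_w[rng.choice(400, size=5, replace=False)] = rng.uniform(1.0, 2.0, size=5)
voxel = features @ true_w + 0.05 * rng.standard_normal(250)

# Fit on 200 training stimuli, then score the 50 held-out stimuli.
w_hat = romp(features[:200], voxel[:200], sparsity=5)
r = np.corrcoef(features[200:] @ w_hat, voxel[200:])[0, 1]
print(f"held-out Pearson r = {r:.3f}")
```

In an encoding study of this kind, one such model would be fitted per voxel and per feature set, and the held-out correlations compared across feature sets; Pearson correlation is used here only as a common choice of accuracy measure, since the abstract does not name one.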
