Modelling attention control using a convolutional neural network designed after the ventral visual pathway

ABSTRACT We recently proposed that attention control uses object-category representations consisting of category-consistent features (CCFs), those features occurring frequently and consistently across a category’s exemplars [Yu, C.-P., Maxfield, J. T., & Zelinsky, G. J. (2016). Searching for category-consistent features: A computational approach to understanding visual category representation. Psychological Science, 27(6), 870–884]. Here we extracted CCFs for 68 object categories spanning a three-level category hierarchy from VsNet, a Convolutional Neural Network (CNN) designed after the primate ventral stream, and evaluated VsNet against the gaze behaviour of people searching for the same categorical targets. We also compared its success in predicting attention control to that of two other CNNs that differed in their degree and type of brain inspiration. VsNet not only replicated previous reports of stronger attention guidance to subordinate-level targets but, with its powerful CNN-CCFs, also predicted attention control to individual target categories. Moreover, VsNet outperformed the other CNN models tested, despite these models having more trainable convolutional filters. We conclude that CCFs extracted from a brain-inspired CNN can predict goal-directed attention control.
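To make the CCF definition concrete, the following is a minimal sketch of selecting features that respond strongly and consistently across a category's exemplars. It is an illustration under stated assumptions, not the paper's procedure: the function name `category_consistent_features`, the mean-to-standard-deviation ratio used as the consistency score, the threshold of 2.0, and the toy responses are all hypothetical choices made here.

```python
from statistics import mean, pstdev


def category_consistent_features(responses, snr_threshold=2.0):
    """Return indices of features with a high, stable response across exemplars.

    responses: list of exemplars, each a list of non-negative feature responses
    (for example, pooled filter activations). The threshold is an assumption.
    """
    n_features = len(responses[0])
    selected = []
    for j in range(n_features):
        column = [exemplar[j] for exemplar in responses]
        mu = mean(column)
        sigma = pstdev(column)
        if sigma > 0:
            # High mean = frequent; low spread = consistent.
            score = mu / sigma
        else:
            # Zero spread: perfectly consistent if the feature responds at all.
            score = float("inf") if mu > 0 else 0.0
        if score >= snr_threshold:
            selected.append(j)
    return selected


# Toy data (hypothetical): 4 exemplars of one category, 3 features each.
exemplars = [
    [0.9, 0.1, 0.8],
    [1.0, 0.9, 0.0],
    [1.1, 0.0, 0.1],
    [1.0, 0.2, 0.9],
]
ccf_indices = category_consistent_features(exemplars)
print(ccf_indices)
```

In this toy case only the first feature is kept: it responds strongly in every exemplar, whereas the other two respond in only some exemplars and so score low on consistency.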
