Convolutional neural networks performing a visual search task show attention-like limits on accuracy when trained to generalize across multiple search stimuli

What limits our ability to find what we are looking for in the cluttered, noisy world? To investigate this, cognitive scientists have long used visual search. In spite of hundreds of studies, it remains unclear how to relate effects found using the discrete item display search task to computations in the visual system. A separate thread of research has studied the visual system of humans and other primates using convolutional neural networks (CNNs) as models. Multiple lines of evidence suggest that training CNNs to perform tasks such as image classification causes them to learn representations similar to those used by the visual system. These studies raise the question of whether CNNs that have learned such representations behave similarly to humans performing other vision-based tasks. Here we address this question by measuring the behavior of CNNs trained for image classification as they perform the discrete item display search task. We first show how a fine-tuning approach often used to adapt pre-trained CNNs to new tasks can produce models that show human-like limitations on this task. However, we then demonstrate that we can greatly reduce these effects by changing training, without changing the learned representations. Lastly, we show that accuracy is not impaired when single networks are trained to discriminate multiple types of visual search stimuli. Based on these findings, we suggest that CNNs are not necessarily subject to the same limitations as the primate visual system.
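
To make the fine-tuning approach mentioned above concrete, below is a minimal sketch of the standard transfer-learning recipe: take an ImageNet-pretrained CNN, freeze its convolutional features so those learned representations stay fixed, and retrain the classifier layers with a new final readout for the search decision. This is an illustration of the general technique, not the paper's actual training code; the AlexNet backbone, the two-class present/absent readout, and the synthetic data loader are all assumptions introduced here.

```python
# Hedged sketch: fine-tune an ImageNet-pretrained CNN for a
# target-present vs. target-absent visual search decision. The
# backbone, readout size, and data are illustrative assumptions,
# not the authors' exact setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

num_classes = 2  # "target present" vs. "target absent"

# Load a CNN pretrained for ImageNet classification.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Freeze the convolutional feature extractor so fine-tuning leaves
# the learned visual representations unchanged.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final 1000-way ImageNet layer with a fresh readout
# for the search task; the classifier layers remain trainable.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)

# Synthetic stand-in for a dataset of search displays, so the
# sketch runs end to end.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
loader = DataLoader(TensorDataset(images, labels), batch_size=4)

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9
)
criterion = nn.CrossEntropyLoss()

# One pass over the (synthetic) search displays.
model.train()
for batch_images, batch_labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(batch_images), batch_labels)
    loss.backward()
    optimizer.step()
```

Freezing the feature extractor is what lets a setup like this separate the two factors the abstract contrasts: any change in search behavior under this recipe must come from the training regime and readout, not from altered representations.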
