Neural network models of object recognition can also account for visual search behavior

What limits our ability to find an object we are looking for? There are two competing models: one explains attentional limitations during visual search in terms of a serial processing computation, while the other attributes limitations to noisy parallel processing. Both models predict human visual search behavior when applied to the simplified stimuli often used in experiments, but it remains unclear how to extend them to account for search of complex natural scenes. Models of natural scene search exist, but they do not predict whether a given scene will limit search accuracy. Here we propose an alternative mechanism to explain limitations across stimulus types: visual search is limited by the "untangling" computation proposed to underlie object recognition. To test this idea, we ask whether models of object recognition account for visual search behavior. The current best-in-class models are artificial neural networks (ANNs) that accurately predict both behavior and neural activity in the primate visual system during object recognition tasks. Unlike the dominant visual search models, ANNs can provide predictions for any image. First, we test ANN-based object recognition models with the simplified stimuli typically used in studies of visual search. We find that these models exhibit a hallmark effect of such studies: a drop in target detection accuracy as the number of distractors increases. Further experiments show that this effect results from learned representations: networks that are not pre-trained for object recognition can achieve near-perfect accuracy. Next, we test these models with complex natural images, using a version of the Pascal VOC dataset in which each image is assigned a visual search difficulty score derived from human reaction times. We find that model accuracy drops as the search difficulty score increases. We conclude that ANN-based object recognition models account for aspects of visual search behavior across stimulus types, and we discuss how these results can be extended.
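The experimental pipeline the abstract describes can be summarized in a short sketch. The following is a minimal illustration, not the authors' code: it assumes a PyTorch/torchvision setup, uses a hypothetical toy display generator (`make_search_display`) in place of the actual search stimuli, and swaps the 1000-way ImageNet head of a pretrained VGG16 for a binary target-present/absent classifier, which would still need to be trained on search displays before the evaluation below is meaningful.

```python
# Minimal sketch (not the authors' code), assuming PyTorch/torchvision.
import torch
import torch.nn as nn
from torchvision import models


def make_search_display(n_items, size=224, item=12):
    """Hypothetical toy display: green distractor squares, with a red
    target square present on half of the trials."""
    img = torch.zeros(3, size, size)
    present = torch.rand(1).item() < 0.5
    for i in range(n_items):
        y, x = torch.randint(0, size - item, (2,))
        # First item is the red target on target-present trials;
        # everything else is a green distractor.
        channel = 0 if (present and i == 0) else 1
        img[channel, y:y + item, x:x + item] = 1.0
    return img, int(present)


device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageNet-pretrained backbone; swap in `weights=None` to probe the
# non-pre-trained control described in the abstract.
net = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in net.features.parameters():
    p.requires_grad = False  # keep the learned visual features fixed

# Replace the 1000-way ImageNet head with a binary
# target-present / target-absent classifier (to be trained on displays).
net.classifier[-1] = nn.Linear(4096, 2)
net = net.to(device).eval()


@torch.no_grad()
def accuracy_by_set_size(model, set_sizes, n_trials=200):
    """Fraction of correct present/absent judgments per set size."""
    results = {}
    for k in set_sizes:
        correct = 0
        for _ in range(n_trials):
            img, label = make_search_display(n_items=k)
            pred = model(img.unsqueeze(0).to(device)).argmax(1).item()
            correct += int(pred == label)
        results[k] = correct / n_trials
    return results


# A set-size effect would show up as accuracy falling as k grows:
print(accuracy_by_set_size(net, set_sizes=[1, 2, 4, 8]))
```

Under this setup, the set-size effect would appear as accuracy falling with `n_items`; rerunning with a randomly initialized network probes whether the effect depends on learned representations, and the natural-scene analysis would replace the toy displays with Pascal VOC images binned by their human-derived difficulty scores.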
