On the Optimality of Spatial Attention for Object Detection

Studies of visual attention traditionally focus on its physiological and psychophysical nature [16,18,19] or on its algorithmic applications [1,9,21]. Here we develop a simple, formal mathematical model of the advantage of spatial attention for object detection. In this model, spatial attention is defined as processing a subset of the visual input, and detection is an abstraction with certain failure characteristics. We demonstrate that processing the entire visual input is suboptimal when prior information about target locations is available. In a video setting, such information is almost always available in practice from tracking, motion, or saliency. This argues for an attentional strategy independent of computational savings: no matter how much computational power is available, it is in principle better to dedicate it preferentially to selected portions of the scene. This suggests, anecdotally, a form of environmental pressure for the evolution of foveated photoreceptor densities in the retina. It also offers a general justification for the use of spatial attention in machine vision.
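The claim that processing the whole input can be suboptimal can be illustrated with a toy calculation. The sketch below is not the paper's model, which the abstract does not spell out. It assumes a single target, a known prior over a handful of locations, and a detector with a hit rate and a per-location false-alarm rate. It also assumes that success means reporting the target with no false alarm anywhere else. All names and numbers are hypothetical.

```python
def success_probability(prior, k, hit_rate, false_alarm_rate):
    """Toy model (an assumption, not the paper's formulation).

    Only the k most probable locations are examined. Success requires that
    the target lies in the examined set, is detected, and that none of the
    other k - 1 examined locations produces a false alarm.
    """
    attended_mass = sum(sorted(prior, reverse=True)[:k])
    return attended_mass * hit_rate * (1.0 - false_alarm_rate) ** (k - 1)


def best_window(prior, hit_rate, false_alarm_rate):
    """Return (k, probability) for the window size that maximizes success."""
    sizes = range(1, len(prior) + 1)
    best_k = max(
        sizes,
        key=lambda k: success_probability(prior, k, hit_rate, false_alarm_rate),
    )
    return best_k, success_probability(prior, best_k, hit_rate, false_alarm_rate)


# Hypothetical prior over five locations, e.g. produced by a tracker.
prior = [0.6, 0.2, 0.1, 0.05, 0.05]

k_attend, p_attend = best_window(prior, hit_rate=0.9, false_alarm_rate=0.1)
p_full = success_probability(prior, len(prior), hit_rate=0.9, false_alarm_rate=0.1)
print(f"best window size: {k_attend}, success: {p_attend:.3f}")
print(f"full input, success: {p_full:.3f}")
```

Under these assumed numbers, examining only the three most probable locations beats examining all five, because each extra location adds little prior mass but one more chance of a false alarm. If the false-alarm rate is set to zero, the full input becomes the best choice, which is consistent with the abstract's point that the advantage stems from the detector's failure characteristics rather than from computational cost.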

[1]  C. Koch, et al.  Computational modelling of visual attention, 2001, Nature Reviews Neuroscience.

[2]  Neill W Campbell, et al.  IEEE International Conference on Computer Vision and Pattern Recognition, 2008.

[3]  H. Pashler.  The Psychology of Attention, 1997.

[4]  Laurent Itti, et al.  Neuromorphic algorithms for computer vision and attention, 2001, SPIE Optics + Photonics.

[5]  Christof Koch, et al.  Modeling attention to salient proto-objects, 2006, Neural Networks.

[6]  David G. Lowe.  Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[7]  Peter Dayan, et al.  Inference, Attention, and Decision in a Bayesian Neural Architecture, 2004, NIPS.

[8]  A. Treisman.  How the deployment of attention determines what we see, 2006, Visual cognition.

[9]  C. Koch, et al.  Sparse Representation in the Human Medial Temporal Lobe, 2006, The Journal of Neuroscience.

[10]  Cordelia Schmid, et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Dima Damen, et al.  Recognizing linked events: Searching the space of feasible explanations, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Paul A. Viola, et al.  Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[13]  John K. Tsotsos, et al.  Modeling Visual Attention via Selective Tuning, 1995, Artif. Intell.

[14]  C. L. M.  The Psychology of Attention, 1890, Nature.

[15]  S. Ullman, et al.  Shifts in selective visual attention: towards the underlying neural circuitry, 1985, Human neurobiology.

[16]  Laurent Itti, et al.  Combining attention and recognition for rapid scene analysis, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[17]  Yali Amit, et al.  A Computational Model for Visual Selection, 1999, Neural Computation.

[18]  Frédéric Jurie, et al.  Learning Saliency Maps for Object Categorization, 2006.

[19]  Pietro Perona, et al.  Is bottom-up attention useful for object recognition?, 2004, CVPR 2004.

[20]  Yiming Ye, et al.  Where to look next in 3D object search, 1995, Proceedings of International Symposium on Computer Vision - ISCV.

[21]  Simone Frintrop, et al.  Robust Object Detection at Regions of Interest with an Application in Ball Recognition, 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[22]  Joel L. Davis, et al.  Visual attention and cortical circuits, 2001.