Visual attention and target detection in cluttered natural scenes

Rather than attempting to fully interpret visual scenes in a parallel fashion, biological systems appear to employ a serial strategy by which an attentional spotlight rapidly selects circumscribed regions in the scene for further analysis. The spatiotemporal deployment of attention has been shown to be controlled by both bottom-up (image-based) and top-down (volitional) cues. We describe a detailed neuromimetic computer implementation of a bottom-up scheme for the control of visual attention, focusing on the problem of combining information across modalities (orientation, intensity, and color information) in a purely stimulus-driven manner. We have applied this model to a wide range of target detection tasks, using synthetic and natural stimuli. Performance has, however, remained difficult to evaluate objectively on natural scenes, because no objective reference was available for comparison. We present predicted search times for our model on the Search–2 database of rural scenes containing a military vehicle. Overall, we found a poor correlation between human and model search times. Further analysis, however, revealed that in 75% of the images, the model appeared to detect the target faster than humans (for comparison, we calibrated the model’s arbitrary internal time frame such that 2 to 4 image locations were visited per second). It seems that this model, which had originally been designed not to find small, hidden military vehicles but rather to find the few most obviously conspicuous objects in an image, performed as an efficient target detector on the Search–2 dataset. Finally, we explore further developments of the model, in particular through a more formal treatment of the difficult problem of extracting suitable low-level features to be fed into the saliency map.
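The serial scan and the time calibration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each modality (intensity, color, orientation) is already available as a small 2-D feature map, and it combines them by plain averaging after rescaling, a deliberately simple stand-in for the combination strategy the paper studies. All function names, the inhibition radius, and the toy maps are hypothetical. Only the rate of 2 to 4 visited locations per second comes from the abstract.

```python
from typing import List, Tuple

Map = List[List[float]]


def normalize(m: Map) -> Map:
    """Rescale a feature map to [0, 1]; a flat map becomes all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def combine(maps: List[Map]) -> Map:
    """Average the rescaled maps (assumed stand-in for the paper's scheme)."""
    normed = [normalize(m) for m in maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [
        [sum(n[y][x] for n in normed) / len(normed) for x in range(cols)]
        for y in range(rows)
    ]


def shifts_to_target(saliency: Map, target: Tuple[int, int],
                     radius: int = 1, max_shifts: int = 100) -> int:
    """Visit saliency maxima in decreasing order, suppressing each visited
    neighborhood (inhibition of return). Return how many locations were
    visited when one lands within `radius` of `target`, or -1 if none does."""
    s = [row[:] for row in saliency]
    rows, cols = len(s), len(s[0])
    for shift in range(1, max_shifts + 1):
        # Winner-take-all: pick the currently most salient location.
        y, x = max(
            ((yy, xx) for yy in range(rows) for xx in range(cols)),
            key=lambda p: s[p[0]][p[1]],
        )
        if abs(y - target[0]) <= radius and abs(x - target[1]) <= radius:
            return shift
        # Inhibition of return: suppress the attended neighborhood.
        for yy in range(max(0, y - radius), min(rows, y + radius + 1)):
            for xx in range(max(0, x - radius), min(cols, x + radius + 1)):
                s[yy][xx] = float("-inf")
    return -1


def search_time_range(n_visited: int, rate_lo: float = 2.0,
                      rate_hi: float = 4.0) -> Tuple[float, float]:
    """Convert visited locations to (fastest, slowest) seconds, using the
    2 to 4 locations per second calibration quoted in the abstract."""
    return n_visited / rate_hi, n_visited / rate_lo


if __name__ == "__main__":
    def blank() -> Map:
        return [[0.0] * 4 for _ in range(4)]

    # Toy maps: a conspicuous distractor at (0, 0), the target at (3, 3).
    intensity, color, orientation = blank(), blank(), blank()
    intensity[0][0], intensity[3][3] = 1.0, 0.2
    color[0][0], color[3][3] = 0.9, 1.0

    n = shifts_to_target(combine([intensity, color, orientation]), (3, 3))
    print(n, search_time_range(n))
```

In this toy scene the distractor is attended first and the target second, so the predicted search time is reported as a range rather than a single value, reflecting the 2 to 4 locations per second calibration.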
