Top-down control of visual attention in object detection

Current computational models of visual attention focus on bottom-up information and ignore scene context. However, studies in visual cognition show that humans use context to facilitate object detection in natural scenes by directing their attention or eyes to diagnostic regions. Here we propose a model of attention guidance based on global scene configuration. We show that the statistics of low-level features across the scene image determine where a specific object (e.g., a person) should be located. Recorded eye movements show that the regions selected by the top-down model agree with the regions scrutinized by human observers performing a visual search task for people. The results validate the proposition that top-down information from visual context modulates the saliency of image regions during the task of object detection. Contextual information provides a shortcut for efficient object detection systems.
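
To make the idea of context modulating saliency concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes that scene context can be summarized as a Gaussian prior over image rows (the expected vertical position of the target) and that this prior multiplies a bottom-up saliency map. The function names, the Gaussian form, and the parameters `mean_row` and `sigma` are all hypothetical; in a real system the expected position would be estimated from global image features rather than set by hand.

```python
import math


def vertical_prior(n_rows, mean_row, sigma):
    """Gaussian prior over image rows, normalized to sum to 1.

    Assumption: context predicts only the vertical position of the target.
    """
    weights = [math.exp(-0.5 * ((r - mean_row) / sigma) ** 2) for r in range(n_rows)]
    total = sum(weights)
    return [w / total for w in weights]


def modulate_saliency(saliency, row_prior):
    """Weight each row of a bottom-up saliency map by its contextual prior,
    then renormalize so the map sums to 1."""
    combined = [[s * p for s in row] for row, p in zip(saliency, row_prior)]
    total = sum(sum(row) for row in combined)
    if total == 0:
        return combined
    return [[v / total for v in row] for row in combined]


def most_salient(grid):
    """Return the (row, col) index of the largest value in the map."""
    best_value, best_pos = float("-inf"), None
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value > best_value:
                best_value, best_pos = value, (r, c)
    return best_pos


# Toy 4x3 saliency map: a strong distractor in the top row (e.g. sky)
# and a weaker candidate in row 2, where the hypothetical prior expects people.
saliency = [
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6],
    [0.1, 0.1, 0.1],
]
prior = vertical_prior(n_rows=4, mean_row=2, sigma=0.7)
modulated = modulate_saliency(saliency, prior)
```

In this toy case the bottom-up peak sits in the top row, while the modulated map peaks at the weaker candidate in the row favored by the prior. This illustrates how a contextual prior can redirect attention away from salient but implausible regions.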
