A Time-Dependent Saliency Model Combining Center and Depth Biases for 2D and 3D Viewing Conditions

This paper examines the role of binocular disparity in the deployment of visual attention. To address this question, we compared eye-tracking data recorded while observers viewed natural images under 2D and 3D conditions. We first study the influence of disparity on saliency and on center and depth biases. Results show that visual exploration is affected by the introduction of binocular disparity: in particular, participants tend to look first at closer areas in the 3D condition and then direct their gaze to more widespread locations. Beyond this behavioral analysis, we assess the extent to which state-of-the-art models of bottom-up visual attention predict where observers looked in both viewing conditions. To improve their ability to predict salient regions, we examine low-level features as well as higher-level foreground/background cues. Results indicate that, following the initial centering response, the foreground feature plays an active role not only in the early but also in the middle instants of attention deployment. Importantly, this influence is more pronounced in the stereoscopic condition, supporting the notion of a quasi-instantaneous bottom-up saliency modulated by higher-level figure/ground processing. Beyond depth information itself, the foreground cue might constitute an early process of “selection for action”. Finally, we propose a time-dependent computational model to predict saliency on still pictures. The proposed approach combines low-level visual features with center and depth biases, and outperforms state-of-the-art models of bottom-up attention.
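The time-dependent combination described above can be sketched as a weighted blend of a bottom-up saliency map with center-bias and depth (foreground) maps, where the weights evolve with viewing time. The function name, the exponential weighting scheme, and the decay constants below are illustrative assumptions, not the fitted parameters of the paper's model:

```python
import numpy as np

def combined_saliency(low_level, center_bias, depth_bias, t,
                      tau_center=0.5, tau_depth=1.5):
    """Blend a bottom-up saliency map with center and depth (foreground)
    biases using time-dependent weights.

    All maps are non-negative 2D arrays of the same shape; t is viewing
    time in seconds. The exponential decay constants are hypothetical:
    the center bias dominates the earliest fixations and fades quickly,
    while the foreground bias persists into the middle of exploration.
    """
    w_center = np.exp(-t / tau_center)           # strong at t=0, fades fast
    w_depth = np.exp(-t / tau_depth)             # fades more slowly
    w_low = 1.0 - 0.5 * (w_center + w_depth)     # remainder to low-level map
    s = w_low * low_level + w_center * center_bias + w_depth * depth_bias
    return s / s.max()                           # normalize to [0, 1]
```

At t = 0 the center and depth maps dominate; as t grows, the weights decay and the prediction converges to the pure low-level saliency map, mirroring the shift from centered, foreground-driven fixations to more widespread exploration.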
