A proto-object-based computational model for visual saliency.

State-of-the-art bottom-up saliency models often assign high saliency values at or near high-contrast edges, whereas people tend to look within the regions delineated by those edges, namely the objects. To resolve this inconsistency, we estimate saliency at the level of coherent image regions. According to object-based attention theory, the human brain groups similar pixels into coherent regions, called proto-objects; the saliency of these proto-objects is then estimated and combined, and attention is directed to the most salient regions. In this paper we employ state-of-the-art computer vision techniques to implement a proto-object-based model of visual attention. In particular, a hierarchical image segmentation algorithm is used to extract proto-objects, and two principal approaches to saliency estimation, rarity-based and contrast-based saliency, are generalized to the proto-object level. Rarity-based saliency assesses whether a proto-object contains rare or outstanding details, while contrast-based saliency estimates how much a proto-object differs from its surroundings. Because not all image regions with high contrast to their surroundings attract human attention, we distinguish between external and internal contrast-based saliency: external contrast-based saliency measures the difference between the proto-object and the rest of the image, whereas internal contrast-based saliency measures the complexity of the proto-object itself. We evaluate the proposed method and its components on two challenging eye-fixation datasets (Judd, Ehinger, Durand, & Torralba, 2009; Subramanian, Katti, Sebe, Kankanhalli, & Chua, 2010). The results show the importance of rarity-based and of both external and internal contrast-based saliency for fixation prediction. Moreover, comparison with state-of-the-art computational models of visual saliency demonstrates the advantage of proto-objects as units of analysis.
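To make the pipeline concrete, the following Python sketch illustrates the idea of region-level (proto-object) saliency. It uses scikit-image's Felzenszwalb graph-based segmentation as a stand-in for the hierarchical segmentation described in the paper, and simple colour-statistics proxies for the rarity, external-contrast, and internal-contrast terms; the function name, parameters, and scoring formulas are illustrative assumptions, not the paper's actual implementation.

# Illustrative sketch only: segmented regions stand in for proto-objects, and
# the rarity / contrast scores below are simplified proxies for the measures
# described in the abstract, not the paper's formulas.
import numpy as np
from skimage import io, color
from skimage.segmentation import felzenszwalb

def proto_object_saliency(image_rgb, scale=200, sigma=0.8, min_size=50):
    """Return a per-pixel saliency map built from region-level scores."""
    lab = color.rgb2lab(image_rgb)                        # perceptual colour space
    labels = felzenszwalb(image_rgb, scale=scale,
                          sigma=sigma, min_size=min_size)  # proto-object proxies
    flat_lab = lab.reshape(-1, 3)
    flat_lbl = labels.ravel()
    global_mean = flat_lab.mean(axis=0)
    region_scores = np.zeros(labels.max() + 1)

    for r in np.unique(flat_lbl):
        mask = flat_lbl == r
        feats = flat_lab[mask]
        region_mean = feats.mean(axis=0)

        # External contrast: distance between the region's mean colour
        # and the mean colour of the rest of the image.
        rest = flat_lab[~mask]
        rest_mean = rest.mean(axis=0) if rest.size else global_mean
        external = np.linalg.norm(region_mean - rest_mean)

        # Internal contrast: colour variability inside the region,
        # a crude proxy for the complexity of the proto-object itself.
        internal = feats.std(axis=0).mean()

        # Rarity: small regions that deviate from the global statistics are
        # treated as rarer (a rough proxy for rarity-based saliency).
        rarity = np.linalg.norm(region_mean - global_mean) / np.sqrt(mask.sum())

        region_scores[r] = external + internal + rarity   # naive combination

    region_scores = (region_scores - region_scores.min()) / (np.ptp(region_scores) + 1e-8)
    return region_scores[labels]                          # broadcast back to pixels

# Usage: saliency_map = proto_object_saliency(io.imread("scene.jpg"))

In the paper the three terms are evaluated both separately and in combination against the eye-fixation data; the equal-weight sum above is only a placeholder for that combination step.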

[1]  John K. Tsotsos,et al.  Saliency, attention, and visual search: an information theoretic approach. , 2009, Journal of vision.

[2]  Christof Koch,et al.  Learning a saliency map using fixated locations in natural scenes. , 2011, Journal of vision.

[3]  A. Friedman Framing pictures: the role of knowledge in automatized encoding and memory for gist. , 1979, Journal of experimental psychology. General.

[4]  R. C. Langford How People Look at Pictures, A Study of the Psychology of Perception in Art. , 1936 .

[5]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  B. Julesz,et al.  Human factors and behavioral science: Textons, the fundamental elements in preattentive vision and perception of textures , 1983, The Bell System Technical Journal.

[7]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Laurent Itti,et al.  Interesting objects are visually salient. , 2008, Journal of vision.

[9]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[10]  T. Foulsham,et al.  Eye movements during scene inspection: A test of the saliency map hypothesis , 2006 .

[11]  P Reinagel,et al.  Natural scene statistics at the centre of gaze. , 1999, Network.

[12]  R. Rafal,et al.  Shifting visual attention between objects and locations: evidence from normal and parietal lesion subjects. , 1994, Journal of experimental psychology. General.

[13]  B. Scholl Objects and attention: the state of the art , 2001, Cognition.

[14]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Derrick J. Parkhurst,et al.  Modeling the role of salience in the allocation of overt visual attention , 2002, Vision Research.

[16]  Nuno Vasconcelos,et al.  On the plausibility of the discriminant center-surround hypothesis for visual saliency. , 2008, Journal of vision.

[17]  Jan-Mark Geusebroek,et al.  Salient object detection: From pixels to segments , 2013, Image Vis. Comput..

[18]  R. Rosenholtz A simple saliency model predicts a number of motion popout phenomena , 1999, Vision Research.

[19]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[20]  William T. Freeman,et al.  Presented at: 2nd Annual IEEE International Conference on Image Processing , 1995 .

[21]  Michael Brady,et al.  Saliency, Scale and Image Description , 2001, International Journal of Computer Vision.

[22]  P. Perona,et al.  Objects predict fixations better than early saliency. , 2008, Journal of vision.

[23]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[24]  D. Ballard,et al.  Eye guidance in natural vision: reinterpreting salience. , 2011, Journal of vision.

[25]  M. Posner,et al.  Orienting of Attention* , 1980, The Quarterly journal of experimental psychology.

[26]  R. Baddeley,et al.  Do we look at lights? Using mixture modelling to distinguish between low- and high-level factors in natural image viewing , 2009 .

[27]  Iain D. Gilchrist,et al.  Visual correlates of fixation selection: effects of scale and time , 2005, Vision Research.

[28]  Arnold W. M. Smeulders,et al.  Real-time bag of words, approximately , 2009, CIVR '09.

[29]  J. Henderson Human gaze control during real-world scene perception , 2003, Trends in Cognitive Sciences.

[30]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[31]  J. Henderson,et al.  Object-based attentional selection in scene viewing. , 2010, Journal of vision.

[32]  B. Julesz Textons, the elements of texture perception, and their interactions , 1981, Nature.

[33]  P. Lang International Affective Picture System (IAPS) : Technical Manual and Affective Ratings , 1995 .

[34]  C. Koch,et al.  Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. , 2008, Journal of vision.

[35]  Roland J. Baddeley,et al.  High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis , 2006, Vision Research.

[36]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[37]  Michael L. Mack,et al.  Visual saliency does not account for eye movements during visual search in real-world scenes , 2007 .

[38]  J. Wolfe,et al.  Fixational Eye Movements Are Not an Index of Covert Attention , 2007, Psychological science.

[39]  T. Foulsham,et al.  What can saliency models predict about eye movements? Spatial and sequential aspects of fixations during encoding and recognition. , 2008, Journal of vision.

[40]  S. Yantis,et al.  Cortical mechanisms of space-based and object-based attentional control , 2003, Current Opinion in Neurobiology.

[41]  Tim K Marks,et al.  SUN: A Bayesian framework for saliency using natural statistics. , 2008, Journal of vision.

[42]  Giulio Sandini,et al.  A Proto-object Based Visual Attention Model , 2008, WAPCV.

[43]  Preeti Verghese,et al.  Where to look next? Eye movements reduce local uncertainty. , 2007, Journal of vision.

[44]  G. Hauske,et al.  Object and scene analysis by saccadic eye-movements: an investigation with higher-order statistics. , 2000, Spatial vision.

[45]  Harish Katti,et al.  An Eye Fixation Database for Saliency Detection in Images , 2010, ECCV.

[46]  T. A. Kelley,et al.  Cortical mechanisms for shifting and holding visuospatial attention. , 2008, Cerebral cortex.

[47]  J. Duncan Selective attention and the organization of visual information. , 1984, Journal of experimental psychology. General.

[48]  Bernhard Schölkopf,et al.  Center-surround patterns emerge as optimal predictors for human saccade targets. , 2009, Journal of vision.

[49]  Jochen J. Steil,et al.  Where to Look Next? Combining Static and Dynamic Proto-objects in a TVA-based Model of Visual Attention , 2010, Cognitive Computation.

[50]  J. Beck Effect of orientation and of shape similarity on perceptual grouping , 1966 .

[51]  Liqing Zhang,et al.  Saliency Detection: A Spectral Residual Approach , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  J. Henderson Eye movement control during visual object processing: effects of initial fixation position and semantic constraint. , 1993, Canadian journal of experimental psychology = Revue canadienne de psychologie experimentale.

[53]  Christof Koch,et al.  Modeling attention to salient proto-objects , 2006, Neural Networks.

[54]  智一 吉田,et al.  A study of an automatic field-map generation method using Efficient Graph-Based Image Segmentation , 2014 .

[55]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2004, ECCV Workshop on Statistical Learning in Computer Vision.

[56]  Jitendra Malik,et al.  An Information Maximization Model of Eye Movements , 2004, NIPS.

[57]  M. Farah,et al.  Does visual attention select objects or locations? , 1994, Journal of experimental psychology. General.

[58]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[59]  Shinsuke Shimojo,et al.  Visual surface representation: a critical link between lower-level and higher level vision , 1995 .

[60]  C. Koch,et al.  Faces and text attract gaze independent of the task: Experimental data and computer model. , 2009, Journal of vision.

[61]  K. Rayner Eye Guidance in Reading: Fixation Locations within Words , 1979, Perception.

[62]  Benjamin W Tatler,et al.  The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. , 2007, Journal of vision.

[63]  David G. Lowe  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[64]  A. Treisman,et al.  A feature-integration theory of attention , 1980, Cognitive Psychology.

[65]  Ronald A. Rensink Seeing, sensing, and scrutinizing , 2000, Vision Research.

[66]  D. S. Wooding,et al.  The relationship between the locations of spatial features and those of fixations made during visual examination of briefly presented images. , 1996, Spatial vision.

[67]  J. Duncan,et al.  Visual search and stimulus similarity. , 1989, Psychological review.

[68]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.