Predicting Eye Fixations With Higher-Level Visual Features

The saliency map and the object map are two contrasting hypotheses about the mechanisms the visual system uses to guide eye fixations when humans freely view natural images. Most computational studies define saliency as outliers in the distributions of low-level features and propose saliency as an important factor for predicting eye fixations. Psychophysical studies, however, suggest that high-level objects predict eye fixations more accurately and that early saliency has only a minor effect. This view has in turn been challenged by a study reporting the opposite result, which suggests that the role of object-level features needs further investigation. In addition, little is known about the role of intermediate features lying between the low-level and the object-level features. In this paper, we construct two models, one based on mid-level features and one based on object-level features, and compare their performance against models based on low-level features. Quantitative evaluation on three benchmark natural-image fixation data sets demonstrates that the mid-level model outperforms the state-of-the-art low-level models by a significant margin, whereas the object-level model is inferior to most low-level models. Quantitative evaluation on a video fixation data set demonstrates that both the mid-level and the object-level models outperform the state-of-the-art low-level models, with the object-level model performing better under three of the four standard metrics. When combined, the two proposed models achieve even higher performance. However, additionally incorporating the best low-level model yields only negligible improvements on all data sets. Taken together, these results indicate that higher-level features may be more effective than low-level features for predicting eye fixations on natural images in the free-viewing condition.
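The abstract does not name the standard metrics used in the quantitative evaluation. Purely as an illustration, and assuming a metric of this kind is among them, the following Python sketch computes Normalized Scanpath Saliency (NSS), a metric widely used in fixation-prediction benchmarks: the predicted saliency map is z-scored, and its values are averaged over the pixels that human observers fixated. The function name `nss` and the binary `fixation_map` input format are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def nss(saliency_map: np.ndarray, fixation_map: np.ndarray) -> float:
    """Normalized Scanpath Saliency (illustrative sketch).

    saliency_map: 2-D array of predicted saliency values.
    fixation_map: 2-D array of the same shape; nonzero entries mark
                  fixated pixels (an assumed input format).
    Returns the mean z-scored saliency at fixated pixels; higher is better,
    and a chance-level map scores about 0.
    """
    s = saliency_map.astype(np.float64)
    std = s.std()  # population standard deviation (ddof=0)
    if std == 0:
        # A constant map carries no information; this sketch scores it as 0.
        return 0.0
    z = (s - s.mean()) / std
    fixated = fixation_map.astype(bool)
    if not fixated.any():
        raise ValueError("fixation_map contains no fixations")
    return float(z[fixated].mean())
```

For a 3x3 map that is 1 at the centre and 0 elsewhere, a single fixation at the centre gives an NSS of sqrt(8), about 2.83, while a fixation in a corner gives a small negative score, reflecting a prediction that missed the fixated location.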
