A Reverse Hierarchy Model for Predicting Eye Fixations

A body of psychological and physiological evidence suggests that early visual attention works in a coarse-to-fine way, which forms the basis of the reverse hierarchy theory (RHT). This theory states that attention propagates from the top level of the visual hierarchy, which processes the gist and abstract information of the input, down to the bottom level, which processes local details. Inspired by this theory, we develop a computational model for saliency detection in images. First, the original image is downsampled to multiple scales to form a pyramid. Then, the saliency of each layer is obtained by image super-resolution reconstruction from the layer above, and is defined as the unpredictability of that layer given this coarse-to-fine reconstruction. Finally, the saliency of all layers of the pyramid is fused into stochastic fixations through a probabilistic model, in which attention initiates from the top layer and propagates downward through the pyramid. Extensive experiments on two standard eye-tracking datasets show that the proposed method achieves results competitive with state-of-the-art models.
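The pipeline described above can be summarized as: build a pyramid, reconstruct each layer from the layer above, and treat the reconstruction error as that layer's saliency before fusing top-down. The following is a minimal sketch of that flow, assuming plain bicubic upsampling as a stand-in for the paper's super-resolution reconstruction and a simple additive fusion in place of the probabilistic stochastic-fixation model; all function names and parameters are illustrative, not the authors' implementation.

```python
# Coarse-to-fine saliency sketch (illustrative only; assumes OpenCV + NumPy).
import cv2
import numpy as np

def build_pyramid(image, num_levels=4):
    """Repeatedly downsample the image to form a pyramid (index 0 = finest)."""
    pyramid = [image.astype(np.float32)]
    for _ in range(num_levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid

def layer_saliency(fine, coarse):
    """Saliency of a layer = unpredictability of the fine layer from a
    reconstruction computed from the layer above (here: bicubic upsampling,
    standing in for the paper's super-resolution reconstruction)."""
    h, w = fine.shape[:2]
    reconstruction = cv2.resize(coarse, (w, h), interpolation=cv2.INTER_CUBIC)
    error = np.abs(fine - reconstruction)
    return error.mean(axis=2) if error.ndim == 3 else error

def saliency_map(image, num_levels=4):
    """Fuse per-layer saliency from the top (coarsest) layer downward.
    The paper samples stochastic fixations probabilistically; here we simply
    accumulate the per-layer maps at full resolution."""
    pyr = build_pyramid(image, num_levels)
    h, w = image.shape[:2]
    fused = np.zeros((h, w), np.float32)
    for level in range(num_levels - 1, 0, -1):  # coarse -> fine
        s = layer_saliency(pyr[level - 1], pyr[level])
        fused += cv2.resize(s, (w, h), interpolation=cv2.INTER_LINEAR)
    fused -= fused.min()
    return fused / (fused.max() + 1e-8)

# Usage (hypothetical input file):
# sal = saliency_map(cv2.imread("example.jpg"))
```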
