Image visual attention computation and application via the learning of object attributes

Visual attention selects a salient subset of the visual input for further processing and ignores redundant data. The dominant view of how visual attention is computed assumes that bottom-up visual saliency, such as local contrast and interest points, drives the allocation of attention in scene viewing. In this paper we argue instead that the deployment of attention is primarily and directly guided by objects, and we propose a novel framework that explores image visual attention by learning object attributes from eye-tracking data. We address three problems: (1) pixel-level visual attention computation (the saliency map); (2) image-level visual attention computation; and (3) application of the computational model to image categorization. We first use the object bank algorithm to obtain the responses of a number of object detectors at each location in an image, forming a feature descriptor that indicates the occurrence of various objects at a pixel or in an image. To solve the first problem, we integrate the inference of interesting objects from fixations in eye-tracking data with the competition among surrounding objects. To solve the second problem, we propose a computational model that estimates the interestingness of each image through a mapping between object attributes and the inter-observer visual congruency obtained from eye-tracking data. Finally, we apply the proposed pixel-level visual attention model to the image categorization task. Comprehensive evaluations on publicly available benchmarks and comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed models.
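
As a rough illustration of the descriptor idea summarized above, the following sketch stacks a set of object-detector response maps into per-pixel and per-image descriptors and then fits a linear mapping from the per-pixel descriptors to a binary fixation map. It is not the paper's model: the function names, the max pooling, the ridge regression, and the synthetic response maps are all assumptions made for illustration, and the competition among surrounding objects and the inter-observer congruency mapping are not modeled here.

```python
import numpy as np

def pixel_descriptors(responses):
    """Turn K detector response maps of shape (K, H, W) into (H, W, K) descriptors."""
    return np.moveaxis(responses, 0, -1)

def image_descriptor(responses):
    """Pool each detector map to a single value (max pooling is an assumption)."""
    return responses.reshape(responses.shape[0], -1).max(axis=1)

def fit_object_weights(descriptors, fixation_map, ridge=1e-3):
    """Ridge regression from per-pixel descriptors to a 0/1 fixation map.

    This is a stand-in for learning which objects attract fixations.
    """
    X = descriptors.reshape(-1, descriptors.shape[-1])
    y = fixation_map.reshape(-1).astype(float)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def saliency_map(descriptors, weights):
    """Weighted sum of detector responses, rescaled to the range [0, 1]."""
    s = descriptors @ weights
    s = s - s.min()
    peak = s.max()
    return s / peak if peak > 0 else s

# Synthetic example: three hypothetical detectors on an 8x8 image.
responses = np.zeros((3, 8, 8))
responses[0, 2:5, 2:5] = 1.0   # detector 0 fires on the fixated object
responses[1, 5:8, 5:8] = 1.0   # detector 1 fires on an object nobody looked at
responses[2] = 0.1             # detector 2 gives a weak, uniform response

fixations = np.zeros((8, 8))
fixations[2:5, 2:5] = 1.0      # observers fixated the first object

desc = pixel_descriptors(responses)
weights = fit_object_weights(desc, fixations)
sal = saliency_map(desc, weights)
```

In this toy setup the learned weights favor the detector that coincides with the fixations, so the resulting map is high on that object and low elsewhere. The pooled vector from `image_descriptor` is the kind of image-level feature that could be passed to a separate regressor for an image-level score.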
