Learning Video Saliency from Human Gaze Using Candidate Selection

In recent years, remarkable progress has been made in visual saliency modeling. Our interest is in video saliency. Because videos are fundamentally different from still images, human observers view them differently. For example, each video frame is observed for only a fraction of a second, whereas a still image can be viewed at leisure. Video saliency estimation methods should therefore differ substantially from image saliency methods. In this paper we propose a novel method for video saliency estimation, inspired by the way people watch videos. We explicitly model the temporal continuity of the video by predicting the saliency map of a given frame conditioned on the map from the previous frame. Furthermore, we improve both accuracy and computation speed by restricting the salient locations to a carefully selected candidate set. We validate our method on two gaze-tracked video datasets and show that it outperforms the state of the art.
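The two key ideas above, restricting saliency to a small candidate set and conditioning each frame's map on the previous one, can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the candidate selector, the intensity-based scoring, the blending weight `alpha`, and the Gaussian rendering width `sigma` are all hypothetical simplifications (the actual method uses richer cues such as motion and learned models).

```python
import numpy as np

def select_candidates(frame, k=5):
    """Pick the k brightest pixels of a grayscale frame as hypothetical
    salient-location candidates. A stand-in for the paper's carefully
    selected candidate set, which draws on much richer cues."""
    flat = np.argsort(frame, axis=None)[-k:]
    # Convert flat indices back to (row, col) coordinates.
    return np.stack(np.unravel_index(flat, frame.shape), axis=1)

def predict_saliency(frame, prev_map, sigma=3.0, alpha=0.5):
    """Score candidates conditioned on the previous frame's saliency map,
    then render a dense map as a sum of Gaussians around them."""
    h, w = frame.shape
    cands = select_candidates(frame)
    # Candidate score: its own (toy) static cue blended with the previous
    # map's value at that location -- the temporal-continuity prior.
    scores = np.array([
        (1 - alpha) * frame[y, x] + alpha * prev_map[y, x]
        for y, x in cands
    ])
    ys, xs = np.mgrid[0:h, 0:w]
    sal = np.zeros((h, w))
    for (y, x), s in zip(cands, scores):
        sal += s * np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2))
    # Normalize to [0, 1] so maps are comparable across frames.
    return sal / sal.max() if sal.max() > 0 else sal
```

Because only a handful of candidates are scored per frame instead of every pixel, the per-frame cost stays low, which is the efficiency argument the abstract makes for candidate selection.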
