Learning to Predict Video Saliency using Temporal Superpixels

The visual saliency of a video sequence can be computed by combining spatial and temporal features that attract a viewer’s attention to a group of pixels. We present a method that computes video saliency by integrating four such features: color dissimilarity, objectness measure, motion difference, and boundary score. We use temporal clusters of pixels, or temporal superpixels, to model the attention associated with a group of moving pixels in a video sequence. The features are combined using weights learned online by a linear support vector machine. The temporal linkage between superpixels is then used to compute the saliency flow across the image frames. Experiments demonstrate the efficacy of the proposed method and show that it outperforms state-of-the-art methods.
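The feature combination described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each temporal superpixel is summarized by four feature scores in [0, 1] (color dissimilarity, objectness, motion difference, boundary score), that the linear SVM is trained with plain stochastic subgradient steps on the hinge loss (a stand-in for whatever online solver the method actually uses), and that saliency is carried across frames by averaging the scores of superpixels that share a temporal id. The names `online_svm_step`, `make_example`, `train`, and `propagate`, the synthetic data, and all hyperparameters are hypothetical.

```python
import random


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def online_svm_step(w, b, x, y, lr=0.1, reg=1e-3):
    """One online update of a linear SVM (hinge loss + L2), with y in {-1, +1}.

    Assumption: a plain subgradient step, used here only for illustration.
    """
    # L2 shrinkage is applied on every step.
    new_w = [wi * (1.0 - lr * reg) for wi in w]
    # The hinge loss is active only when the margin is below 1.
    if y * (dot(w, x) + b) < 1.0:
        new_w = [wi + lr * y * xi for wi, xi in zip(new_w, x)]
        b += lr * y
    return new_w, b


def make_example(rng, salient):
    """Synthetic 4-feature vector: salient superpixels score high, others low."""
    centre = 0.8 if salient else 0.2
    x = [min(1.0, max(0.0, rng.gauss(centre, 0.1))) for _ in range(4)]
    return x, (1 if salient else -1)


def train(n_steps=2000, seed=0):
    """Learn the feature weights from a stream of labelled superpixels."""
    rng = random.Random(seed)
    w, b = [0.0] * 4, 0.0
    for _ in range(n_steps):
        x, y = make_example(rng, rng.random() < 0.5)
        w, b = online_svm_step(w, b, x, y)
    return w, b


def propagate(frame_scores):
    """Average per-frame scores along each temporal superpixel id.

    frame_scores: list of {temporal_superpixel_id: score}, one dict per frame.
    Assumption: simple averaging stands in for the temporal linkage step.
    """
    totals, counts = {}, {}
    for scores in frame_scores:
        for tsp_id, s in scores.items():
            totals[tsp_id] = totals.get(tsp_id, 0.0) + s
            counts[tsp_id] = counts.get(tsp_id, 0) + 1
    return {tsp_id: totals[tsp_id] / counts[tsp_id] for tsp_id in totals}


w, b = train()
```

The learned score `dot(w, x) + b` acts as a per-superpixel saliency value, and `propagate` smooths it over the frames in which the same temporal superpixel appears. In practice the features would come from real color, objectness, optical-flow, and boundary estimators rather than from synthetic draws.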
