Video saliency detection with robust temporal alignment and local-global spatial contrast

Video saliency detection, the task of detecting the content in a video that attracts attention, has broad applications in multimedia understanding and retrieval. In this paper, we propose a new framework for spatiotemporal saliency detection. To better estimate salient motion in the temporal domain, we use robust alignment by sparse and low-rank decomposition to jointly estimate the salient foreground motion and the camera motion. Consecutive frames are transformed and aligned, and then decomposed into a low-rank matrix representing the background and a sparse matrix indicating the objects with salient motion. In the spatial domain, we address several problems of models based on local center-surround contrast, and demonstrate how to use global information and prior knowledge to improve spatial saliency detection. Evaluation of the individual components demonstrates the effectiveness of our temporal and spatial methods. Final experimental results show that the combination of our spatial and temporal saliency maps achieves the best overall performance compared with several state-of-the-art methods.
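
The low-rank plus sparse decomposition described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: it assumes the frames are already aligned (the transformation-estimation step of the robust alignment is omitted), and it uses a generic principal component pursuit solver based on an inexact augmented Lagrangian scheme. The function names `frames_to_matrix`, `soft_threshold`, and `robust_pca`, as well as the default weight `lam` and the step-size constants, are assumptions chosen for illustration.

```python
import numpy as np


def frames_to_matrix(frames):
    """Stack aligned grayscale frames as the columns of a matrix D (pixels x frames)."""
    return np.stack([np.asarray(f, dtype=float).ravel() for f in frames], axis=1)


def soft_threshold(x, tau):
    """Element-wise shrinkage: move each value toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def robust_pca(D, lam=None, tol=1e-7, max_iter=500):
    """Split D into low_rank + sparse by principal component pursuit.

    Approximately solves  min ||L||_* + lam * ||S||_1  subject to  D = L + S
    with an inexact augmented Lagrangian iteration. Hypothetical helper,
    not the solver from the paper.
    """
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    norm_D = np.linalg.norm(D, "fro")
    if norm_D == 0.0:
        return np.zeros_like(D), np.zeros_like(D)
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default weight (assumption)

    mu = 1.25 / np.linalg.norm(D, 2)  # penalty parameter, scaled by spectral norm
    mu_max = mu * 1e7
    rho = 1.5                         # growth factor for mu
    low_rank = np.zeros_like(D)
    sparse = np.zeros_like(D)
    Y = np.zeros_like(D)              # Lagrange multiplier

    for _ in range(max_iter):
        # Low-rank update: shrink the singular values.
        U, sig, Vt = np.linalg.svd(D - sparse + Y / mu, full_matrices=False)
        low_rank = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # Sparse update: shrink the entries.
        sparse = soft_threshold(D - low_rank + Y / mu, lam / mu)
        # Multiplier update on the constraint residual.
        residual = D - low_rank - sparse
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") <= tol * norm_D:
            break
    return low_rank, sparse
```

Under these assumptions, each column of the returned `low_rank` matrix approximates the background of one frame, and the magnitude of the matching column of `sparse`, reshaped to the frame size, can serve as a rough temporal saliency map.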
