Fixation Prediction in Videos Using Unsupervised Hierarchical Features

This paper presents a framework for saliency estimation and fixation prediction in videos. The framework builds on a hierarchical feature representation obtained by stacking convolutional layers of independent subspace analysis (ISA) filters, so feature learning is unsupervised and independent of the task. To compute saliency, we employ a multiresolution architecture that exploits both local and global saliency: for a given image, an image pyramid is first built, and at each resolution both local and global saliency measures are computed to obtain a saliency map. Integrating the saliency maps over the image pyramid yields the final video saliency. We first show that combining local and global saliency improves the results. We then compare the proposed model with several video saliency models and demonstrate that it predicts video saliency effectively, outperforming all the models it is compared against.
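
The multiresolution pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the local and global measures, the pyramid construction, or the integration rule, so the choices below are assumptions made for illustration. Local saliency is taken as the distance to the 3x3 neighbourhood mean, global saliency as the distance to the frame-wide mean feature, the pyramid as 2x2 block averaging, and integration as a plain sum. The input `features` array stands in for the ISA feature maps, and all function names are hypothetical.

```python
import numpy as np

def normalize(m):
    """Scale a map to [0, 1]; a constant map becomes all zeros."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def downsample(f):
    """Halve the resolution by averaging 2x2 blocks (odd edges are cropped)."""
    h, w = f.shape[0] // 2 * 2, f.shape[1] // 2 * 2
    f = f[:h, :w]
    return f.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def upsample(m, shape):
    """Nearest-neighbour resize of a 2-D map to `shape`."""
    rows = np.arange(shape[0]) * m.shape[0] // shape[0]
    cols = np.arange(shape[1]) * m.shape[1] // shape[1]
    return m[np.ix_(rows, cols)]

def local_saliency(f):
    """Assumed local measure: distance to the 3x3 neighbourhood mean."""
    h, w = f.shape[:2]
    p = np.pad(f, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neigh = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return np.linalg.norm(f - neigh, axis=2)

def global_saliency(f):
    """Assumed global measure: distance to the frame-wide mean feature."""
    return np.linalg.norm(f - f.mean(axis=(0, 1)), axis=2)

def multiresolution_saliency(features, levels=3):
    """Sum local + global saliency over a pyramid of an (H, W, C) feature map."""
    out = np.zeros(features.shape[:2])
    f = features
    for _ in range(levels):
        s = normalize(local_saliency(f)) + normalize(global_saliency(f))
        out += upsample(s, out.shape)  # bring each level back to full size
        f = downsample(f)              # next (coarser) pyramid level
    return normalize(out)

# Toy frame: a uniform background with one bright 4x4 patch.
frame = np.zeros((32, 32, 1))
frame[12:16, 20:24, 0] = 1.0
sal = multiresolution_saliency(frame)
peak = np.unravel_index(np.argmax(sal), sal.shape)
```

On this toy frame the most salient location (`peak`) falls inside the bright patch, since the patch differs from both its surroundings and the frame as a whole at every pyramid level. For video, the same function would be applied to spatio-temporal features of each frame rather than to raw intensities.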
