Learning to Detect Video Saliency With HEVC Features

Saliency detection has been widely studied to predict human fixations, with various applications in computer vision and image processing. In this paper, we argue that the state-of-the-art High Efficiency Video Coding (HEVC) standard can be used to generate useful features in the compressed domain for saliency detection. Accordingly, this paper proposes to learn a video saliency model based on HEVC features. First, we establish an eye tracking database for video saliency detection, which can be downloaded from https://github.com/remega/video_database. Through statistical analysis of our eye tracking database, we find that human fixations tend to fall into regions with large-valued HEVC features of splitting depth, bit allocation, and motion vector (MV). In addition, three observations are obtained from further analysis of our eye tracking database. On this basis, several features in the HEVC domain are proposed from splitting depth, bit allocation, and MV. Next, a support vector machine (SVM) is trained to integrate these HEVC features for video saliency detection. Since almost all video data are stored in compressed form, our method avoids both the computational cost of decoding and the storage cost of raw data. More importantly, experimental results show that the proposed method outperforms other state-of-the-art saliency detection methods, in both the compressed and uncompressed domains.
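
To make the feature-integration step concrete, the following is a minimal sketch of how per-coding-tree-unit (CTU) HEVC features (splitting depth, allocated bits, and MV magnitude) might be fed to an SVM and turned into per-block saliency scores. This is an illustrative assumption, not the paper's implementation: extraction of the features from the bitstream is omitted, the training data below are synthetic placeholders, and scikit-learn's SVC is used in place of LIBSVM.

```python
# Minimal sketch (not the authors' code): integrating per-CTU HEVC features
# with an SVM for saliency prediction, assuming splitting depth, allocated
# bits, and MV magnitude have already been extracted from the bitstream.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder training data: one row per CTU with
# [splitting depth, allocated bits, MV magnitude]; labels mark whether the
# CTU was fixated (1) or not (0) in the eye-tracking ground truth.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 3))                       # synthetic features
y_train = (X_train.sum(axis=1) > 1.5).astype(int)     # synthetic labels

# RBF-kernel SVM with probability outputs; the per-CTU probabilities can be
# reshaped into a frame-sized grid and smoothed to form the saliency map.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
model.fit(X_train, y_train)

X_test = rng.random((120, 3))                          # e.g., CTUs of one frame
saliency = model.predict_proba(X_test)[:, 1]
print(saliency[:5])
```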
