Compressed-domain correlates of human fixations in dynamic scenes

In this paper we present two compressed-domain features that are strong correlates of visual saliency in natural video. We demonstrate their potential to indicate saliency by comparing their statistics around human fixation points against their statistics at control points away from fixations. Using these features, we then construct a simple and effective saliency estimation method for compressed video that requires only partial decoding and uses only motion vectors, block coding modes, and coded residuals from the bitstream. The proposed algorithm is extensively tested on two ground-truth datasets using several accuracy metrics. The results indicate superior performance over several state-of-the-art compressed-domain and pixel-domain saliency estimation algorithms.
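To make the idea concrete, below is a minimal sketch of how per-block bitstream features of the kind described above (motion-vector magnitude and coded-residual energy) could be fused into a frame-level saliency map. The normalization and the averaging-based fusion are illustrative assumptions on our part, not the published algorithm, and the function and variable names are hypothetical.

```python
import numpy as np

def block_saliency_map(mv_x, mv_y, residual_energy, eps=1e-6):
    """Combine per-block motion-vector magnitude and coded-residual energy
    into a per-block saliency map.

    mv_x, mv_y      : 2-D arrays of motion-vector components, one entry per block
    residual_energy : 2-D array of summed squared residual coefficients per block

    The feature choice (motion vectors and residuals taken from the bitstream)
    follows the paper; the min-max normalization and the simple averaging used
    below are placeholder assumptions.
    """
    # Motion-vector magnitude per block.
    mv_mag = np.sqrt(mv_x.astype(float) ** 2 + mv_y.astype(float) ** 2)

    # Normalize each feature to [0, 1] so neither cue dominates the fusion.
    def normalize(f):
        f = f.astype(float)
        return (f - f.min()) / (f.max() - f.min() + eps)

    # Fuse the two cues with a plain average (placeholder combination rule).
    return 0.5 * (normalize(mv_mag) + normalize(residual_energy))


if __name__ == "__main__":
    # Random data standing in for one frame's decoded block features
    # (roughly 120 x 68 blocks for a 1080p frame with 16x16 blocks).
    rng = np.random.default_rng(0)
    h, w = 68, 120
    mv_x = rng.integers(-16, 17, (h, w))
    mv_y = rng.integers(-16, 17, (h, w))
    res = rng.random((h, w))
    sal = block_saliency_map(mv_x, mv_y, res)
    print(sal.shape, float(sal.min()), float(sal.max()))
```

In the same spirit, the fixation-versus-control comparison described in the abstract can be read as sampling this map at fixated and non-fixated blocks and comparing the two sample distributions (e.g., with an ROC-style analysis), though the exact evaluation protocol is specified in the paper itself.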
