Compressed-Domain Correlates of Fixations in Video

In this paper we present two compressed-domain features that are highly indicative of saliency in natural video. Their predictive power is demonstrated by comparing their statistics around human fixation points in a number of videos against those at control points selected randomly away from fixations. Using these features, we construct a simple and effective saliency estimation method for compressed video that uses only the motion vectors, block coding modes, and coded residuals from the bitstream, requiring only partial decoding. The proposed algorithm has been extensively tested on two ground-truth datasets using several accuracy metrics. The results indicate its superior performance over several state-of-the-art compressed-domain and pixel-domain saliency estimation algorithms.
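To illustrate the general idea of compressed-domain saliency estimation, the sketch below fuses two per-block cues that can be read from a bitstream without full decoding: motion-vector magnitude and coded-residual energy. This is a minimal illustration, not the paper's actual method; the array layout, the min-max normalization, and the linear fusion weight `alpha` are all assumptions made for the example.

```python
import numpy as np

def block_saliency(motion_vectors, residual_energy, alpha=0.5):
    """Crude block-level saliency from two compressed-domain cues.

    motion_vectors  : (H, W, 2) array of per-block (dx, dy) displacements
    residual_energy : (H, W) array of summed residual-coefficient energy
    alpha           : fusion weight between the two cues (assumed)
    """
    mv_mag = np.linalg.norm(motion_vectors, axis=2)

    def normalize(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    # Linear fusion of normalized cues; a real method would also account
    # for block coding modes and global (camera) motion.
    return alpha * normalize(mv_mag) + (1 - alpha) * normalize(residual_energy)
```

In practice, compressed-domain methods typically also compensate for global camera motion before using motion vectors, since raw vectors confound object motion with camera motion.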
