Spatiotemporal visual saliency guided perceptual high efficiency video coding with neural network

Abstract Perceptual video coding systems optimize compression by exploiting properties of the human visual system, and attention-based coding is an important branch of this work. Saliency maps, which mark the regions of interest (ROI) in a video signal, have become a reliable tool thanks to advances in computing performance and visual-attention algorithms. In this study, we propose a hybrid compression algorithm that uses a deep convolutional neural network to compute spatial saliency and extracts temporal saliency from compressed-domain motion information. The two maps are combined with uncertainty weighting to form the video's final saliency map. This map then dynamically adjusts the QP search range in HEVC, and a rate-distortion calculation method is proposed to select coding modes and guide bit allocation during compression. Experimental results show that the proposed method outperforms state-of-the-art perceptual coding algorithms in both saliency detection and perceptual compression quality.
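
To make the pipeline concrete, the sketch below illustrates the two saliency-driven steps the abstract describes: uncertainty-weighted fusion of a spatial (CNN) map with a temporal (motion-based) map, and mapping the fused saliency of each coding tree unit (CTU) to a QP offset so that salient regions receive finer quantization. The paper does not publish code, so every name and parameter here (fuse_saliency, saliency_to_qp_offset, the variance-based uncertainty proxy, the +/-4 offset range, the base QP) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch, assuming a variance-based uncertainty weight and a linear
# saliency-to-QP mapping; these choices are illustrative, not the paper's.
import numpy as np

def fuse_saliency(spatial, temporal, eps=1e-6):
    """Combine spatial and temporal saliency maps with uncertainty weighting:
    the map with lower variance (treated here as lower uncertainty) gets the
    larger weight."""
    u_s = np.var(spatial) + eps   # uncertainty proxy for the spatial map
    u_t = np.var(temporal) + eps  # uncertainty proxy for the temporal map
    w_s = (1.0 / u_s) / (1.0 / u_s + 1.0 / u_t)
    fused = w_s * spatial + (1.0 - w_s) * temporal
    fused -= fused.min()          # normalize to [0, 1] to drive QP adjustment
    return fused / (fused.max() + eps)

def saliency_to_qp_offset(ctu_saliency, max_offset=4):
    """Map a CTU's mean saliency in [0, 1] to a QP offset: salient CTUs get a
    negative offset (more bits), non-salient CTUs a positive one."""
    return int(round(max_offset * (1.0 - 2.0 * ctu_saliency)))

if __name__ == "__main__":
    # Example on a 128x128 frame with 64x64 CTUs and an assumed base QP of 32.
    h, w, ctu, base_qp = 128, 128, 64, 32
    spatial = np.random.rand(h, w)   # stand-in for the CNN spatial saliency
    temporal = np.random.rand(h, w)  # stand-in for compressed-domain motion saliency
    fused = fuse_saliency(spatial, temporal)
    for y in range(0, h, ctu):
        for x in range(0, w, ctu):
            s = fused[y:y + ctu, x:x + ctu].mean()
            qp = int(np.clip(base_qp + saliency_to_qp_offset(s), 0, 51))
            print(f"CTU ({y},{x}): saliency={s:.2f}, QP={qp}")
```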
