CNN-Based Bi-Directional Motion Compensation for High Efficiency Video Coding

The state-of-the-art High Efficiency Video Coding (HEVC) standard adopts bi-prediction to improve coding efficiency for B frames. However, this technique assumes that the motion field follows a block-wise translational motion model, which can be inefficient in challenging scenarios such as rotation and deformation. Inspired by the strong signal-level prediction capability of deep learning, we propose a bi-directional motion compensation algorithm based on a convolutional neural network, which is incorporated into the video coding pipeline to improve compression performance. Our network consists of six convolutional layers and a skip connection, integrating prediction error detection and non-linear signal prediction into an end-to-end framework. Experimental results show that incorporating the proposed scheme into HEVC achieves up to 10.5% BD-rate savings, and 3.1% BD-rate savings on average, under the random access (RA) configuration.
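To make the described architecture concrete, the following is a minimal sketch of a bi-directional motion compensation network of the shape the abstract outlines: the two motion-compensated predictions (forward and backward) are stacked as input channels, passed through six convolutional layers, and a skip connection lets the network learn a residual on top of the conventional bi-prediction average. This is an illustrative reconstruction, not the authors' implementation; the channel widths, 3x3 kernels, ReLU activations, and the choice of the averaged prediction as the skip path are all assumptions, and a plain NumPy convolution is used so the sketch stays self-contained.

```python
import numpy as np

def conv2d(x, w, b):
    # Naive 'same'-padded 2-D convolution.
    # x: (C_in, H, W); w: (C_out, C_in, k, k); b: (C_out,)
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1], x.shape[2]
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for di in range(k):
                for dj in range(k):
                    out[o] += w[o, i, di, dj] * xp[i, di:di + H, dj:dj + W]
        out[o] += b[o]
    return out

def bi_prediction_cnn(pred_fwd, pred_bwd, weights):
    """Refine a bi-predicted block with a small CNN (hypothetical sketch).

    pred_fwd, pred_bwd: (H, W) forward/backward motion-compensated predictions.
    weights: list of six (w, b) pairs, mapping 2 input channels to 1 output.
    """
    # Stack the two directional predictions as input channels.
    h = np.stack([pred_fwd, pred_bwd])
    for idx, (w, b) in enumerate(weights):
        h = conv2d(h, w, b)
        if idx < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on all but the last layer
    # Skip connection (assumed form): the network predicts a residual
    # correction on top of the conventional averaged bi-prediction.
    return 0.5 * (pred_fwd + pred_bwd) + h[0]
```

With all-zero weights the residual vanishes and the output reduces to standard HEVC-style averaging, which shows how the skip connection lets the network fall back gracefully to conventional bi-prediction.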
