A Group Variational Transformation Neural Network for Fractional Interpolation of Video Coding

Motion compensation is a key technique in video coding for removing the temporal redundancy between coded video frames. Within motion compensation, fractional interpolation generates reference blocks at sub-pixel positions, which gives the encoder more candidate references to choose from. Existing video coding standards commonly use fixed interpolation filters for this step, and fixed filters cannot adapt well to diverse video signals. In this paper, we design a group variational transformation convolutional neural network (GVTCNN) to improve the fractional interpolation performance of the luma component in motion compensation. GVTCNN infers the samples at different sub-pixel positions from the input integer-position sample. It first extracts a shared feature map from the integer-position sample. A group variational transformation technique then takes a group of copies of this shared feature map and transforms each copy into the sample at a different sub-pixel position. Experimental results demonstrate the interpolation efficiency of GVTCNN. Compared with the interpolation method of High Efficiency Video Coding (HEVC), our method achieves an average bit saving of 1.9% and a maximum bit saving of 5.6% under the low-delay P configuration.
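The two-stage structure described above (one shared feature extraction, followed by a separate transformation per sub-pixel position) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the layer counts, the 3x3 kernels, the feature width of 8, the random weights, and the names `conv2d_same`, `init_params`, and `gvt_forward` are all assumptions made for this example. The choice of 15 output positions assumes quarter-pixel accuracy (4 x 4 positions minus the integer one), which the abstract does not state explicitly.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' 2-D correlation.

    x: (C_in, H, W) input, w: (C_out, C_in, k, k) kernels with odd k.
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def init_params(rng, n_pos=15, feat=8, k=3):
    """Random placeholder weights; a real model would be trained."""
    return {
        # Shared feature extraction (assumed: a single conv layer).
        "shared": rng.normal(0.0, 0.1, (feat, 1, k, k)),
        # One transformation branch per sub-pixel position (assumed: two convs).
        "group": rng.normal(0.0, 0.1, (n_pos, feat, feat, k, k)),
        "out": rng.normal(0.0, 0.1, (n_pos, 1, feat, k, k)),
    }


def gvt_forward(params, block):
    """Map an integer-position luma block (H, W) to (n_pos, H, W) samples."""
    x = block[None]  # add channel axis -> (1, H, W)
    shared = relu(conv2d_same(x, params["shared"]))  # computed once
    outputs = []
    for w_group, w_out in zip(params["group"], params["out"]):
        # Each branch works on its own copy of the shared feature map.
        branch = relu(conv2d_same(shared.copy(), w_group))
        outputs.append(conv2d_same(branch, w_out)[0])
    return np.stack(outputs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    block = rng.random((8, 8))  # hypothetical 8x8 integer-position block
    print(gvt_forward(params, block).shape)
```

The point of the sketch is the data flow: the shared feature map is computed once per block, so the cost of the first stage is not multiplied by the number of sub-pixel positions, while each position still has its own branch weights.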
