Deep Reference Generation With Multi-Domain Hierarchical Constraints for Inter Prediction

Inter prediction is a key module in video coding for removing temporal redundancy: similar reference blocks are searched for in previously coded frames and used to predict the block to be coded. Although existing video codecs can estimate and compensate for block-level motion, their inter prediction performance is still heavily affected by the residual, inconsistent pixel-wise displacement caused by irregular rotation and deformation. In this paper, we address this problem by proposing a deep frame interpolation network that generates additional reference frames in coding scenarios. First, we summarize the adaptive convolutions previously used for frame interpolation and propose a factorized kernel convolutional network that improves modeling capacity while keeping a compact form. Second, to better train this network, we introduce multi-domain hierarchical constraints that regularize its training. In the spatial domain, we use an auto-encoder with gradual down-sampling and up-sampling to generate the factorized kernels for frame interpolation at different scales. In the quality domain, because the input frames are of inconsistent quality, the factorized kernel convolution is modulated with quality-related features so that it learns to exploit more information from high-quality frames. In the frequency domain, a sum of absolute transformed difference (SATD) loss, which applies a frequency transform, is used to guide network optimization from the perspective of coding performance. With the frame interpolation network regularized by these multi-domain hierarchical constraints, our method achieves an average BD-rate saving of $3.8\%$ over HEVC for the luma component under the random access configuration, and an average BD-rate saving of $0.83\%$ over the upcoming VVC.
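
To make the frequency-domain constraint more concrete, the following is a minimal sketch of a sum of absolute transformed differences (SATD) measure. It is not the paper's implementation: the choice of an unnormalized Hadamard transform, the 4x4 block size, and the helper names `hadamard`, `satd_block`, and `satd_loss` are all assumptions made for illustration. The sketch uses plain Python lists and is not differentiable; a training loss would express the same computation in a deep-learning framework.

```python
def hadamard(n):
    """Build an n x n Hadamard matrix (n a power of two) by Sylvester's construction."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def satd_block(pred, target):
    """SATD of one square block: sum of |H D H|, where D = pred - target."""
    h = hadamard(len(pred))
    diff = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(pred, target)]
    # Sylvester Hadamard matrices are symmetric, so H D H^T equals H D H.
    coeffs = matmul(matmul(h, diff), h)
    return sum(abs(c) for row in coeffs for c in row)


def satd_loss(pred, target, block=4):
    """Mean SATD over non-overlapping block x block tiles (assumed tiling)."""
    height, width = len(pred), len(pred[0])
    total, count = 0, 0
    for y in range(0, height - block + 1, block):
        for x in range(0, width - block + 1, block):
            p = [row[x:x + block] for row in pred[y:y + block]]
            t = [row[x:x + block] for row in target[y:y + block]]
            total += satd_block(p, t)
            count += 1
    if count == 0:
        raise ValueError("image is smaller than one block")
    return total / count


if __name__ == "__main__":
    flat = [[0] * 4 for _ in range(4)]
    spike = [[1 if (r, c) == (0, 0) else 0 for c in range(4)] for r in range(4)]
    # A single-pixel error spreads over all 16 transform coefficients.
    print(satd_block(spike, flat))
```

Unlike a plain sum of absolute differences, this measure weights an error by how it spreads across transform coefficients, which is closer to what a transform-based codec has to encode.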
