Learned Video Compression with Feature-level Residuals

In this paper, we present an end-to-end video compression network for P-frame challenge on CLIC. We focus on deep neural network (DNN) based video compression, and improve the current frameworks from three aspects. First, we notice that pixel space residuals is sensitive to the prediction errors of optical flow based motion compensation. To suppress the relative influence, we propose to compress the residuals of image feature rather than the residuals of image pixels. Furthermore, we combine the advantages of both pixel-level and feature-level residual compression methods by model ensembling. Finally, we propose a step-by-step training strategy to improve the training efficiency of the whole framework. Experiment results indicate that our proposed method achieves 0.9968 MS-SSIM on CLIC validation set and 0.9967 MS-SSIM on test set.

[1]  Dong-Wook Kim,et al.  GRDN:Grouped Residual Dense Network for Real Image Denoising and GAN-Based Real-World Noise Modeling , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[2]  Valero Laparra,et al.  End-to-end Optimized Image Compression , 2016, ICLR.

[3]  Jiro Katto,et al.  Learning Image and Video Compression Through Spatial-Temporal Energy Compaction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Chao-Yuan Wu,et al.  Video Compression through Image Interpolation , 2018, ECCV.

[5]  David Minnen,et al.  Joint Autoregressive and Hierarchical Priors for Learned Image Compression , 2018, NeurIPS.

[6]  Xiaoyun Zhang,et al.  DVC: An End-To-End Deep Video Compression Framework , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Abdelaziz Djelouah,et al.  Neural Inter-Frame Compression for Video Coding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  David Minnen,et al.  Variational image compression with a scale hyperprior , 2018, ICLR.

[9]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Feng Wu,et al.  Learning for Video Compression , 2018, IEEE Transactions on Circuits and Systems for Video Technology.