A Combined Deep Learning based End-to-End Video Coding Architecture for YUV Color Space

Most existing deep learning-based end-to-end video coding (DLEC) architectures are designed specifically for the RGB color format. Yet the video coding standards developed over the past few decades, including H.264/AVC, H.265/HEVC and H.266/VVC, have been designed primarily for the YUV 4:2:0 format. In this format the chrominance (U and V) components are subsampled, which improves compression performance because the human visual system is less sensitive to color detail than to luminance detail. While many papers on DLEC compare these two distinct coding approaches in the RGB domain, a common evaluation framework in the YUV 4:2:0 domain would allow a fairer comparison. This paper introduces a new DLEC architecture that effectively supports YUV 4:2:0 video and compares its performance against the HEVC standard under such a common evaluation framework. Experimental results on YUV 4:2:0 video sequences show that the proposed architecture can outperform HEVC in intra-frame coding; however, its inter-frame coding is less efficient than HEVC, contrary to the RGB coding results reported in recent papers.
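To make the RGB versus YUV 4:2:0 distinction concrete, the sketch below converts RGB planes to YUV and then subsamples the two chroma planes by a factor of two horizontally and vertically. The abstract does not state which conversion matrix or downsampling filter the paper uses, so this sketch assumes BT.709 luma coefficients and a plain 2x2 averaging filter; the function names (`rgb_to_yuv444`, `subsample_420`, `rgb_to_yuv420`, `sample_count`) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Plane = List[List[float]]

# Assumed BT.709 luma coefficients; the paper's actual conversion is not specified here.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def rgb_to_yuv444(r: Plane, g: Plane, b: Plane) -> Tuple[Plane, Plane, Plane]:
    """Convert full-resolution R, G, B planes (values in [0, 1]) to Y, U (Cb), V (Cr)."""
    h, w = len(r), len(r[0])
    y = [[0.0] * w for _ in range(h)]
    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            luma = KR * r[i][j] + KG * g[i][j] + KB * b[i][j]
            y[i][j] = luma
            u[i][j] = (b[i][j] - luma) / (2.0 * (1.0 - KB))  # Cb, range [-0.5, 0.5]
            v[i][j] = (r[i][j] - luma) / (2.0 * (1.0 - KR))  # Cr, range [-0.5, 0.5]
    return y, u, v


def subsample_420(c: Plane) -> Plane:
    """Halve a chroma plane in both directions by averaging each 2x2 block (assumed filter)."""
    h, w = len(c), len(c[0])
    assert h % 2 == 0 and w % 2 == 0, "this sketch assumes even plane dimensions"
    return [
        [(c[i][j] + c[i][j + 1] + c[i + 1][j] + c[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def rgb_to_yuv420(r: Plane, g: Plane, b: Plane) -> Tuple[Plane, Plane, Plane]:
    """Full-resolution Y plus quarter-resolution U and V, i.e. the 4:2:0 layout."""
    y, u, v = rgb_to_yuv444(r, g, b)
    return y, subsample_420(u), subsample_420(v)


def sample_count(planes: Sequence[Plane]) -> int:
    """Total number of samples across all planes."""
    return sum(len(p) * len(p[0]) for p in planes)


if __name__ == "__main__":
    size = 4
    r = [[1.0] * size for _ in range(size)]
    g = [[0.5] * size for _ in range(size)]
    b = [[0.0] * size for _ in range(size)]
    y, u, v = rgb_to_yuv420(r, g, b)
    print("RGB samples:", sample_count([r, g, b]))
    print("YUV 4:2:0 samples:", sample_count([y, u, v]))
```

For a 4x4 frame, RGB carries 48 samples while YUV 4:2:0 carries 16 + 4 + 4 = 24, i.e. 1.5 samples per pixel instead of 3. This halving of the raw data before any coding is one reason a DLEC architecture built for three equal-resolution RGB planes cannot be applied to 4:2:0 input without changes to how the luma and chroma planes are handled.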
