Decomposition, Compression, and Synthesis (DCS)-based Video Coding: A Neural Exploration via Resolution-Adaptive Learning

Inspired by the fact that retinal cells segregate the visual scene into different attributes (e.g., spatial details, temporal motion) for separate neuronal processing, we propose to first decompose the input video into spatial texture frames (STFs) at its native spatial resolution, which preserve the rich spatial details, and temporal motion frames (TMFs) at a lower spatial resolution, which retain the motion smoothness; then compress them together using any popular video coder; and finally synthesize the decoded STFs and TMFs for high-fidelity video reconstruction at the native input resolution. This work simply applies bicubic resampling in decomposition and an HEVC-compliant codec in compression, and focuses on the synthesis part. For resolution-adaptive synthesis, a motion compensation network (MCN) is devised on the TMFs to efficiently align and aggregate temporal motion features, which are then jointly processed with the corresponding STFs using a non-local texture transfer network (NL-TTN) to better augment spatial details. In this way, compression and resolution resampling noise can be effectively alleviated, yielding better rate-distortion efficiency. The "Decomposition, Compression, Synthesis (DCS)" based scheme is codec agnostic, and currently demonstrates an average $\approx$1 dB PSNR gain or $\approx$25% BD-rate saving against the HEVC anchor using the reference software. In addition, experimental comparisons with state-of-the-art methods and ablation studies are conducted to further report the efficiency and generalization of the DCS algorithm, suggesting an encouraging direction for future video coding.
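
The decomposition step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how STFs are selected or which downsampling factor is used, so the `stf_interval` parameter, the factor of 2, and the function names are all hypothetical. The 2x2 block averaging is a simple stand-in for the bicubic resampling named in the abstract, chosen only so that the example has no external dependencies.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def downsample_2x(frame: Frame) -> Frame:
    """Halve width and height by averaging each 2x2 block.

    Stand-in for bicubic resampling (assumption made for illustration).
    """
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


def decompose(video: List[Frame],
              stf_interval: int = 4) -> Tuple[List[Frame], List[Frame]]:
    """Split a video into native-resolution STFs and low-resolution TMFs.

    Assumption: every `stf_interval`-th frame is kept as an STF and all
    other frames become TMFs; the abstract does not specify this rule.
    """
    stfs: List[Frame] = []
    tmfs: List[Frame] = []
    for index, frame in enumerate(video):
        if index % stf_interval == 0:
            stfs.append(frame)                 # keep full spatial detail
        else:
            tmfs.append(downsample_2x(frame))  # keep motion at lower resolution
    return stfs, tmfs


# Toy input: 8 frames of 4x4 pixels.
video = [[[float(t + y + x) for x in range(4)] for y in range(4)]
         for t in range(8)]
stfs, tmfs = decompose(video)
print(len(stfs), len(tmfs))            # number of STFs and TMFs
print(len(tmfs[0]), len(tmfs[0][0]))   # TMF height and width
```

In the full scheme, both frame lists would then be passed to a standard video coder, and the decoded TMFs would be restored to native resolution by the synthesis networks with the decoded STFs as texture references; neither stage is modeled in this sketch.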
