MOVI-Codec: Deep Video Compression without Motion

Traditional video codecs follow the predictive coding architecture of motion-compensated prediction and residual transform coding. Inspired by recent advances in deep learning, we propose a new deep video compression architecture that does not require motion estimation, the most expensive component of traditional video codecs. Our network consists of three components: a Displacement Calculation Unit (DCU), a Displacement Compression Network (DCN), and a Frame Reconstruction Network (FRN). The DCU exploits displaced frame differences as motion information, removing the need for the motion estimation found in hybrid codecs. The DCN uses an RNN-based network to learn temporal dependencies between frames. In the FRN, a new variant of the U-Net model, called LSTM-UNet, is proposed and used to learn space-time differential representations of videos. Our experimental results show that our compression model, the MOtionless VIdeo Codec (MOVI-Codec), learns to compress videos efficiently without computing motion, outperforming the H.264 video coding standard and exceeding the performance of the modern HEVC standard as measured by MS-SSIM, especially on higher-resolution videos.
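The displaced frame differences the DCU relies on can be illustrated with a minimal NumPy sketch: the current frame is compared against integer-shifted copies of the reference frame, yielding one difference channel per displacement. This is an illustrative sketch only; the function name, the ±1-pixel displacement window, and the stacked-channel layout are assumptions, not the paper's exact formulation.

```python
import numpy as np

def displaced_frame_differences(ref, cur, max_disp=1):
    """Compute displaced frame differences between a reference frame `ref`
    and the current frame `cur` for every integer shift (dy, dx) with
    |dy|, |dx| <= max_disp. Returns shape (num_displacements, H, W)."""
    diffs = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Shift the reference frame by (dy, dx); np.roll wraps at the
            # borders, a simplification of proper boundary handling.
            shifted = np.roll(np.roll(ref, dy, axis=0), dx, axis=1)
            diffs.append(cur - shifted)
    return np.stack(diffs)

# Toy usage: a 4x4 reference and a uniformly brightened current frame.
ref = np.arange(16, dtype=np.float64).reshape(4, 4)
cur = ref + 1.0
d = displaced_frame_differences(ref, cur, max_disp=1)
# With max_disp=1 there are 9 displacements; index 4 is the (0, 0) shift,
# so d[4] is the plain frame difference cur - ref.
```

Feeding such a stack of difference channels to the downstream networks lets them infer motion-like structure without an explicit motion search, which is the architectural point of the DCU.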
