Paper Information - Learning to Compress Videos without Computing Motion

As higher-resolution content and displays proliferate, the sheer volume of video data poses significant challenges to acquiring, transmitting, compressing, and displaying high-quality video content. In this paper, we propose a new deep learning video compression architecture that does not require motion estimation, which is the most expensive element of modern hybrid video compression codecs such as H.264 and HEVC. Our framework exploits the regularities inherent to video motion, which we capture by using displaced frame differences as video representations to train the neural network. In addition, we propose a new space-time reconstruction network based on both an LSTM model and a UNet model, which we call LSTM-UNet. The combined network efficiently captures both temporal and spatial video information, making it well suited to our purposes. The new video compression framework has three components: a Displacement Calculation Unit (DCU), a Displacement Compression Network (DCN), and a Frame Reconstruction Network (FRN), all of which are jointly optimized against a single perceptual loss function. The DCU obviates the need for the motion estimation used in hybrid codecs and is less expensive. In the DCN, an RNN-based network performs variable bit-rate encoding after a single round of training. The LSTM-UNet is used in the FRN to learn space-time differential representations of videos. Our experimental results show that our compression model, which we call the MOtionless VIdeo Codec (MOVI-Codec), learns to compress videos efficiently without computing motion. MOVI-Codec outperforms the video coding standard H.264, and is highly competitive with, and sometimes exceeds the performance of, the modern global standard HEVC codec, as measured by MS-SSIM.
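To make the idea of displaced frame differences concrete, the following is a minimal sketch, not the paper's DCU. It assumes grayscale frames stored as 2-D NumPy arrays and a small, hypothetical set of one-pixel offsets; the function name `displaced_frame_differences`, the offset set, and the wrap-around border handling are all illustrative assumptions that the abstract does not specify. The point it shows is that each difference needs only a fixed shift and a subtraction, with no search for motion vectors.

```python
import numpy as np

# Hypothetical offsets (dy, dx): no shift, right, left, down, up.
# The abstract does not state which displacements the DCU uses.
DEFAULT_OFFSETS = ((0, 0), (0, 1), (0, -1), (1, 0), (-1, 0))


def displaced_frame_differences(prev, curr, offsets=DEFAULT_OFFSETS):
    """Return a stack of curr - shift(prev, offset), one slice per offset.

    Borders wrap around because np.roll is used; a real codec would
    likely pad or crop instead (assumption).
    """
    prev = prev.astype(np.float32)
    curr = curr.astype(np.float32)
    diffs = []
    for dy, dx in offsets:
        # Shift the previous frame by a fixed displacement; no motion search.
        shifted = np.roll(prev, shift=(dy, dx), axis=(0, 1))
        diffs.append(curr - shifted)
    return np.stack(diffs, axis=0)


# Toy example: the "scene" moves one pixel to the right between frames.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(8, 8)).astype(np.uint8)
curr = np.roll(prev, shift=(0, 1), axis=(0, 1))

d = displaced_frame_differences(prev, curr)
print(d.shape)                 # one difference map per offset
print(np.abs(d).max(axis=(1, 2)))  # residual magnitude for each offset
```

In this toy case the difference for the offset that matches the true shift is zero everywhere, while the other differences are not. That is one way to read the abstract's claim that displaced differences capture motion regularities that a network can then exploit.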
