Learning for Video Compression

One key challenge for learning-based video compression is that motion predictive coding, a very effective tool for video compression, is difficult to train into a neural network. In this paper, we propose PixelMotionCNN (PMCNN), which includes motion extension and hybrid prediction networks. PMCNN models spatiotemporal coherence so that predictive coding can be performed effectively inside the learning network. Building on PMCNN, we further explore a learning-based video compression framework with additional components for iterative analysis/synthesis and binarization. Experimental results demonstrate the effectiveness of the proposed scheme. Although neither entropy coding nor complex configurations are employed in this paper, the scheme still outperforms MPEG-2 and achieves results comparable to the H.264 codec. The proposed learning-based scheme offers a possible new direction for further improving the compression efficiency and functionality of future video coding.
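
The abstract names motion extension without describing it. One plausible reading, assumed here rather than taken from the paper, is constant-velocity extrapolation: estimate how the co-located block moved between frames t-2 and t-1 by block matching, then assume the same motion continues into frame t. The sketch below uses plain Python lists as grayscale frames; the helper names (`sad`, `get_block`, `estimate_motion`, `extend_motion`, `make_square_frame`) and the full-search SAD matcher are hypothetical stand-ins for what PMCNN learns, not its actual procedure.

```python
# Toy "motion extension" sketch (an assumption, not PMCNN's actual network):
# estimate how a block moved from frame t-2 to t-1, then assume the motion
# continues unchanged into frame t. Frames are lists of lists of gray levels.

def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def get_block(frame, top, left, size):
    """Extract a size x size block; the caller keeps it inside the frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def estimate_motion(prev2, prev1, top, left, size, search):
    """Full-search block matching: return (dy, dx) such that the block at
    (top, left) in prev1 best matches the block at (top-dy, left-dx) in prev2."""
    h, w = len(prev2), len(prev2[0])
    target = get_block(prev1, top, left, size)
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top - dy, left - dx
            if 0 <= y <= h - size and 0 <= x <= w - size:
                cost = sad(target, get_block(prev2, y, x, size))
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv


def extend_motion(prev2, prev1, top, left, size, search=2):
    """Predict the block at (top, left) of frame t from prev1, assuming it
    moves by the same (dy, dx) from t-1 to t as from t-2 to t-1."""
    dy, dx = estimate_motion(prev2, prev1, top, left, size, search)
    h, w = len(prev1), len(prev1[0])
    # Under constant motion, content now at (top, left) was at
    # (top - dy, left - dx) in prev1; clamp so the block stays in bounds.
    y = min(max(top - dy, 0), h - size)
    x = min(max(left - dx, 0), w - size)
    return get_block(prev1, y, x, size), (dy, dx)


def make_square_frame(col, size=6):
    """Hypothetical test frame: a 2x2 bright square at rows 2-3, given column."""
    frame = [[0] * size for _ in range(size)]
    for r in (2, 3):
        for c in (col, col + 1):
            frame[r][c] = 255
    return frame


if __name__ == "__main__":
    # The square moves one pixel right per frame: columns 1, 2, then 3.
    prev2, prev1, cur = (make_square_frame(c) for c in (1, 2, 3))
    pred, mv = extend_motion(prev2, prev1, top=2, left=3, size=2)
    print(mv, pred == get_block(cur, 2, 3, 2))
```

For a 2x2 square that moves one pixel to the right per frame, the demo prints `(0, 1) True`: the vector estimated from the two previous frames, reused for the current frame, predicts the block exactly. Because the vector is derived only from already-decoded frames, a decoder can repeat the same search, so in this toy setup no motion vector needs to be transmitted.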

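The abstract also names iterative analysis/synthesis and binarization without detailing them. A minimal sketch, assuming a progressive design in which every iteration encodes the residual left by earlier iterations into binary codes, is shown below. The learned analysis and synthesis networks are replaced by a fixed sign quantizer with a halving step size, and the function names `encode_progressive` and `decode_progressive` are hypothetical.

```python
# Toy progressive (iterative) coder with binarization. The learned
# analysis/synthesis networks are replaced by a fixed sign quantizer with a
# halving step size; this is an assumed stand-in, not the paper's model.

def encode_progressive(signal, iterations):
    """Return one bit plane (values +1/-1) per iteration for inputs in [0, 1]."""
    recon = [0.5] * len(signal)   # start every sample at mid-range
    step = 0.25
    bitplanes = []
    for _ in range(iterations):
        # Analysis: look only at the residual the earlier iterations left.
        residual = [x - r for x, r in zip(signal, recon)]
        # Binarization: exactly one bit per sample, the residual's sign.
        bits = [1 if e >= 0 else -1 for e in residual]
        bitplanes.append(bits)
        # Synthesis: decode those bits and refine the reconstruction.
        recon = [r + b * step for r, b in zip(recon, bits)]
        step /= 2
    return bitplanes


def decode_progressive(bitplanes, length):
    """Rebuild the signal from the first len(bitplanes) bit planes."""
    recon = [0.5] * length
    step = 0.25
    for bits in bitplanes:
        recon = [r + b * step for r, b in zip(recon, bits)]
        step /= 2
    return recon


if __name__ == "__main__":
    signal = [0.0, 0.1, 0.37, 0.5, 0.93, 1.0]
    planes = encode_progressive(signal, 8)
    for k in (1, 4, 8):
        recon = decode_progressive(planes[:k], len(signal))
        err = max(abs(a - b) for a, b in zip(signal, recon))
        print(k, "bits/sample, max error", round(err, 4))
```

Each extra iteration spends one more bit per sample and halves the worst-case error bound, which is 0.5 / 2**k after k iterations. A decoder that receives only the first k bit planes still produces a usable reconstruction, and that prefix property is what lets a single iterative encoder serve several bit rates.
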