Memory-Augmented Auto-Regressive Network for Frame Recurrent Inter Prediction

Inter prediction is essential for modern codecs to remove temporal redundancy. In this paper, we generate artificial reference frames from previously reconstructed frames for inter prediction, offering a better choice when traditional block-wise motion estimation fails to find a good reference block. Long-term temporal dynamics are tracked throughout the coding process to generate more accurate and realistic artificial reference frames. Specifically, we propose a Memory-Augmented Auto-Regressive Network (MAAR-Net) for frame prediction in video coding. MAAR-Net regresses the current frame from its two nearest frames via an auto-regressive (AR) model to better capture the main spatial and temporal structures. The AR regression coefficients are generated from adjacent-frame information as well as the long-term motion dynamics accumulated and propagated by a convolutional Long Short-Term Memory (LSTM) network. To generate a higher-quality target frame, a quality attention mechanism is introduced for temporal regularization between different reconstructed frames. With this network design, our method achieves an average BD-rate saving of 4.0%, and up to 10.6%, over HEVC for the luma component under the low-delay configuration.
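
The AR regression step described above can be illustrated with a minimal NumPy sketch. It is not the MAAR-Net implementation: the function name `ar_predict`, the per-pixel k x k coefficient layout, and the edge padding are assumptions made for illustration, and the coefficients are set by hand here, whereas in the proposed method they are produced by the network.

```python
import numpy as np

def ar_predict(prev1, prev2, coeffs1, coeffs2):
    """Predict a frame as a per-pixel weighted sum of k x k neighbourhoods
    taken from the two nearest previous frames (hypothetical layout).

    prev1, prev2     : (H, W) luma frames, prev1 being the nearest one
    coeffs1, coeffs2 : (H, W, k, k) AR coefficients, one kernel per pixel
    """
    h, w = prev1.shape
    k = coeffs1.shape[-1]   # kernel size, assumed odd
    r = k // 2
    # Replicate border pixels so every pixel has a full neighbourhood.
    p1 = np.pad(prev1, r, mode="edge")
    p2 = np.pad(prev2, r, mode="edge")
    pred = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # Shifted view of each padded frame, weighted by that tap.
            pred += coeffs1[:, :, dy, dx] * p1[dy:dy + h, dx:dx + w]
            pred += coeffs2[:, :, dy, dx] * p2[dy:dy + h, dx:dx + w]
    return pred

# Toy usage with hand-set coefficients (placeholders, not learned values):
# only the centre tap is non-zero, which reduces the AR model to the
# linear extrapolation 2 * prev1 - prev2.
h, w, k = 4, 6, 3
rng = np.random.default_rng(0)
prev2 = rng.random((h, w))
prev1 = rng.random((h, w))
c1 = np.zeros((h, w, k, k))
c1[:, :, k // 2, k // 2] = 2.0
c2 = np.zeros((h, w, k, k))
c2[:, :, k // 2, k // 2] = -1.0
pred = ar_predict(prev1, prev2, c1, c2)
```

Because each pixel has its own kernel, the coefficients can vary spatially with the local motion, which is why they are generated by a network rather than fixed in advance.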
