Deep Virtual Reference Frame Generation For Multiview Video Coding

Multiview video involves a large volume of data, which poses great challenges for both storage and transmission, so improving the compression efficiency of multiview video coding is essential. In this paper, a deep virtual reference frame generation method is proposed to improve the performance of multiview video coding. Specifically, a parallax-guided generation network (PGG-Net) is designed to transform the parallax relation between different viewpoints and generate a high-quality virtual reference frame. In the network, a multilevel receptive field module enlarges the receptive field and extracts multi-scale deep features. A parallax attention fusion module then transforms the parallax and merges the features. The proposed method is integrated into the 3D-HEVC platform, and the generated virtual reference frame is inserted into the reference picture list as an additional reference. Experimental results show that the proposed method achieves an average BD-rate reduction of 5.31% compared with 3D-HEVC.
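
The abstract does not give the internals of the parallax attention fusion module, so the following is only a minimal sketch of the general idea of attention along the parallax direction, assuming rectified views in which parallax is purely horizontal. The function names (`parallax_attention_row`, `fuse_rows`), the dot-product similarity, and the concatenation-based fusion are all illustrative assumptions, not the PGG-Net design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parallax_attention_row(target_row, source_row):
    """Align one row of source-view features to the target view.

    Each row is a list of per-pixel feature vectors (lists of floats).
    Assumption: the views are rectified, so each target pixel only needs
    to attend to pixels on the same row of the source view.
    """
    channels = len(source_row[0])
    aligned = []
    for query in target_row:
        # Dot-product similarity between this target pixel and every source pixel.
        scores = [sum(q * k for q, k in zip(query, key)) for key in source_row]
        weights = softmax(scores)
        # Attention-weighted sum of the source features.
        aligned.append([
            sum(w * key[c] for w, key in zip(weights, source_row))
            for c in range(channels)
        ])
    return aligned


def fuse_rows(target_row, aligned_row):
    """Hypothetical fusion: concatenate target and aligned features per pixel.

    A real network would typically follow this with learned convolutions.
    """
    return [t + a for t, a in zip(target_row, aligned_row)]
```

For example, if the source row is the target row shifted by one pixel and the features are distinctive, `parallax_attention_row` recovers features that closely match the target row, which is the parallax transformation the fusion step relies on.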
