Paper Information - Quality-Constant Per-Shot Encoding by Two-Pass Learning-based Rate Factor Prediction

Providing constant-quality streams can both guarantee the user experience and avoid wasting bitrate. In this paper, we propose a novel deep-learning-based, two-pass encoder parameter prediction framework that decides the rate factor (RF) with which the encoder outputs constant-quality streams. For each single-shot segment of a video, the proposed method first extracts spatial, temporal, and pre-coding features through an ultra-fast pre-processing step. Based on these features, a deep neural network predicts an RF parameter, and the video encoder uses this RF to compress the segment in the first encoding pass. The VMAF quality of the first-pass encoding is then measured. If the quality does not meet the target, a second-pass RF prediction and encoding are performed. Because the first-pass predicted RF and the corresponding actual quality are used as feedback, the second-pass prediction is highly accurate. Experiments show that the proposed method requires only 1.55 times the encoding complexity on average, while its accuracy, defined as the percentage of compressed videos whose actual VMAF lies within ±1 of the target VMAF, reaches 98.88%.

Yi Wang | Chunlei Cai | Xiaobo Li | Tianxiao Ye
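
The two-pass control flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the DNN predictor, the encoder, and the VMAF measurement are replaced by hypothetical callables (`predict`, `encode_and_measure`). A first pass is assumed to "meet the target" when its VMAF is within ±1 of the target, which is the tolerance the abstract uses for accuracy. The toy RF-to-VMAF models and the toy predictor are made-up linear stand-ins. `summarize` computes the ±1 hit rate and the mean number of encoding passes per shot. The latter is only a rough proxy for the abstract's 1.55x complexity figure, which the abstract does not define in terms of passes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interfaces: in the paper the predictor is a DNN and encoding/VMAF
# measurement use real tools; here all three are plain Python callables.
Feedback = Optional[Tuple[float, float]]  # (first-pass RF, measured VMAF)
Predictor = Callable[[Sequence[float], float, Feedback], float]  # -> RF
EncodeAndMeasure = Callable[[float], float]  # RF -> measured VMAF


@dataclass
class ShotResult:
    rf: float
    vmaf: float
    passes: int


def encode_shot(features: Sequence[float], target: float, predict: Predictor,
                encode_and_measure: EncodeAndMeasure,
                tolerance: float = 1.0) -> ShotResult:
    """Two-pass RF selection for one shot (assumed rule: |VMAF - target| <= tolerance)."""
    # Pass 1: predict RF from the pre-processing features alone.
    rf1 = predict(features, target, None)
    vmaf1 = encode_and_measure(rf1)
    if abs(vmaf1 - target) <= tolerance:
        return ShotResult(rf1, vmaf1, passes=1)
    # Pass 2: re-predict using the first-pass RF and its actual VMAF as feedback.
    rf2 = predict(features, target, (rf1, vmaf1))
    vmaf2 = encode_and_measure(rf2)
    return ShotResult(rf2, vmaf2, passes=2)


def summarize(results: List[ShotResult], target: float,
              tolerance: float = 1.0) -> Tuple[float, float]:
    """Return (fraction of shots within ±tolerance of target, mean passes per shot)."""
    n = len(results)
    hit_rate = sum(abs(r.vmaf - target) <= tolerance for r in results) / n
    mean_passes = sum(r.passes for r in results) / n
    return hit_rate, mean_passes


# --- Toy usage with made-up numbers (not from the paper) ---
def toy_shot(intercept: float, slope: float) -> EncodeAndMeasure:
    # Hypothetical linear model: VMAF drops by `slope` per unit of RF.
    return lambda rf: intercept - slope * rf


def toy_predict(features: Sequence[float], target: float,
                feedback: Feedback) -> float:
    # Hypothetical predictor; `features` is ignored in this toy.
    if feedback is None:
        # Pass 1: invert a generic content model, VMAF = 130 - 2 * RF.
        return (130.0 - target) / 2.0
    rf1, vmaf1 = feedback
    # Pass 2: step from rf1 along the assumed slope of 2 VMAF per RF unit.
    return rf1 + (vmaf1 - target) / 2.0


TARGET = 93.0
shots = [toy_shot(130.0, 2.0), toy_shot(135.0, 2.0), toy_shot(130.0, 2.5)]
results = [encode_shot([], TARGET, toy_predict, s) for s in shots]
hit_rate, mean_passes = summarize(results, TARGET)
print(f"hit rate {hit_rate:.3f}, mean passes {mean_passes:.3f}")
```

With these toy numbers, the first shot is accepted after one pass, the second is corrected exactly on the second pass, and the third still misses the ±1 window after two passes. The script therefore reports a hit rate of 2/3 and 5/3 passes per shot. The feedback step only helps when the predictor's assumed slope is close to the content's real RF-to-VMAF behaviour, which is what the paper's learned second-pass model is meant to capture.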
