Speeding Up VP9 Intra Encoder With Hierarchical Deep Learning-Based Partition Prediction

In the VP9 video codec, block sizes are decided during encoding by recursively partitioning $64\times 64$ superblocks using rate-distortion optimization (RDO). This process is computationally intensive because the search space of possible superblock partitions is combinatorial. Here, we propose an alternative, deep learning-based framework that predicts intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and their corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce intra-mode encoding time. The experimental results show that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjøntegaard-Delta bitrate (BD-rate). VP9 provides several built-in speed levels that trade rate-distortion performance for faster encoding; we find that our model outperforms the fastest recommended speed level of the reference VP9 encoder for the good-quality intra encoding configuration, in terms of both speedup and BD-rate.
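To make the notion of a four-level partition tree concrete, the following is a minimal sketch of one possible representation, not the encoder's or the H-FCN's actual data layout. It assumes the tree is stored as four grids of sizes $1\times 1$, $2\times 2$, $4\times 4$ and $8\times 8$, holding one of VP9's four partition types (none, horizontal, vertical, split) for each $64\times 64$, $32\times 32$, $16\times 16$ and $8\times 8$ block respectively. The names `Partition`, `leaf_blocks` and `levels` are hypothetical and chosen for illustration only.

```python
from enum import IntEnum


class Partition(IntEnum):
    """The four VP9 partition types (the Python names are illustrative)."""
    NONE = 0   # code the block whole
    HORZ = 1   # two halves stacked vertically (horizontal cut)
    VERT = 2   # two halves side by side (vertical cut)
    SPLIT = 3  # four quadrants, each partitioned recursively


def leaf_blocks(levels, size=64, depth=0, row=0, col=0):
    """Return (x, y, width, height) for every coded block in a superblock.

    levels[d] is a 2^d x 2^d grid of Partition values for blocks of
    size 64 >> d (assumed layout). Entries below a non-SPLIT parent
    are never visited, so their values are ignored.
    """
    x, y = col * size, row * size
    half = size // 2
    p = levels[depth][row][col]
    if p == Partition.NONE:
        return [(x, y, size, size)]
    if p == Partition.HORZ:
        return [(x, y, size, half), (x, y + half, size, half)]
    if p == Partition.VERT:
        return [(x, y, half, size), (x + half, y, half, size)]
    # SPLIT at the last level yields four quadrants with no further recursion.
    if depth == len(levels) - 1:
        return [(x + dx, y + dy, half, half)
                for dy in (0, half) for dx in (0, half)]
    blocks = []
    for dr in (0, 1):
        for dc in (0, 1):
            blocks += leaf_blocks(levels, half, depth + 1,
                                  2 * row + dr, 2 * col + dc)
    return blocks


N, H, V, S = Partition.NONE, Partition.HORZ, Partition.VERT, Partition.SPLIT
levels = [
    [[S]],                               # 64x64 level
    [[N, H], [V, S]],                    # 32x32 level
    [[N] * 4 for _ in range(4)],         # 16x16 level
    [[N] * 8 for _ in range(8)],         # 8x8 level
]
blocks = leaf_blocks(levels)
```

In this example the superblock is split into four $32\times 32$ quadrants: one is coded whole, two are cut in half, and the last is split again into four $16\times 16$ blocks. An exhaustive RDO search has to evaluate many such trees per superblock, whereas a predictor only has to emit one value per grid entry.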
