Fast QTBT Partition Algorithm for Intra Frame Coding through Convolutional Neural Network

The latest codec developed by the Joint Video Exploration Team (JVET) employs a quad-tree plus binary-tree (QTBT) block partitioning structure, which improves coding performance significantly over High Efficiency Video Coding (HEVC) but at the cost of greatly increased encoding complexity. To address this issue, we propose a novel fast QTBT partition method based on a convolutional neural network (CNN). Specifically, the proposed algorithm uses a CNN to predict the QTBT partition depth range of each $32\times32$ block directly from the inherent texture richness of the image, rather than deciding whether to split at each depth level. For training, we introduce a misclassification penalty term combined with the L2 hinge loss function, which further boosts classification accuracy. Experimental results demonstrate the effectiveness of the proposed method: the rate-distortion-maintaining setting achieves a 42.33% complexity reduction with only a 0.69% bitrate increase, and the low-complexity setting achieves a 62.08% complexity reduction with a 2.04% bitrate increase.
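The training objective mentioned above can be illustrated with a minimal sketch. The abstract does not give the form of the misclassification penalty, so the version below is an assumption: it scales a one-vs-rest L2 (squared) hinge loss by a factor that grows with the distance between the predicted and the true depth-range class. The names `l2_hinge_loss`, `penalized_loss`, and the weight `lam` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def l2_hinge_loss(scores: Sequence[float], label: int) -> float:
    """One-vs-rest squared hinge loss for a single sample.

    scores: raw classifier outputs, one per depth-range class.
    label:  index of the true class.
    """
    loss = 0.0
    for k, s in enumerate(scores):
        t = 1.0 if k == label else -1.0      # target is +1 for the true class, -1 otherwise
        margin = max(0.0, 1.0 - t * s)       # hinge term
        loss += margin * margin              # squared (L2) hinge
    return loss


def penalized_loss(scores: Sequence[float], label: int, lam: float = 0.5) -> float:
    """Squared hinge loss scaled by an ASSUMED misclassification penalty.

    The penalty 1 + lam * |predicted - label| is a hypothetical choice that
    treats the classes as ordered, so predictions far from the true class
    cost more. The paper's actual penalty term may differ.
    """
    predicted = max(range(len(scores)), key=lambda k: scores[k])
    penalty = 1.0 + lam * abs(predicted - label)
    return penalty * l2_hinge_loss(scores, label)


if __name__ == "__main__":
    scores = [-1.0, -1.0, 1.0]               # the classifier favours class 2
    print(l2_hinge_loss(scores, label=0))    # plain squared hinge loss
    print(penalized_loss(scores, label=0))   # same loss with the assumed penalty applied
```

Under this assumption, a correct prediction leaves the hinge loss unchanged (the penalty factor is 1), while a prediction two classes away from the truth is weighted more heavily than one that is off by a single class.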
