Paper information - Fast QTBT Partition Algorithm for Intra Frame Coding through Convolutional Neural Network

In its latest work, the Joint Video Exploration Team (JVET) employs a quad-tree plus binary-tree (QTBT) block partitioning structure, which improves coding performance significantly over High Efficiency Video Coding (HEVC) but greatly increases encoding complexity. To address this issue, we propose a novel fast QTBT partition method based on a convolutional neural network (CNN). Specifically, the proposed algorithm uses a CNN to predict the QTBT partition depth range of each $32\times32$ block directly from the inherent texture richness of the image, rather than judging whether to split at each depth level. For training, we introduce a misclassification penalty term combined with the L2 hinge loss function, which further boosts classification accuracy. Experimental results demonstrate the effectiveness of the proposed method: the rate-distortion-maintaining setting achieves a 42.33% complexity reduction with only a 0.69% bitrate increase, and the low-complexity setting achieves a 62.08% complexity reduction with a 2.04% bitrate increase.
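
As a rough illustration of the training loss mentioned in the abstract, the Python sketch below computes a squared (L2) hinge loss over one-vs-rest class scores and optionally scales each term by a penalty weight. The abstract does not give the form of the misclassification penalty, so the distance-based weighting in `distance_penalty`, the parameter `alpha`, the three-class setup, and the score values are all assumptions made for illustration, not the authors' formulation.

```python
# Illustrative sketch only: the penalty weighting is an assumption,
# not the formulation used in the paper.

def l2_hinge_loss(scores, target, penalty=None):
    """Squared (L2) hinge loss for one sample with one-vs-rest targets.

    scores  -- list of raw class scores, one per depth-range class
    target  -- index of the true class
    penalty -- optional hypothetical matrix; penalty[target][k] scales the
               loss contributed by class k when the true class is `target`
    """
    loss = 0.0
    for k, s in enumerate(scores):
        t = 1.0 if k == target else -1.0   # one-vs-rest label in {-1, +1}
        margin = max(0.0, 1.0 - t * s)     # hinge: zero once the margin is met
        w = 1.0 if penalty is None else penalty[target][k]
        loss += w * margin ** 2            # squaring gives the L2 hinge
    return loss


def distance_penalty(num_classes, alpha=0.5):
    """Hypothetical penalty: classes farther from the true one cost more."""
    return [[1.0 + alpha * abs(i - j) for j in range(num_classes)]
            for i in range(num_classes)]


scores = [0.2, 1.5, -0.3]  # made-up scores for three hypothetical classes
plain = l2_hinge_loss(scores, target=0)
weighted = l2_hinge_loss(scores, target=0, penalty=distance_penalty(3))
```

With these made-up scores the weighted loss is larger than the plain one, because the wrongly high score on a class other than the true one is scaled up by the penalty. A weighting of this kind is one plausible way to discourage predictions far from the true depth range; the paper itself should be consulted for the actual penalty term.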

Chao Yang | Ping An | Liquan Shen | Zhipeng Jin
