AVX2-optimized Kvazaar HEVC intra encoder

This paper presents efficient SIMD optimizations for the open-source Kvazaar HEVC intra encoder. The C implementation of Kvazaar is accelerated with Intel AVX2 instructions, and the effect of these instructions on the Kvazaar ultrafast preset is profiled. According to our profiling results, the C functions for SATD, DCT, quantization, and intra prediction account for over 60% of the total intra coding time of the Kvazaar ultrafast preset. This work shows that optimizing primarily these functions doubles the coding speed of a single-threaded Kvazaar intra encoder at the same rate-distortion performance. The highest performance boost is obtained by deploying the proposed optimizations jointly with multithreading. On an 8-core Intel i7 processor, the AVX2-optimized 16-threaded Kvazaar ultrafast preset achieves real-time (30 fps) intra coding speed up to 1080p resolution. Compared to the AVX2-optimized ultrafast preset of x265, Kvazaar is 20% faster and still obtains a 9.1% bit rate gain for the same quality. These results indicate that Kvazaar is currently the leading open-source HEVC intra encoder in terms of real-time coding speed and efficiency.
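To make the kind of kernel named above concrete, the following is a minimal scalar sketch of a 4x4 SATD (sum of absolute transformed differences) computed with a Hadamard transform. It is not Kvazaar's code: the function name `satd_4x4_scalar`, its signature, and the choice to return the unnormalized sum are assumptions made for illustration. A routine of this shape is a typical SIMD candidate because every butterfly operates on independent rows or columns.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference kernel (not taken from Kvazaar):
 * unnormalized 4x4 SATD between an original and a predicted block.
 * Both blocks are assumed to share the same row stride. */
unsigned satd_4x4_scalar(const uint8_t *orig, const uint8_t *pred, int stride)
{
    int diff[4][4];
    int tmp[4][4];
    unsigned sum = 0;

    /* Residual between the original and predicted pixels. */
    for (int y = 0; y < 4; ++y) {
        for (int x = 0; x < 4; ++x) {
            diff[y][x] = (int)orig[y * stride + x] - (int)pred[y * stride + x];
        }
    }

    /* Horizontal 4-point Hadamard butterflies, one row at a time. */
    for (int y = 0; y < 4; ++y) {
        int a = diff[y][0] + diff[y][1];
        int b = diff[y][0] - diff[y][1];
        int c = diff[y][2] + diff[y][3];
        int d = diff[y][2] - diff[y][3];
        tmp[y][0] = a + c;
        tmp[y][1] = b + d;
        tmp[y][2] = a - c;
        tmp[y][3] = b - d;
    }

    /* Vertical butterflies, then accumulate absolute coefficients. */
    for (int x = 0; x < 4; ++x) {
        int a = tmp[0][x] + tmp[1][x];
        int b = tmp[0][x] - tmp[1][x];
        int c = tmp[2][x] + tmp[3][x];
        int d = tmp[2][x] - tmp[3][x];
        sum += (unsigned)(abs(a + c) + abs(b + d) + abs(a - c) + abs(b - d));
    }

    return sum;
}
```

The row and column loops perform the same additions and subtractions on four independent lanes, so in a vectorized variant each loop body could in principle be mapped onto a few packed add/subtract instructions. Encoders often scale the result (for example by a right shift); that normalization is left out here.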
