Performance evaluation of Kvazaar HEVC intra encoder on Xeon Phi many-core processor

This paper analyzes the parallel scalability and coding speed of our open-source Kvazaar HEVC intra encoder on an Intel Xeon Phi 61-core coprocessor that supports up to four hardware threads per core. Two parallelization schemes of Kvazaar are evaluated: 1) Wavefront Parallel Processing (WPP) and 2) tiles, both accelerated with picture-level parallel processing. With WPP, the C implementation of the Kvazaar high-quality preset achieves an average speedup of 1.3 and a bit rate gain of 0.7% over the respective implementation of x265. Using tiles makes Kvazaar 1.4 times faster than x265, but at the cost of a 0.3% bit rate loss. With the high-speed presets, the speedup of Kvazaar over x265 increases to 1.4 with WPP and to 1.9 with tiles, and the respective coding efficiency gains of Kvazaar rise to 11.2% and 10.3%. Kvazaar also scales almost linearly with the number of cores in the processor. Although the peak coding speed of Kvazaar on the Xeon Phi is lower than that on an Intel 8-core i7 processor, our parallel scalability results promise excellent speed for Kvazaar on massively parallel processors equipped with more powerful cores.
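To illustrate why WPP alone may not occupy all 61 × 4 = 244 hardware threads, and why picture-level parallelism is layered on top, the following is a minimal sketch of the standard HEVC wavefront dependency: a coding tree unit (CTU) can start once its left neighbour and the top-right CTU of the row above are finished. The 1920×1080 resolution, the 64×64 CTU size, the unit-time cost per CTU, and the function names are assumptions made for illustration; they are not taken from the paper and do not describe Kvazaar's actual scheduler.

```python
from collections import Counter
from math import ceil


def wpp_start_steps(width, height, ctu=64):
    """Earliest start step of each CTU under the WPP dependency rule.

    Assumes every CTU takes exactly one time step (a simplification).
    """
    cols, rows = ceil(width / ctu), ceil(height / ctu)
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(start[r][c - 1])  # left neighbour in the same row
            if r > 0:
                # top-right CTU of the row above (clamped at the last column)
                deps.append(start[r - 1][min(c + 1, cols - 1)])
            start[r][c] = 1 + max(deps) if deps else 0
    return start


def wpp_summary(width, height, ctu=64):
    """Return (CTUs per picture, wavefront steps, peak concurrent CTUs)."""
    start = wpp_start_steps(width, height, ctu)
    per_step = Counter(step for row in start for step in row)
    total_ctus = sum(per_step.values())
    total_steps = max(per_step) + 1
    peak = max(per_step.values())
    return total_ctus, total_steps, peak


if __name__ == "__main__":
    hw_threads = 61 * 4  # cores x hardware threads per core
    ctus, steps, peak = wpp_summary(1920, 1080)
    print(f"hardware threads: {hw_threads}")
    print(f"CTUs: {ctus}, wavefront steps: {steps}, peak parallel CTUs: {peak}")
    print(f"ideal single-picture WPP speedup: {ctus / steps:.2f}")
```

Under these assumptions a single 1080p picture exposes at most 15 concurrently processable CTUs, far fewer than 244 hardware threads, which is consistent with combining WPP or tiles with picture-level parallel processing.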
