Towards efficient multi-level threading of H.264 encoder on Intel hyper-threading architectures

Summary form only given. Exploiting thread-level parallelism is a promising way to improve the performance of multimedia applications running on multithreaded general-purpose processors. We describe our work in developing a threaded H.264 encoder. We parallelize the encoder using the OpenMP programming model, which allows us to leverage the advanced compiler technologies in the Intel® C++ compiler for Intel hyper-threading architectures. After presenting our design considerations in the parallelization process, we describe two efficient methods for multilevel data partitioning that improve the performance of our multithreaded H.264 encoder. Furthermore, we explore different options in OpenMP programming. Although the implementation that uses the task queuing model is slightly slower than the other implementation, its code is easier to read. The results show speedups ranging from 3.74x to 4.53x over well-optimized sequential code on a system of 4 Intel Xeon™ processors with hyper-threading technology.
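As a rough illustration of data partitioning under OpenMP, the following minimal sketch splits one frame into contiguous slices and processes them in a parallel loop. It is not the encoder described above: the abstract does not specify the partitioning scheme, so the slice granularity, the function names `encode_slice` and `encode_frame`, and the placeholder "cost" computation are all assumptions made for illustration. The sketch uses the standard `parallel for` construct rather than the task queuing model mentioned in the abstract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-slice encoding work: it only sums the
 * sample values so that the result is easy to check. */
static uint32_t encode_slice(const uint8_t *samples, size_t count)
{
    uint32_t cost = 0;
    for (size_t i = 0; i < count; ++i) {
        cost += samples[i];
    }
    return cost;
}

/* Splits a frame of `rows` rows (each `row_bytes` wide) into `num_slices`
 * contiguous slices and processes them in parallel. Slices are assumed to
 * be independent, so each iteration writes only its own slice_cost[s]. */
void encode_frame(const uint8_t *frame, size_t rows, size_t row_bytes,
                  int num_slices, uint32_t *slice_cost)
{
    assert(num_slices > 0 && (size_t)num_slices <= rows);

    /* Dynamic scheduling is an assumption: it helps when slices take
     * unequal time. Without OpenMP enabled, the loop runs sequentially. */
    #pragma omp parallel for schedule(dynamic)
    for (int s = 0; s < num_slices; ++s) {
        size_t first = rows * (size_t)s / (size_t)num_slices;
        size_t last  = rows * (size_t)(s + 1) / (size_t)num_slices;
        slice_cost[s] = encode_slice(frame + first * row_bytes,
                                     (last - first) * row_bytes);
    }
}
```

Because every iteration touches a disjoint part of the frame and a distinct output element, no locking is needed in this sketch. A real encoder would also have to handle dependencies between frames, which this example leaves out.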
