Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture

This paper presents a study of the performance scalability of a macroblock-level parallelization of the H.264 decoder for High Definition (HD) applications on a multiprocessor architecture. We implemented this parallelization on a cache-coherent Non-Uniform Memory Access (cc-NUMA) shared-memory multiprocessor and compared the results with the theoretical expectations. The study evaluates three scheduling techniques: static, dynamic, and dynamic with tail-submit. Dynamic scheduling with the tail-submit optimization performs best, reaching a maximum speedup of 9.5 with 24 processors. A detailed profiling analysis showed that thread synchronization is one of the factors limiting scalability. The paper also evaluates the impact of blocking synchronization APIs such as POSIX threads and the POSIX real-time extensions. The results show that macroblock (MB)-level parallelism, as a very fine-grained form of Thread-Level Parallelism (TLP), is strongly affected by the thread synchronization overhead these APIs introduce. Other synchronization methods, possibly with hardware support, are required to make MB-level parallelization more scalable.
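
One way to make "theoretical expectations" concrete is the 2D-wave model commonly used for MB-level parallelism in H.264 decoding. The sketch below assumes that each MB depends on its left, top-left, top and top-right neighbours (from intra prediction and motion-vector prediction), that every MB takes one unit of time, and that synchronization is free. Under those assumptions, MB (x, y) can start at step x + 2y. The code computes that schedule, the ideal speedup (MBs per frame divided by the number of steps) and the peak number of MBs that can be decoded in parallel. The function names are illustrative, and these figures are model bounds, not measurements from this study.

```python
def wavefront_schedule(width, height):
    """Earliest step for each MB (x, y), assuming unit decode time and that an MB
    depends on its left, top-left, top and top-right neighbours (2D-wave model)."""
    step = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            preds = []
            if x > 0:
                preds.append(step[y][x - 1])          # left
            if y > 0:
                preds.append(step[y - 1][x])          # top
                if x > 0:
                    preds.append(step[y - 1][x - 1])  # top-left
                if x + 1 < width:
                    preds.append(step[y - 1][x + 1])  # top-right
            step[y][x] = max(preds) + 1 if preds else 0
    return step


def theoretical_speedup(width, height):
    """Return (ideal speedup, number of steps, peak MBs decodable in parallel)."""
    step = wavefront_schedule(width, height)
    n_steps = max(max(row) for row in step) + 1
    per_step = [0] * n_steps
    for row in step:
        for s in row:
            per_step[s] += 1
    return width * height / n_steps, n_steps, max(per_step)


# 1080p: 1920/16 = 120 MB columns, 1088/16 = 68 MB rows (coded height padded to 16)
speedup, steps, peak = theoretical_speedup(120, 68)
print(f"{steps} steps, ideal speedup {speedup:.1f}, peak parallelism {peak}")
# prints: 254 steps, ideal speedup 32.1, peak parallelism 60
```

The model ignores variable MB decode times and the thread synchronization overhead that the profiling identifies as a limiting factor, so it only gives an upper bound for comparison with measured speedups.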

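Dynamic scheduling with tail-submit can be sketched with a shared queue of ready MBs and a per-MB counter of unresolved dependencies. When a thread finishes an MB, it decrements the counters of the MBs that depend on it. With tail-submit, the thread keeps one newly ready MB for itself instead of pushing every ready MB through the queue. This is a minimal illustration that reuses the assumed 2D-wave dependencies. `decode_frame_dynamic`, `predecessors` and the `decode_mb` callback are hypothetical names. Keeping the right neighbour when it is ready is an assumed policy. Python's `threading.Lock` and `queue.Queue` stand in for blocking POSIX-threads synchronization. Because of the GIL, the sketch demonstrates the scheduling logic only, not speedup.

```python
import queue
import threading


def predecessors(mb, width):
    """Assumed 2D-wave dependencies: left, top-left, top and top-right MBs."""
    x, y = mb
    cands = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(px, py) for px, py in cands if 0 <= px < width and py >= 0]


def decode_frame_dynamic(width, height, num_threads, decode_mb, tail_submit=True):
    """Decode one frame with a shared ready queue and per-MB dependency counters.

    decode_mb(x, y) is a hypothetical stand-in for the real MB decoding work.
    Returns the macroblocks in the order they finished.
    """
    mbs = [(x, y) for y in range(height) for x in range(width)]
    remaining = {mb: len(predecessors(mb, width)) for mb in mbs}
    successors = {mb: [] for mb in mbs}
    for mb in mbs:
        for p in predecessors(mb, width):
            successors[p].append(mb)

    lock = threading.Lock()  # protects remaining and finished
    ready = queue.Queue()    # blocking queue of MBs whose dependencies are met
    finished = []
    ready.put((0, 0))        # the top-left MB has no dependencies

    def worker():
        while True:
            mb = ready.get()
            if mb is None:   # sentinel: the whole frame is decoded
                return
            while mb is not None:
                decode_mb(*mb)
                with lock:
                    finished.append(mb)
                    newly_ready = []
                    for s in successors[mb]:
                        remaining[s] -= 1
                        if remaining[s] == 0:
                            newly_ready.append(s)
                    frame_done = len(finished) == len(mbs)
                if frame_done:
                    for _ in range(num_threads):
                        ready.put(None)  # wake every worker so it can exit
                    mb = None
                elif tail_submit and newly_ready:
                    # Tail-submit: keep one ready MB (the right neighbour when
                    # it is ready) and only queue the others.
                    right = (mb[0] + 1, mb[1])
                    nxt = right if right in newly_ready else newly_ready[0]
                    for s in newly_ready:
                        if s != nxt:
                            ready.put(s)
                    mb = nxt
                else:
                    # Plain dynamic scheduling: every ready MB goes through the queue.
                    for s in newly_ready:
                        ready.put(s)
                    mb = None

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


order = decode_frame_dynamic(8, 5, num_threads=4, decode_mb=lambda x, y: None)
print(len(order), "macroblocks decoded")  # prints: 40 macroblocks decoded
```

Passing `tail_submit=False` gives plain dynamic scheduling. In both modes, every queue operation and lock acquisition is a blocking synchronization point paid per MB. This is the kind of fine-grained overhead that limits the scalability of MB-level parallelism.
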