Scalable row-based parallel H.264 decoder on embedded multicore processors

Multimedia applications run on most mobile hand-held devices, which still have limited battery capacity. The H.264 standard currently dominates video compression, but it has high computational requirements in terms of memory, energy, and time. Many techniques have emerged that tune parallel task granularity on multicore systems, ranging from groups of pictures down to the smallest block of pixels. This research proposes a scalable parallel technique for the motion compensation phase that is based on processing groups of macroblock rows. In addition, a lightweight dependency detection algorithm is added to the prediction phase; it enables parallel execution and minimizes synchronization stall time. The deblocking filter is also parallelized. The overall result is an efficient and highly scalable parallel H.264 decoder, evaluated on a real board built around an ARM Cortex-A9 MPCore with four processors. Experiments use various low- and high-definition video sequences. Results show a speedup of 3.3× for the motion compensation stage and an overall speedup of 2.3× on 4 cores, including communication and synchronization overhead. Energy consumption decreases by up to 63% for the whole application execution.
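
As a rough illustration of what processing groups of macroblock rows can look like, the sketch below splits the macroblock rows of one frame into contiguous groups, one per worker. It is a minimal sketch under stated assumptions, not the decoder's actual scheme: the abstract does not say how groups are sized or assigned, so the even contiguous split and the names `macroblock_rows` and `partition_rows` are hypothetical. The only fact taken from H.264 itself is that a macroblock covers 16×16 luma samples. Dependency detection and synchronization are not modeled.

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def macroblock_rows(frame_height):
    """Number of macroblock rows in a frame, rounding a partial row up."""
    return (frame_height + MB_SIZE - 1) // MB_SIZE


def partition_rows(total_rows, workers):
    """Split rows 0..total_rows-1 into contiguous half-open (start, end) ranges.

    Assumption: an even contiguous split, with any leftover rows given to the
    first workers. The real grouping policy is not stated in the abstract.
    """
    base, extra = divmod(total_rows, workers)
    ranges = []
    start = 0
    for w in range(workers):
        count = base + (1 if w < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    rows = macroblock_rows(1080)  # 1080 luma lines round up to 68 rows
    for worker, (first, end) in enumerate(partition_rows(rows, 4)):
        print(f"worker {worker}: macroblock rows {first}..{end - 1}")
```

With four workers, a 1080-line frame yields four groups of 17 rows each. A real decoder would still have to resolve motion-vector references that cross group boundaries, which is the role the abstract assigns to the dependency detection step.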
