Hybrid partitioned H.264 full high definition decoder on embedded quad-core

This paper addresses the efficient mapping of an H.264 decoder onto an embedded quad-core platform. To this end, a new partitioning method called "hybrid partitioning" is proposed. Partitioning is a key issue when mapping application software onto multi-core systems. For H.264 video decoders, functional partitioning and data partitioning have been proposed and are commonly used. Hybrid partitioning combines the two: each module is partitioned either functionally or by data, depending on the module's characteristics. Compared with purely functional or purely data partitioning, hybrid partitioning balances load between cores as well as data partitioning does, and is as efficient as functional partitioning in terms of memory requirements. It is also free from the macroblock-level dependency problem that data partitioning usually has in video decoding. Applying hybrid partitioning reduces waiting overhead by 86.0% compared with functional partitioning. Regarding memory usage, hybrid partitioning requires 51.2% less VLIW (Very Long Instruction Word) program memory and 62.0% less CGRA (Coarse-Grained Reconfigurable Array) program memory than data partitioning. As for SDRAM (Synchronous Dynamic Random-Access Memory) bandwidth, hybrid partitioning saves 38.6 MHz of SDRAM bandwidth compared with data partitioning, which is 11.6% of the total bandwidth budget of the 333 MHz SDRAM used in the experiments. A parallelized decoder with hybrid partitioning on an embedded quad-core system is 3.5 times faster than the same decoder on a single core.
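
The headline figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the abstract. The function names are illustrative, and "parallel efficiency" (speedup divided by core count) is a standard derived metric that the abstract itself does not report, so treat it as an assumption-based reading of the 3.5x result.

```python
# Figures quoted in the abstract.
SAVED_BANDWIDTH_MHZ = 38.6   # SDRAM bandwidth saved vs. data partitioning
TOTAL_BANDWIDTH_MHZ = 333.0  # total SDRAM bandwidth budget
SPEEDUP = 3.5                # quad-core vs. single-core decoding speed
NUM_CORES = 4                # cores on the embedded platform


def bandwidth_share_percent(saved_mhz: float, total_mhz: float) -> float:
    """Return the saved bandwidth as a percentage of the total budget."""
    return 100.0 * saved_mhz / total_mhz


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core (1.0 would be ideal linear scaling).

    This metric is not stated in the abstract; it is derived here.
    """
    return speedup / cores


share = bandwidth_share_percent(SAVED_BANDWIDTH_MHZ, TOTAL_BANDWIDTH_MHZ)
efficiency = parallel_efficiency(SPEEDUP, NUM_CORES)

print(f"Saved share of SDRAM bandwidth budget: {share:.1f}%")
print(f"Parallel efficiency on {NUM_CORES} cores: {efficiency:.1%}")
```

Under these definitions, the saved bandwidth works out to about 11.6% of the budget, matching the abstract, and the 3.5x speedup corresponds to a parallel efficiency of 87.5% on four cores.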
