Parallelization of Computing-Intensive Tasks of the H.264 High Profile Decoding Algorithm on a Reconfigurable Multimedia System

This paper proposes approaches for HW/SW (hardware/software) partitioning and parallelization of the computing-intensive tasks of the H.264 HiP (High Profile) decoding algorithm on an embedded coarse-grained reconfigurable multimedia system called REMUS (REconfigurable MUltimedia System). Several techniques, such as MB (macroblock) based parallelization and unfixed sub-block operation, are used to speed up decoding and meet the requirements of real-time, high-quality H.264 applications. Tests show that, compared with XPP PACT (a commercial reconfigurable processor), the execution performance of MC (Motion Compensation), deblocking, and IDCT-IQ (Inverse Discrete Cosine Transform–Inverse Quantization) on REMUS improves by 60%, 73%, and 88.5%, respectively, in the typical case and by 60%, 69%, and 88.5% in the worst case. Compared with ASIC solutions, MC performance improves by 70% in the typical case and 74% in the worst case, while deblocking performance remains the same. IDCT-IQ performance improves by 17% in both the typical and worst cases. With the proposed techniques, H.264 HiP@Level 4 decoding at 1080p@30 fps can be achieved on REMUS at a 200 MHz working frequency.

Key words: H.264, reconfigurable multimedia system, parallelization computation, hardware/software partition
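
To put the 1080p@30 fps target at 200 MHz in perspective, the following minimal sketch estimates the average clock-cycle budget per macroblock. It is an illustration, not a calculation taken from the paper: it assumes 16x16 macroblocks, a coded frame height rounded up to a whole number of macroblocks (1080 becomes 1088), and that every clock cycle is available for decoding. The function name `cycles_per_macroblock` is hypothetical.

```python
import math

def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock (idealized estimate)."""
    # Frames are coded in whole macroblocks, so round each dimension up.
    mbs_per_row = math.ceil(width / mb_size)
    mb_rows = math.ceil(height / mb_size)
    mbs_per_frame = mbs_per_row * mb_rows
    mbs_per_second = mbs_per_frame * fps
    # Assumption: no cycles are lost to stalls, configuration, or I/O.
    budget = clock_hz / mbs_per_second
    return mbs_per_frame, mbs_per_second, budget

mbs_per_frame, mbs_per_second, budget = cycles_per_macroblock(
    1920, 1080, 30, 200_000_000
)
print(mbs_per_frame, mbs_per_second, round(budget))  # 8160 244800 817
```

Under these assumptions, all decoding stages for one macroblock must fit in roughly 817 cycles on average, which indicates why MC, deblocking, and IDCT-IQ need to be parallelized rather than executed sequentially.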
