Parallel deblocking filter for H.264/AVC implemented on Tile64 platform

The deblocking filter accounts for a significant share of H.264/AVC decoding time, and some researchers have used multi-core platforms to accelerate it. We study the problem in the context of many-core systems. Parallelizing the deblocking filter on a many-core platform is challenging for two reasons: its complicated data dependencies expose too little parallelism to keep so many cores busy, and parallelization can incur significant synchronization overhead. We present a new method that exploits the implicit parallelism and reduces the synchronization overhead. We apply the method to the deblocking filter of the H.264/AVC reference software JM15.1 on the Tile64 platform. Using 62 cores, the proposed method achieves speedups of up to 817%, 604%, and 532% over the well-known wavefront method for CIF, SD, and HD videos, respectively.
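To illustrate why the wavefront baseline exposes limited parallelism, the following is a minimal sketch of wavefront grouping, not the method proposed here. It assumes that each macroblock depends only on its left and top neighbours, so that macroblocks on the same anti-diagonal can be filtered concurrently. The function names and the 22x18 macroblock grid (a 352x288 CIF frame divided into 16x16 macroblocks) are illustrative choices, not taken from the paper.

```python
def wavefront_waves(mb_cols, mb_rows):
    """Group macroblock coordinates (x, y) into waves.

    Assumption: macroblock (x, y) depends only on (x-1, y) and (x, y-1),
    so every macroblock with the same x + y can run in parallel.
    """
    waves = [[] for _ in range(mb_cols + mb_rows - 1)]
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves[x + y].append((x, y))
    return waves


def max_parallelism(mb_cols, mb_rows):
    """Largest number of macroblocks that any single wave can process at once."""
    return max(len(wave) for wave in wavefront_waves(mb_cols, mb_rows))


if __name__ == "__main__":
    cols, rows = 22, 18  # CIF: 352x288 pixels in 16x16 macroblocks
    waves = wavefront_waves(cols, rows)
    print("waves per frame:", len(waves))
    print("peak parallel macroblocks:", max_parallelism(cols, rows))
```

Under these assumptions, the peak wave of a CIF frame holds fewer macroblocks than the 62 cores used in the evaluation, and the first and last waves hold only one each, so many cores would sit idle while every wave boundary also requires synchronization.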
