A Five-Stage Pipeline, 204 Cycles/MB, Single-Port SRAM-Based Deblocking Filter for H.264/AVC

This paper describes the design and VLSI implementation of a highly efficient deblocking filter based on single-port SRAM. It achieves a throughput of 204 cycles per macroblock for real-time H.264/AVC decoding. Several deblocking filter designs in the literature are compared, and the feasibility of realizing them in a pipeline is studied. Building on this study, we propose a new design with a five-stage pipeline and a gated clock, which increases system throughput while reducing power. Data hazards and structural hazards, the two most critical issues for a pipelined filter, are analyzed and resolved. Efficient memory organization is employed for both the on-chip SRAM and the transposition buffers. By using an innovative hybrid edge filtering sequence and an out-of-order memory update scheme, we obtain zero stall cycles in normal pipeline flow, fully exploiting the pipelined architecture. Compared with existing designs, our design achieves at least an 18% reduction in clock cycles, as well as 20% lower power consumption, owing to its efficient pipeline and memory architecture. The total gate count is comparable to that of other designs in the literature, without using any expensive two-port or dual-port on-chip SRAM.
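To put the 204 cycles/macroblock figure in context, the following Python sketch converts a cycles-per-macroblock throughput into the minimum clock frequency needed for real-time deblocking. It assumes 16×16-pixel macroblocks, a constant frame rate, and that the filter sustains the stated throughput with no additional idle cycles between macroblocks. The function names are hypothetical, and the resolutions are illustrative examples, not operating points reported for this design.

```python
# Sketch: minimum clock frequency for real-time deblocking, given a
# cycles-per-macroblock throughput. Assumptions (not from the paper):
# 16x16 macroblocks, constant frame rate, no idle cycles between macroblocks.

CYCLES_PER_MB = 204  # throughput figure stated in the abstract


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks covering a frame (dimensions rounded up)."""
    mbs_wide = -(-width // 16)   # ceiling division
    mbs_high = -(-height // 16)  # e.g. 1080 lines -> 68 macroblock rows
    return mbs_wide * mbs_high


def min_clock_mhz(width: int, height: int, fps: float,
                  cycles_per_mb: int = CYCLES_PER_MB) -> float:
    """Minimum clock (MHz) for the filter to keep up with `fps` frames/s."""
    cycles_per_second = macroblocks_per_frame(width, height) * fps * cycles_per_mb
    return cycles_per_second / 1e6


if __name__ == "__main__":
    for name, w, h, fps in [("CIF", 352, 288, 30),
                            ("720p", 1280, 720, 30),
                            ("1080p", 1920, 1080, 30)]:
        print(f"{name}@{fps}fps: {macroblocks_per_frame(w, h)} MBs/frame, "
              f">= {min_clock_mhz(w, h, fps):.1f} MHz")
```

For 1080p at 30 fps (8160 macroblocks per frame, since 1080 lines round up to 68 macroblock rows), this works out to roughly 50 MHz. Because the required clock scales linearly with cycles per macroblock, any per-macroblock cycle reduction lowers the required frequency by the same proportion.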
