DRAM Bandwidth Optimized Design for High-Throughput Video Decoder Chips

Abstract

The last two decades have witnessed tremendous advances in video coding technologies. From MPEG-1/2/4 to H.264/AVC, AVS, etc., continuous innovation in this area has significantly stimulated the popularization of multimedia in modern life. To enhance coding performance, various new coding tools have been adopted in the latest standards. Meanwhile, the use of these new technologies, along with the ever-increasing demand for emerging ultra-high-specification applications such as QFHD (Quad Full High Definition) and SHV (Super Hi-Vision), greatly challenges the design of video decoder chips, placing extensive requirements on both computation power and memory (DRAM) bandwidth. While the advancement of VLSI technology makes it possible to meet the requirement for computation power, off-chip memory bandwidth is still critically limited by the number of I/O pins and their physical design issues. With current technologies, 4096x2160@60fps requires an average DRAM bandwidth of nearly 8 GB/s, which cannot be handled even by the fastest DDR2 chips with 64-bit parallelism. In addition, DRAM bandwidth efficiency is a dominant factor in determining the fabrication cost and power consumption of video decoder systems.

In this dissertation, various optimization techniques, classified into 3 categories, are proposed to reduce the DRAM bandwidth of video decoder systems. By combining all the proposed techniques, around 90% of the DRAM bandwidth can be saved for high-throughput video decoders, in comparison with the state-of-the-art video decoder implementation. The architecture design of 2 implemented video decoder chips is also presented.

This dissertation consists of the following 7 chapters.

Chapter 1 [Introduction] introduces the background knowledge of video coding standards and the DRAM bandwidth issue in video decoder design. Previous works related to this research and an overview of this dissertation are also presented in this chapter.

Chapter 2 [Data Reusing Techniques] presents techniques for reusing data that is redundantly transferred between the DRAM and the decoder core. 1) A Pipelined 2-D Cache Architecture is proposed as a basic reference frame sharing component. Under the standard MB scanning order, it is capable of reusing almost all the overlapped reference regions inside MBs and between horizontally neighboring MBs. As a result, 60% of the DRAM bandwidth for reference frame reads can be saved in comparison with the previous VBSMC scheme. 2) A Partial MB Reordering scheme is proposed to achieve an improved MB scanning order, so that most of the overlapped reference regions between vertically adjacent MBs can also be reused, which contributes …
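As a rough sanity check on the 4096x2160@60fps figures quoted in the abstract, the sketch below works through the underlying arithmetic. It is only an illustration under stated assumptions: the helper names are hypothetical, the 8-bit 4:2:0 sample format and the DDR2 speed grades (DDR2-800 and DDR2-1066 at their nominal transfer rates) are assumptions not given in the abstract, and the 8 GB/s average demand is simply taken from the abstract rather than derived.

```python
"""Back-of-the-envelope DRAM bandwidth arithmetic for 4096x2160@60fps decoding.

All helper names are hypothetical. The 8-bit 4:2:0 format and the DDR2 speed
grades are assumptions for illustration, not figures taken from the abstract.
"""

WIDTH, HEIGHT, FPS = 4096, 2160, 60      # resolution quoted in the abstract
REQUIRED_BYTES_PER_S = 8_000_000_000     # "near 8GB/s" average, from the abstract

# Assumed nominal transfer rates (transfers per second) for two DDR2 grades.
ASSUMED_DDR2_GRADES = {
    "DDR2-800": 800_000_000,
    "DDR2-1066": 1_066_000_000,
}


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Luma pixels that must be decoded per second."""
    return width * height * fps


def frame_write_bytes_per_s(width: int, height: int, fps: int) -> int:
    """Bytes/s to write every decoded frame once, assuming 8-bit 4:2:0
    (1 luma byte plus 0.5 chroma byte per pixel)."""
    return pixel_rate(width, height, fps) * 3 // 2


def ddr_peak_bytes_per_s(transfers_per_s: int, bus_width_bits: int) -> int:
    """Theoretical peak of a DDR interface: one bus-wide word per transfer."""
    return transfers_per_s * bus_width_bits // 8


print(f"pixel rate: {pixel_rate(WIDTH, HEIGHT, FPS):,} pixels/s")
print(f"frame write only: {frame_write_bytes_per_s(WIDTH, HEIGHT, FPS) / 1e9:.2f} GB/s")

for grade, rate in ASSUMED_DDR2_GRADES.items():
    peak = ddr_peak_bytes_per_s(rate, bus_width_bits=64)
    share = REQUIRED_BYTES_PER_S / peak
    print(f"{grade} x64 peak: {peak / 1e9:.2f} GB/s, "
          f"quoted demand is {share:.0%} of peak")
```

Under these assumptions, the quoted average demand exceeds the theoretical peak of a 64-bit DDR2-800 interface and would need roughly 94% of the DDR2-1066 peak, before row activation, refresh, and bus turnaround overheads reduce the achievable rate. Writing the decoded frames accounts for only about 0.8 GB/s, which suggests that most of the quoted demand comes from other traffic such as reference frame reads.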
