Memory-Hierarchical and Mode-Adaptive HEVC Intra Prediction Architecture for Quad Full HD Video Decoding

This paper presents a high-throughput and area-efficient VLSI architecture for intra prediction in the emerging High Efficiency Video Coding (HEVC) standard. Three design techniques are proposed to address the complexity systematically: 1) a hierarchical memory deployment that stores neighboring samples in 4.9 Kb of static RAM (SRAM) instead of 43.2-k gates of registers and increases throughput by processing reference samples in registers; 2) a mode-adaptive scheduling scheme for all prediction units, which guarantees a throughput of at least 2 samples/cycle despite using low-throughput SRAM and achieves 2.46 samples/cycle on average in the experimental results; and 3) resource sharing for the multipliers and the readout circuits of the reference sample registers, which saves 2.5-k gates. These techniques reduce area by 40% but increase power consumption because of additional signal transitions. Signal-gating circuits are therefore applied, reducing SRAM power by 69% and logic power by 32% at a cost of only 1.0-k gates. When synthesized at 200 MHz in a 40-nm process, the proposed architecture needs only 27.0-k gates and 4.9 Kb of single-port SRAM. The layout core area is 0.036 mm², and the power consumption is 2.11 mW in post-layout simulation. This performance is sufficient to support quad full high-definition (HD) (3840 × 2160) video decoding at 30 frames/s.
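The link between the guaranteed 2 samples/cycle and the quad full HD target can be checked with a short calculation. The sketch below is illustrative only: it assumes 4:2:0 chroma subsampling (1.5 samples per pixel) and the worst case in which every sample is intra predicted, neither of which is stated in the abstract, and the function name `required_samples_per_cycle` is hypothetical.

```python
def required_samples_per_cycle(width, height, fps, clock_hz, chroma_factor=1.5):
    """Samples that must be produced per clock cycle for real-time decoding.

    chroma_factor=1.5 assumes 4:2:0 video (an assumption, not from the abstract):
    one luma sample plus two quarter-resolution chroma samples per pixel.
    """
    samples_per_second = width * height * chroma_factor * fps
    return samples_per_second / clock_hz


# Figures taken from the abstract: 3840 x 2160 at 30 frames/s, 200 MHz clock.
need = required_samples_per_cycle(3840, 2160, 30, 200e6)

print(f"required throughput: {need:.3f} samples/cycle")
print(f"guaranteed minimum of 2 samples/cycle is sufficient: {2 >= need}")
print(f"headroom at the reported 2.46 average: {2.46 / need:.2f}x")
```

Under these assumptions the requirement falls just below 2 samples/cycle, which is consistent with the scheduling scheme's guaranteed minimum being the figure that determines the supported resolution and frame rate.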
