A Fast Algorithm-Based Cost-Effective and Hardware-Efficient Unified Architecture Design of 4 × 4, 8 × 8, 16 × 16, and 32 × 32 Inverse Core Transforms for HEVC

In this study, a cost-effective and highly hardware-efficient architecture based on a novel fast algorithm is developed for the 4 × 4, 8 × 8, 16 × 16, and 32 × 32 inverse core transforms of high-efficiency video coding (HEVC), with hardware shared among the four transform sizes. By exploiting the symmetry of the elements in the inverse core transform matrices, each core transform matrix is factorized into several submatrices. Based on the symmetry and similarity among these submatrices, the hardware of the (N/2) × (N/2) inverse core transform is shared with that of the N × N inverse core transform for N = 32, 16, and 8. Compared with separate designs for each transform size without hardware sharing, the proposed multiplierless architecture reduces the adder and shifter overhead by 32 % and 36 %, respectively. Its hardware efficiency is up to 166 % higher than that of several previous HEVC transform designs and up to 141 % higher than that of field-programmable gate array (FPGA)-based 16-point transform designs. Implemented in the 90-nm complementary metal-oxide-semiconductor (CMOS) technology of the Taiwan Semiconductor Manufacturing Company (TSMC), the proposed 1-D hardware-sharing architecture requires 115.7 K gates and operates at up to 200 MHz. It can decode 4K × 2K (4096 × 2048 pixels) and 8K UHDTV (7680 × 4320 pixels) video in real time at up to 127 and 32 frames per second, respectively.
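The sharing of the (N/2) × (N/2) inverse transform inside the N × N one can be illustrated with the even-odd ("partial butterfly") decomposition commonly used for the HEVC core transform. The Python sketch below is a generic illustration, not the authors' specific factorization. It builds the N-point HEVC integer matrices from the 32-point coefficient magnitudes, computes the even-indexed half of an N-point inverse by recursively calling the (N/2)-point inverse, handles the odd-indexed half with a separate (N/2) × (N/2) block, and replaces every constant multiplication with shifts and adds. The helper names (`coef`, `matrix`, `shift_add_mul`, `inverse_direct`, `inverse_even_odd`) are hypothetical, and the rounding, right-shift, and clipping steps that HEVC applies around each 1-D pass are omitted.

```python
"""Even-odd (partial butterfly) sketch of the HEVC inverse core transform.

Illustrative only: the (N/2)-point inverse is reused inside the N-point
inverse, and constant multiplications are done with shifts and adds.
Rounding, right shifts and clipping between 1-D passes are omitted.
"""
import random

# Magnitudes of the HEVC core-transform coefficients indexed by m, where the
# underlying DCT-II basis term is cos(m * pi / 64). C[0] = 64 is the DC value
# and C[32] = 0 stands for cos(pi / 2).
C = [64, 90, 90, 90, 89, 88, 87, 85, 83, 82, 80, 78, 75, 73, 70, 67,
     64, 61, 57, 54, 50, 46, 43, 38, 36, 31, 25, 22, 18, 13, 9, 4, 0]


def coef(m):
    """Signed coefficient for angle m*pi/64, using the symmetries of cosine."""
    m %= 128
    if m <= 32:
        return C[m]
    if m <= 64:
        return -C[64 - m]
    if m <= 96:
        return -C[m - 64]
    return C[128 - m]


def matrix(n):
    """N-point matrix T_N (row k = basis k); it is embedded in the 32-point one."""
    step = 32 // n
    return [[coef(k * step * (2 * i + 1)) for i in range(n)] for k in range(n)]


def shift_add_mul(x, c):
    """Multiply x by the constant c using only shifts and adds (binary digits of |c|)."""
    mag, acc, bit = abs(c), 0, 0
    while mag:
        if mag & 1:
            acc += x << bit
        mag >>= 1
        bit += 1
    return -acc if c < 0 else acc


def inverse_direct(coeffs):
    """Reference 1-D inverse: x[i] = sum_k T_N[k][i] * X[k]."""
    n = len(coeffs)
    t = matrix(n)
    return [sum(t[k][i] * coeffs[k] for k in range(n)) for i in range(n)]


def inverse_even_odd(coeffs):
    """1-D inverse for N in {4, 8, 16, 32} that reuses the (N/2)-point inverse."""
    n = len(coeffs)
    t = matrix(n)
    if n == 4:  # smallest size: evaluate the 4 x 4 product directly
        return [sum(shift_add_mul(coeffs[k], t[k][i]) for k in range(4))
                for i in range(4)]
    half = n // 2
    # Even-indexed inputs see exactly T_{N/2}: this is the shared sub-block.
    even = inverse_even_odd(coeffs[0::2])
    # Odd-indexed inputs use the (N/2) x (N/2) block of rows 1, 3, 5, ...
    odd = [sum(shift_add_mul(coeffs[2 * j + 1], t[2 * j + 1][i])
               for j in range(half)) for i in range(half)]
    # Butterfly, from the symmetry T_N[k][N-1-i] = (-1)^k * T_N[k][i].
    out = [0] * n
    for i in range(half):
        out[i] = even[i] + odd[i]
        out[n - 1 - i] = even[i] - odd[i]
    return out


if __name__ == "__main__":
    random.seed(0)
    for size in (4, 8, 16, 32):
        X = [random.randint(-32768, 32767) for _ in range(size)]
        assert inverse_even_odd(X) == inverse_direct(X)
    print("even-odd inverse matches the direct matrix product for N = 4..32")
```

Because the 32-point routine calls the 16-point routine, which calls the 8-point routine, which calls the 4-point routine, a hardware version can reuse one datapath across all four sizes. The plain binary shift-add expansion used here is only the simplest choice; hardware designs usually pick cheaper decompositions (for example, signed-digit forms or shared subexpressions).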
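The quoted frame rates can be sanity-checked with simple throughput arithmetic. The abstract does not state the per-cycle throughput or the chroma format, so the sketch below makes two hypothetical assumptions that happen to reproduce both figures: the transform produces 8 residual samples per clock cycle, and the video is 4:2:0 sampled (a full-resolution luma plane plus two quarter-size chroma planes, i.e. 1.5 samples per pixel). The function name `frames_per_second` is likewise illustrative.

```python
def frames_per_second(clock_hz, samples_per_cycle, width, height):
    """Whole frames per second a transform of the given throughput can sustain.

    Hypothetical model: 4:2:0 video, so one frame holds width*height luma
    samples plus two chroma planes of a quarter that size (1.5x in total).
    """
    samples_per_frame = width * height * 3 // 2
    return clock_hz * samples_per_cycle // samples_per_frame


if __name__ == "__main__":
    clock = 200_000_000  # 200 MHz, from the abstract
    print(frames_per_second(clock, 8, 4096, 2048))  # 4K x 2K -> 127
    print(frames_per_second(clock, 8, 7680, 4320))  # 8K UHDTV -> 32
```

Under the same 4:2:0 assumption, 4 samples per cycle would yield only 63 and 16 frames per second, which suggests that the reported figures correspond to a throughput of about eight samples per cycle.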
