High Throughput Parallel-Pipeline 2-D DCT/IDCT Processor Chip

This paper presents a 2-D DCT/IDCT processor chip for high-data-rate image processing and video coding. It uses a fully pipelined row–column decomposition method based on two 1-D DCT processors and a transpose buffer built from D-type flip-flops with a double serial input/output data flow. The proposed architecture allows the main processing elements and arithmetic units to operate in parallel at half the frequency of the input data rate. The main characteristics are high throughput, parallel processing, reduced internal storage, and maximum efficiency of the computational elements. The processor has been implemented using a standard-cell design methodology in 0.35 μm CMOS technology. It measures 6.25 mm² (the core is 3 mm²) and contains a total of 11.7k gates. The maximum frequency is 300 MHz, with a latency of 172 cycles for the 2-D DCT and 178 cycles for the 2-D IDCT. The computing time of a block is close to 580 ns. The processor has been designed to meet the accuracy requirements of IEEE Std 1180-1990, which is used in various video codecs. Its computing speed and low hardware cost indicate that the processor is suitable for HDTV applications.
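The row–column scheme described above can be sketched in floating-point Python. This is a minimal model under stated assumptions, not the chip's fixed-point pipeline: the orthonormal DCT-II scaling and the names `dct_1d`, `idct_1d`, `dct_2d`, `idct_2d` and `transpose` are hypothetical choices for illustration, and `transpose` only stands in for the flip-flop transpose buffer. It does not model the double serial data flow or the half-rate parallel operation.

```python
import math

# Illustrative model only (hypothetical names); not the chip's fixed-point datapath.
N = 8  # block size of the 8x8 transform


def _norm(k: int) -> float:
    """Orthonormal DCT-II scale factor: sqrt(1/N) for k = 0, sqrt(2/N) otherwise."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# 1-D basis matrix C[u][n] = norm(u) * cos((2n + 1) * u * pi / (2N)).
C = [[_norm(u) * math.cos((2 * n + 1) * u * math.pi / (2 * N)) for n in range(N)]
     for u in range(N)]


def dct_1d(v):
    """Forward 1-D DCT of a length-N vector: returns C @ v."""
    return [sum(C[u][n] * v[n] for n in range(N)) for u in range(N)]


def idct_1d(v):
    """Inverse 1-D DCT: C is orthonormal, so its inverse is its transpose."""
    return [sum(C[u][n] * v[u] for u in range(N)) for n in range(N)]


def transpose(block):
    """Software stand-in for the transpose buffer between the two 1-D stages."""
    return [list(col) for col in zip(*block)]


def _row_column(block, transform_1d):
    stage1 = [transform_1d(row) for row in block]     # first 1-D stage: rows
    buffered = transpose(stage1)                      # transpose buffer
    stage2 = [transform_1d(row) for row in buffered]  # second 1-D stage: columns
    return transpose(stage2)                          # restore (row, column) order


def dct_2d(block):
    """2-D DCT by row-column decomposition: returns C X C^T."""
    return _row_column(block, dct_1d)


def idct_2d(coeffs):
    """2-D IDCT by row-column decomposition: returns C^T Y C."""
    return _row_column(coeffs, idct_1d)


if __name__ == "__main__":
    block = [[(7 * i + 3 * j) % 256 - 128 for j in range(N)] for i in range(N)]
    coeffs = dct_2d(block)
    restored = idct_2d(coeffs)
    err = max(abs(restored[i][j] - block[i][j]) for i in range(N) for j in range(N))
    print(f"DC coefficient: {coeffs[0][0]:.3f}, max round-trip error: {err:.2e}")
```

Because the basis matrix is orthonormal, one coefficient table serves both directions, and the DC coefficient of the forward transform equals the block sum divided by 8.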
