Parallel-pipeline 2-D DCT/IDCT processor chip

This paper describes the architecture of an 8x8 2-D DCT/IDCT processor that combines high throughput with a cost-effective design. The 2-D DCT/IDCT is computed using the separability property, so the architecture consists of two 1-D processors and a transpose buffer (TB) serving as intermediate memory. The transpose buffer has a regular structure based on D-type flip-flops, with a double serial input/output data flow that is well suited to pipelined architectures. The processor uses a parallel, pipelined architecture to attain high throughput, reduced hardware, and maximum efficiency in all arithmetic elements. This architecture allows the processing elements and arithmetic units to work in parallel at half the data input rate, except for the normalization of the transform, which is performed by a multiplier operating at the maximum frequency. Moreover, a precision analysis has verified that the proposed processor meets the requirements of IEEE Std. 1180-1990, which is used in the ITU-T H.261 and ITU-T H.263 video codecs. The processor was designed using a standard-cell methodology and manufactured in a 0.35-μm CMOS CSD 3M/2P 3.3 V process. It has an area of 6.25 mm² (the core is 3 mm²) and contains a total of 11.7k gates, of which 5.8k gates are flip-flops. A data input rate of 300 MHz has been established, with a latency of 172 cycles for the 2-D DCT and 178 cycles for the 2-D IDCT. The computing time of a block is close to 580 ns. Its performance in computing speed and hardware complexity indicates that the proposed design is suitable for HDTV applications.
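The separability property mentioned in the abstract can be illustrated with a minimal software sketch: a 1-D transform over the rows, a transpose, and a second 1-D transform. This is a floating-point reference model only, not the fixed-point hardware datapath of the paper. The function names (`dct_1d`, `idct_1d`, `transpose`, `dct_2d`, `idct_2d`) are hypothetical, and the orthonormal scaling is folded into each 1-D pass as an assumption, whereas the processor described above performs normalization in a separate multiplier.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence (assumed scaling, see lead-in)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal 1-D DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * coeffs[k] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
        out.append(acc)
    return out

def transpose(block):
    """Software stand-in for the transpose buffer: swap rows and columns."""
    return [list(col) for col in zip(*block)]

def dct_2d(block):
    """2-D DCT as two 1-D passes separated by a transpose."""
    first_pass = [dct_1d(row) for row in block]                    # transform rows
    second_pass = [dct_1d(row) for row in transpose(first_pass)]   # transform columns
    return transpose(second_pass)                                  # restore orientation

def idct_2d(coeffs):
    """2-D IDCT using the same row/transpose/row structure."""
    first_pass = [idct_1d(row) for row in coeffs]
    second_pass = [idct_1d(row) for row in transpose(first_pass)]
    return transpose(second_pass)
```

Under this scaling, an 8x8 block filled with the constant 10 should yield a DC coefficient of 80 and all other coefficients equal to zero up to floating-point rounding, and `idct_2d(dct_2d(block))` should reproduce the input block to within rounding error.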
