A new time distributed DCT architecture for MPEG-4 hardware reference model

This paper presents the design of a new time distributed architecture (TDA) and outlines the architecture (ISO/IEC JTC1/SC29/WG11 MPEG2002/M8565) that was submitted to the MPEG-4 Part 9 committee and included in the ISO/IEC JTC1/SC29/WG11 MPEG2002/9115N document. The proposed TDA optimizes the performance of the two-dimensional discrete cosine transform (2-D DCT) architecture. It uses a time distribution mechanism to exploit the computational redundancy within the inner product computation module. The application-specific word-length requirements for inputs, outputs, and coefficients are met by scheduling the input data. The coefficient matrix is mapped linearly so that the necessary computations are assigned to processor elements in both the space and time domains. The performance analysis shows savings in excess of 96% compared with the direct implementation and more than 71% compared with other optimized application-specific architectures for the DCT.
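
For orientation, the inner product formulation that the abstract refers to can be sketched in software as a row-column 2-D DCT. This is only a minimal illustration of a direct computation, not the proposed TDA: the 8x8 block size, the orthonormal DCT-II scaling, and all function names (`dct_matrix`, `inner_product`, `dct_1d`, `dct_2d`) are assumptions made here and do not come from the paper.

```python
import math

N = 8  # assumed block size; the abstract does not state one


def dct_matrix(n=N):
    """Orthonormal DCT-II coefficient matrix C, where
    C[k][i] = a(k) * cos((2i + 1) * k * pi / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def inner_product(u, v):
    """One inner product: the basic operation repeated throughout the DCT."""
    return sum(a * b for a, b in zip(u, v))


def dct_1d(vec, coeffs):
    """1-D DCT as one inner product per coefficient row."""
    return [inner_product(row, vec) for row in coeffs]


def dct_2d(block):
    """2-D DCT by row-column decomposition: Y = C * X * C^T."""
    coeffs = dct_matrix(len(block))
    # Pass 1: transform every row of the input block.
    row_pass = [dct_1d(row, coeffs) for row in block]
    # Pass 2: transform every column of the intermediate result.
    col_pass = [dct_1d(list(col), coeffs) for col in zip(*row_pass)]
    # col_pass[j] holds transformed column j, so transpose back.
    return [list(r) for r in zip(*col_pass)]


if __name__ == "__main__":
    flat = [[1.0] * N for _ in range(N)]
    out = dct_2d(flat)
    # A constant block has all of its energy in the DC coefficient.
    print(round(out[0][0], 6))
```

In this sketch each 1-D pass on an N x N block evaluates N * N inner products of length N, and every one of them reuses the same N coefficient rows. That repetition is the kind of structure a hardware design can try to exploit, although how the TDA does so is not described in the abstract.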
