Hardware Efficient Fast DCT Based on Novel Cyclic Convolution Structures

Cyclic convolution is a widely used operation in signal processing. In very large-scale integration (VLSI) design, it is usually implemented with systolic arrays and distributed arithmetic; however, these implementations may not be fast enough, or may incur too much hardware cost, when the convolution length is large. This paper presents a new fast cyclic convolution algorithm that is hardware efficient and suitable for high-speed VLSI implementation, especially when the convolution length is large. For example, when the proposed fast cyclic convolution algorithm is applied to the implementation of the prime-length discrete cosine transform (DCT), the proposed high-throughput 1297-length DCT design saves 1216 (94%) multiplications, 282 (22%) additions, and 4792 (74%) delay elements compared with recently proposed systolic-array-based algorithms. Furthermore, the proposed algorithm can run at a speed that is 1.5 times that of previous designs and requires less I/O cost as long as the wordlength L is less than 20 bits.
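
For reference, the operation being accelerated can be sketched as a direct evaluation of the textbook definition y[n] = sum over k of h[k] x[(n - k) mod N]. This is only an illustrative baseline, assuming equal-length real-valued sequences; it is not the fast algorithm proposed in the paper, and the function name `cyclic_convolution` is a hypothetical choice made for this sketch.

```python
def cyclic_convolution(x, h):
    """Direct O(N^2) cyclic convolution of two length-N sequences.

    Textbook definition only (not the paper's fast structure):
        y[n] = sum_{k=0}^{N-1} h[k] * x[(n - k) mod N]
    """
    if len(x) != len(h):
        raise ValueError("inputs must have the same length")
    n_len = len(x)
    # Each output sample needs N multiplications, so the direct form
    # costs N * N multiplications in total.
    return [
        sum(h[k] * x[(n - k) % n_len] for k in range(n_len))
        for n in range(n_len)
    ]
```

The direct form needs N multiplications per output sample, which is the kind of cost that grows quickly with the convolution length and that fast cyclic convolution structures aim to reduce.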
