Novel Structures for Cyclic Convolution Using Improved First-Order Moment Algorithm

This paper presents a decomposition scheme that reduces computation time and makes first-order moment-based cyclic convolution well suited to hardware implementation. The fixed convolution kernel is decomposed into similar subparts, and their preprocessing results serve as control signals, so each subpart of the cyclic convolution can be calculated with a basic computing substructure. Because the decomposition is flexible, it allows a trade-off between computation time and hardware complexity. For a fixed pair of decomposition coefficients, the similarity among subparts leads to a time-efficient structure and an area-efficient structure for cyclic convolution, with no limitation on the convolution length N or the word length L. Since the basic computing substructure contains only a simple control module, several circular right-shift registers, and N accumulation units, it requires neither multipliers nor large memory. The proposed structures are compared with existing memory-based structures in terms of area-delay product, area-time product, and power consumption to demonstrate their efficiency and effectiveness. Under the same metrics, the comparison also shows significant improvement over the previous first-order moment-based structure.
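To illustrate the general idea behind first-order moment-based cyclic convolution, the following minimal Python sketch may help. It is not the decomposed hardware structure proposed in the paper. It assumes the kernel values are unsigned integers below 2^L, and the function names (`cyclic_conv_direct`, `cyclic_conv_moment`) are hypothetical. For each output, inputs are accumulated into bins indexed by the kernel value they would be multiplied by; the output is then the first-order moment of those bins, which can be evaluated with additions only.

```python
def cyclic_conv_direct(h, x):
    """Reference cyclic convolution using multiplications."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(n)) for i in range(n)]


def cyclic_conv_moment(h, x, word_length):
    """Multiplier-free cyclic convolution via first-order moments.

    Assumption: every h[k] is an unsigned integer in [0, 2**word_length).
    """
    n = len(x)
    levels = 1 << word_length
    out = []
    for i in range(n):
        # Accumulate each input into the bin of its kernel value.
        bins = [0] * levels
        for k in range(n):
            bins[h[k]] += x[(i - k) % n]
        # First-order moment sum_m m * bins[m], computed with two
        # nested running sums (additions only, no multiplications).
        running = 0
        moment = 0
        for m in range(levels - 1, 0, -1):
            running += bins[m]
            moment += running
        out.append(moment)
    return out


if __name__ == "__main__":
    h = [3, 0, 1, 2]   # fixed kernel, 2-bit unsigned values
    x = [1, 2, 3, 4]   # input sequence
    print(cyclic_conv_moment(h, x, word_length=2))
    print(cyclic_conv_direct(h, x))
```

In this software sketch the inner running-sum loop takes 2^L - 1 steps per output, so its cost grows exponentially with the word length L. This may be one reason a decomposition of the kernel into smaller subparts is attractive, although the sketch does not model the paper's specific scheme.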
