Optimal VLSI complexity design for high speed pipeline FFT using RNS

This study proposes a logic design, based on RNS units, that performs the N-point FFT on a continuous data stream, and evaluates its performance in terms of asymptotic VLSI complexity. The structure is based on quadratic residue number systems (QRNS), which simplify the processing of complex numbers. It is known from the literature that a lower bound on the complexity of a single instance of the FFT problem is A(N)T²(N) = Ω(N² log² N), and that optimal constructive designs have been proposed with T(N) = Θ(log² N). The architecture proposed here relies on a very high degree of processing parallelism and on a communication parallelism tailored to the response time of the adders and multipliers used in the design. Furthermore, by pipelining data, it performs as an optimal design for a single FFT instance and features an upper bound on the mean service time of Tm(N) = Θ(log log log N) for each FFT instance. The approach has also been applied to the design of a structure performing a 1024-point FFT with 23-bit data.
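To illustrate why QRNS simplifies complex arithmetic, the following minimal Python sketch shows the standard QRNS mapping for a single prime modulus p ≡ 1 (mod 4). For such a modulus there is an integer j with j² ≡ -1 (mod p), so a complex residue a + ib can be represented as the pair (a + jb, a - jb) mod p, and a complex multiplication reduces to two independent modular multiplications. The function names and the modulus 13 are assumptions chosen for illustration only; they are not taken from the paper, whose design uses hardware units and a full set of moduli.

```python
def find_sqrt_minus_one(p):
    """Return j with j*j ≡ -1 (mod p). Assumes p is a prime with p % 4 == 1."""
    if p % 4 != 1:
        raise ValueError("QRNS needs a prime modulus p with p % 4 == 1")
    for j in range(2, p):
        if (j * j) % p == p - 1:
            return j
    raise ValueError("no square root of -1 found; is p prime?")


def to_qrns(a, b, p, j):
    """Map the complex residue a + ib to its QRNS pair (z, z*)."""
    return ((a + j * b) % p, (a - j * b) % p)


def from_qrns(z, z_star, p, j):
    """Recover (a, b) from a QRNS pair using modular inverses of 2 and 2j."""
    inv_2 = pow(2, -1, p)        # modular inverse, Python 3.8+
    inv_2j = pow(2 * j, -1, p)
    a = ((z + z_star) * inv_2) % p
    b = ((z - z_star) * inv_2j) % p
    return a, b


def qrns_mul(x, y, p):
    """Complex multiplication in QRNS: two independent modular products."""
    return ((x[0] * y[0]) % p, (x[1] * y[1]) % p)


if __name__ == "__main__":
    p = 13                        # illustrative modulus, not from the paper
    j = find_sqrt_minus_one(p)    # j = 5, since 5*5 = 25 ≡ -1 (mod 13)
    x = to_qrns(3, 2, p, j)       # 3 + 2i
    y = to_qrns(1, 4, p, j)       # 1 + 4i
    # (3 + 2i)(1 + 4i) = -5 + 14i, which is 8 + 1i modulo 13
    print(from_qrns(*qrns_mul(x, y, p), p, j))  # prints (8, 1)
```

A conventional complex multiplication needs four real multiplications and two additions, while the QRNS form needs only the two component-wise products shown in `qrns_mul`. In a full RNS this would be repeated independently for each modulus.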
