High-speed FFT processors based on redundant number systems

Fast Fourier Transform (FFT) processors have a significant impact on the performance of communication systems and have been an active research topic for many years. The FFT consists of consecutive multiply-add operations over complex numbers, performed by so-called butterfly units. Using a redundant number system is one way to increase the speed of FFT coprocessors: it eliminates carry propagation and hence reduces the latency of each stage of a pipelined FFT architecture. This paper proposes a high-speed FFT processor built around a fused-dot-product-add (FDPA) unit that computes AB ± CD ± E in Binary-Signed-Digit (BSD) representation. The proposed FDPA unit consists of a three-operand BSD adder and a BSD constant multiplier. A carry-limited BSD adder is proposed and used in both the three-operand adder and the parallel BSD multiplier to improve the speed of the FDPA unit. Moreover, modified Booth encoding is used to accelerate the BSD multiplier. Synthesis results show that the proposed design is about twice as fast as the best previous work, at the cost of higher area and power consumption.
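
To illustrate why a signed-digit representation limits carry propagation, the following is a minimal Python sketch of the textbook two-step carry-limited addition over the digit set {-1, 0, 1}. It is not the adder proposed in this paper. The function names `to_bsd`, `from_bsd`, and `bsd_add`, as well as the least-significant-digit-first list encoding, are assumptions made purely for illustration.

```python
def to_bsd(n, width):
    """Encode integer n as `width` BSD digits, least significant first.

    Illustrative encoding only: each digit is the matching bit of |n|,
    carrying the sign of n, so every digit lies in {-1, 0, 1}.
    """
    mag = abs(n)
    if mag >> width:
        raise ValueError("value does not fit in the given width")
    sign = -1 if n < 0 else 1
    return [sign * ((mag >> i) & 1) for i in range(width)]


def from_bsd(digits):
    """Decode a list of BSD digits (least significant first) to an integer."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def bsd_add(x, y):
    """Add two equal-length BSD digit lists without a carry chain.

    Each result digit depends only on positions i, i-1 and i-2 of the inputs,
    so the delay is independent of the word length.
    Returns len(x) + 1 digits, all in {-1, 0, 1}.
    """
    if len(x) != len(y):
        raise ValueError("operands must have the same number of digits")
    n = len(x)
    p = [a + b for a, b in zip(x, y)]   # position sums, each in [-2, 2]
    w = [0] * n                         # interim sum digits
    t = [0] * (n + 1)                   # transfer digits; t[0] is always 0
    for i in range(n):
        # If the lower position sum is non-negative, its transfer is 0 or 1;
        # otherwise it is -1 or 0. Choose w[i] so that w[i] + t[i] stays in range.
        lower_nonneg = i == 0 or p[i - 1] >= 0
        if p[i] == 2:
            t[i + 1], w[i] = 1, 0
        elif p[i] == 1:
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p[i] == -1:
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
        elif p[i] == -2:
            t[i + 1], w[i] = -1, 0
        # p[i] == 0 leaves both the transfer and the interim digit at 0
    # The final step adds each interim digit to the incoming transfer;
    # by construction this never produces a new carry.
    return [w[i] + t[i] for i in range(n)] + [t[n]]
```

For example, `from_bsd(bsd_add(to_bsd(3, 4), to_bsd(1, 4)))` evaluates to 4. The loop is written sequentially for readability, but each iteration reads only the precomputed position sums, so in hardware all digit positions can be evaluated in parallel.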
