An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply Operator

Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications. In this work, we focus on optimizing the design of the fused Add-Multiply (FAM) operator to increase performance. We investigate techniques for directly recoding the sum of two numbers into its Modified Booth (MB) form. We introduce a structured and efficient recoding technique and explore three different schemes by incorporating them in FAM designs. Compared with FAM designs that use existing recoding schemes, designs based on the proposed technique achieve considerable reductions in the critical delay, hardware complexity and power consumption of the FAM unit.
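
For orientation, the following Python sketch models the conventional baseline that direct recoding is meant to improve on: the sum A + B is formed first, its two's complement bits are then recoded into radix-4 MB digits in {-2, -1, 0, 1, 2}, and each digit selects a partial product of X. It is a behavioral illustration only, not the proposed recoder; the function names (`mb_recode`, `fused_add_multiply`) and the choice of sign-extending the sum to n + 2 bits are assumptions made for this example.

```python
def mb_recode(value, n_bits):
    """Return the radix-4 Modified Booth digits (least significant first)
    of a two's complement value that fits in n_bits (n_bits must be even)."""
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even")
    if not -(1 << (n_bits - 1)) <= value < (1 << (n_bits - 1)):
        raise ValueError("value does not fit in n_bits")
    unsigned = value & ((1 << n_bits) - 1)          # two's complement bit pattern
    bits = [(unsigned >> i) & 1 for i in range(n_bits)]
    digits = []
    prev = 0                                        # y_{-1} = 0
    for j in range(n_bits // 2):
        lo, hi = bits[2 * j], bits[2 * j + 1]
        digits.append(-2 * hi + lo + prev)          # digit in {-2, ..., 2}
        prev = hi                                   # y_{2j+1} feeds the next digit
    return digits


def fused_add_multiply(x, a, b, n_bits):
    """Compute x * (a + b) from the MB digits of the sum.
    Baseline model: the sum is computed first, then recoded.
    The sum of two n_bits operands is sign-extended to n_bits + 2 bits
    (assumption: n_bits is even, so the extended width stays even)."""
    digits = mb_recode(a + b, n_bits + 2)
    # Each MB digit selects 0, +/-x or +/-2x, weighted by 4**j.
    return sum(d * x * 4 ** j for j, d in enumerate(digits))


if __name__ == "__main__":
    print(mb_recode(6, 4))
    print(fused_add_multiply(5, 3, -7, 4))
```

In hardware, the addition step in this baseline places a carry-propagate adder ahead of the recoder; a direct recoder would instead derive the same digits from the bits of A and B, which this sketch does not attempt to model.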
