Array Multipliers for High Throughput in Xilinx FPGAs with 6-Input LUTs

Multiplication is the dominant operation for many applications implemented on field-programmable gate arrays (FPGAs). Although most current FPGA families have embedded hard multipliers, soft multipliers built from lookup tables (LUTs) in the logic fabric remain important. This paper presents a novel two-operand addition circuit (patent pending) that combines radix-4 partial-product generation with addition, and shows how it can be used to implement two’s-complement array multipliers. The circuit is specific to modern Xilinx FPGAs that are based on a 6-input LUT architecture. The proposed pipelined multipliers use 42%–52% fewer LUTs than delay-optimized LogiCORE IP multipliers, and some versions can be clocked up to 23% faster. The LUT savings allow 1.72 to 2.10 times as many multipliers to be implemented in the same logic fabric; combined with the higher clock frequency, this potentially offers 1.86 to 2.58 times the throughput.
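The arithmetic behind a radix-4 array multiplier can be illustrated with a short behavioral model. This is a minimal sketch, not the paper's LUT-level circuit: it assumes the radix-4 digits come from standard modified-Booth recoding (the abstract does not name the recoding), and the function names `booth_radix4_digits` and `array_multiply` are hypothetical. Each loop iteration stands in for one array row, which generates a partial product and adds it to the running sum in a single step.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement multiplier y (n even) into n/2
    radix-4 digits in {-2, -1, 0, 1, 2}, least significant digit first.
    Assumption: standard modified-Booth recoding."""
    assert n % 2 == 0, "operand width must be even for radix-4 recoding"
    y &= (1 << n) - 1          # view y as an n-bit pattern
    digits = []
    prev = 0                   # implicit bit y[-1] = 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def array_multiply(x, y, n):
    """Behavioral model of an array multiplier: one row per radix-4 digit.
    Each row forms the partial product d * x * 4**k and adds it to the
    accumulated sum from the rows above it."""
    acc = 0
    for k, d in enumerate(booth_radix4_digits(y, n)):
        acc += (d * x) << (2 * k)   # generate partial product and add
    return acc


if __name__ == "__main__":
    # Exhaustive check for 4-bit signed operands.
    for x in range(-8, 8):
        for y in range(-8, 8):
            assert array_multiply(x, y, 4) == x * y
    print("all 4-bit products match")
```

Because radix-4 recoding halves the number of partial products, an n-bit multiplier needs only n/2 such rows, which is one reason merging generation and addition into a single row of logic can reduce LUT count.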
