Array Multipliers for High Throughput in Xilinx FPGAs with 6-Input LUTs

Multiplication is the dominant operation for many applications implemented on field-programmable gate arrays (FPGAs). Although most current FPGA families have embedded hard multipliers, soft multipliers using lookup tables (LUTs) in the logic fabric remain important. This paper presents a novel two-operand addition circuit (patent pending) that combines radix-4 partial-product generation with addition and shows how it can be used to implement two’s-complement array multipliers. The circuit is specific to modern Xilinx FPGAs that are based on a 6-input LUT architecture. The proposed pipelined multipliers use 42%–52% fewer LUTs than delay-optimized LogiCORE IP multipliers, and some versions can be clocked up to 23% faster. This allows 1.72 to 2.10 times as many multipliers to be implemented in the same logic fabric and, with the higher clock frequency, potentially offers 1.86 to 2.58 times the throughput.

E. George Walters
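
The abstract refers to radix-4 partial-product generation for two's-complement multiplication. As a rough illustration of that general idea only, the sketch below uses textbook radix-4 (modified Booth) recoding in software. It is an assumption that this is a useful mental model; it does not reproduce the paper's LUT-level circuit, and the function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier y into radix-4 digits.

    Each digit lies in {-2, -1, 0, 1, 2}, so an n-bit multiplier yields
    about n/2 partial products instead of n.
    """
    if n % 2:
        n += 1  # odd widths are sign-extended by one bit

    def bit(i: int) -> int:
        # Python's >> is arithmetic on negative ints, so bits above the
        # sign position repeat the sign bit (implicit sign extension).
        return 0 if i < 0 else (y >> i) & 1

    # Standard recoding: d_k = y[2k-1] + y[2k] - 2*y[2k+1]
    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, n, 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by an n-bit two's-complement y by summing radix-4 partial products."""
    product = 0
    for k, digit in enumerate(booth_radix4_digits(y, n)):
        # Each partial product is one of {-2x, -x, 0, x, 2x}, weighted by 4**k.
        product += (digit * x) << (2 * k)
    return product


print(booth_radix4_digits(-7, 4))  # [1, -2], since 1 + (-2) * 4 == -7
print(booth_multiply(13, -7, 4))   # -91
```

In an array multiplier, each row would add one such partial product to a running sum. The paper's contribution, per the abstract, is a circuit that merges the generation and addition steps; this software loop keeps them conceptually separate.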
