Efficient Scheme for Implementing Large Size Signed Multipliers Using Multigranular Embedded DSP Blocks in FPGAs

Modern FPGAs contain embedded DSP blocks that can be configured as multipliers of more than one size. Designing with these multigranular embedded blocks becomes more challenging when both high speed and low area utilization are required. This paper proposes an efficient design methodology for implementing large-size signed multipliers using small multigranular embedded blocks. The proposed approach has been implemented and tested on Altera's Stratix II FPGAs using the Quartus II software tool. Multipliers were implemented for operand sizes ranging from 40 to 256 bits. Experimental results demonstrate that our design approach outperforms the standard scheme used by the Quartus II tool in both speed and area. On average, the delay reduction is about 20.7% and the area saving, in terms of ALUTs, is about 67.6%.
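The abstract does not spell out the partitioning scheme, so the following is only a generic software sketch of the underlying idea: a wide two's-complement product can be assembled from products of narrow operand slices, which is the role small embedded multipliers play in hardware. The function names (`split_signed`, `tiled_multiply`), the uniform slice width, and the default of 18 bits are assumptions made for illustration. This is not the paper's optimized multigranular arrangement, which mixes block sizes.

```python
def split_signed(x, width, chunk):
    """Split a two's-complement value of `width` bits into slices.

    Slices are returned least significant first. Lower slices are
    unsigned `chunk`-bit values; the top slice carries the sign.
    (Hypothetical helper, not taken from the paper.)
    """
    parts = []
    remaining = width
    while remaining > chunk:
        parts.append(x & ((1 << chunk) - 1))  # unsigned low slice
        x >>= chunk                           # arithmetic shift keeps the sign
        remaining -= chunk
    parts.append(x)                           # signed top slice
    return parts


def tiled_multiply(a, b, width, chunk=18):
    """Multiply two signed `width`-bit integers from slice products.

    Each `ai * bj` term stands in for one small multiplier; the shift
    by `chunk * (i + j)` aligns the partial product before summation.
    The slice width of 18 is an assumed value for illustration.
    """
    a_parts = split_signed(a, width, chunk)
    b_parts = split_signed(b, width, chunk)
    total = 0
    for i, ai in enumerate(a_parts):
        for j, bj in enumerate(b_parts):
            total += (ai * bj) << (chunk * (i + j))
    return total


# Usage: a 40-bit signed product built from 18-bit slices.
assert tiled_multiply(-5, 7, width=40) == -35
```

In this sketch only the top slice of each operand is signed, so the products involving a top slice are signed or mixed-sign while all others are unsigned. A real design must also sum the aligned partial products efficiently, which is where speed and ALUT usage are largely decided.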
