Area-Efficient Architectures for Large Integer and Quadruple Precision Floating Point Multipliers

Large integer multiplication and floating-point multiplication are two dominant operations in many scientific and cryptographic applications. Large integer multipliers generally have area requirements that scale with the operand bit-width and are high for the bit-widths of interest. The high precision requirements of some applications lead to the use of quadruple precision arithmetic, whose multiplication is dominated by the large integer multiplication that forms the mantissa product. In this paper, we propose a hardware-efficient approach for implementing fully pipelined large integer multipliers, and we further extend it to Quadruple Precision (QP) floating-point multiplication. The proposed designs use fewer hardware resources in terms of DSP48 blocks and slices while attaining high performance. Promising results are obtained when comparing our designs with the best reported large integer multipliers and QP floating-point multipliers in the literature. For instance, the proposed QP multiplier achieves over a 50% reduction in DSP48 block usage, at the cost of a slight increase in slices, compared with the best result reported in the literature on a Virtex-4 device.
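To make the role of the mantissa product concrete, the sketch below models in Python, purely as an illustration and not as the proposed architecture, how a 113 x 113-bit QP mantissa multiplication can be tiled into small unsigned partial products, each of which fits a single embedded multiplier. The 17-bit limb width (the usable unsigned width of an 18 x 18 signed DSP48 multiplier) and the plain schoolbook tiling are assumptions made here for illustration; reducing the number of such DSP48-sized partial products is precisely what the paper's design targets.

```python
# Illustrative sketch only: tile a quadruple-precision mantissa product
# (113 x 113 bits) into 17-bit unsigned partial products, each small enough
# for one DSP48 multiplier. Schoolbook tiling is assumed, not the paper's scheme.

LIMB_BITS = 17
MASK = (1 << LIMB_BITS) - 1

def split_limbs(x, n_bits):
    """Split an n_bits-wide unsigned integer into 17-bit limbs, LSB first."""
    n_limbs = (n_bits + LIMB_BITS - 1) // LIMB_BITS
    return [(x >> (i * LIMB_BITS)) & MASK for i in range(n_limbs)]

def tiled_multiply(a, b, n_bits=113):
    """Multiply using only DSP48-sized partial products; return product and count."""
    a_limbs = split_limbs(a, n_bits)
    b_limbs = split_limbs(b, n_bits)
    acc = 0
    dsp_mults = 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            acc += (ai * bj) << ((i + j) * LIMB_BITS)  # one DSP48-sized multiply
            dsp_mults += 1
    return acc, dsp_mults

if __name__ == "__main__":
    import random
    # QP mantissas: 112 stored fraction bits plus the implicit leading 1.
    a = random.getrandbits(113) | (1 << 112)
    b = random.getrandbits(113) | (1 << 112)
    prod, n = tiled_multiply(a, b)
    assert prod == a * b
    print(f"{n} partial products of {LIMB_BITS}x{LIMB_BITS} bits")  # 49 with 7 limbs per operand
```

Under these assumptions the schoolbook tiling needs 49 small multiplications for one mantissa product, which illustrates why DSP-saving decompositions (such as Karatsuba-style schemes) matter for QP multipliers on FPGAs.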
