Area- and power-efficient iterative single/double-precision merged floating-point multiplier on FPGA

In this study, an area- and power-efficient iterative floating-point (FP) multiplier with a pipelined architecture is designed and implemented on FPGA devices. The proposed multiplier supports both single-precision (SP) and double-precision (DP) operations, and the operation mode can be switched at run time by changing the precision selection signal. The Karatsuba algorithm is applied when mapping the mantissa multiplier in order to reduce the number of digital signal processing (DSP) blocks required. For DP operations, an iterative method is applied, which requires much less hardware than a fully pipelined DP multiplier and thus reduces power consumption. To reduce power consumption further, the logic blocks that are unused in a given operation mode are disabled. Compared to previous work, the proposed multiplier achieves a 33% reduction in DSP blocks, 4.3% fewer look-up tables (LUTs), and 31.2% fewer flip-flops, while running at a 4% higher clock frequency on Virtex-5 devices. Compared to the DP multiplier intellectual property (IP) cores provided by the FPGA vendors, the proposed multiplier requires fewer DSP blocks and consumes less power. The mapping solutions and implementation results of the proposed multiplier on Xilinx Virtex-7 and Altera Arria-10 devices are also presented. In addition, the results of a direct implementation of the proposed architecture on an STM 90 nm ASIC platform are reported.
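To illustrate why the Karatsuba algorithm saves multiplier resources, the following is a minimal software sketch of a single Karatsuba level applied to two 53-bit DP mantissas. It is not the paper's hardware mapping: the function name `karatsuba_mul`, the split point of 27 bits, and the sample operands are assumptions chosen for illustration only. The point it shows is that the full product can be assembled from three narrower multiplications instead of the four a schoolbook split would need.

```python
def karatsuba_mul(a: int, b: int, k: int) -> int:
    """Multiply two non-negative integers with one Karatsuba level.

    Each operand is split at bit k (hypothetical split point) into a
    high and a low part. Only three sub-multiplications are used.
    """
    mask = (1 << k) - 1
    a_hi, a_lo = a >> k, a & mask
    b_hi, b_lo = b >> k, b & mask

    p_hi = a_hi * b_hi                     # sub-multiplication 1
    p_lo = a_lo * b_lo                     # sub-multiplication 2
    p_sum = (a_hi + a_lo) * (b_hi + b_lo)  # sub-multiplication 3

    # The cross term a_hi*b_lo + a_lo*b_hi is recovered with two
    # subtractions rather than two further multiplications.
    p_mid = p_sum - p_hi - p_lo

    return (p_hi << (2 * k)) + (p_mid << k) + p_lo


if __name__ == "__main__":
    # Two 53-bit mantissas with the implicit leading one set (sample values).
    ma = (1 << 52) | 0x123456789ABCD
    mb = (1 << 52) | 0xFEDCBA987654
    assert karatsuba_mul(ma, mb, 27) == ma * mb
```

In hardware, the trade is fewer multiplier blocks in exchange for extra adders and subtractors, which is the kind of saving the abstract refers to when it reports a reduced DSP block count.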
