Performance effects of pipeline architecture on an FPGA-based binary32 floating point multiplier

Deeply pipelined architectures with more than five pipeline stages are rarely adopted in existing multipliers for real-world applications. This paper presents a field-programmable gate array (FPGA) based binary32 floating-point multiplier (FPM) that supports a variety of pipeline depths, and investigates the effects of the pipeline architecture. The pipeline architecture is built on radix-4 Booth recoding, an improved Wallace tree, and partial product accumulation. A detailed quantitative investigation of the proposed architecture on cutting-edge Xilinx and Altera devices shows that pipeline depth affects the maximum operating frequency much more than it affects power consumption, and that the pipeline depth should be limited to obtain the maximum operating frequency for a binary32 FPM on both target devices, which is consistent with previous work. The study also shows that the pipeline depth at which peak performance is reached is lower than it was for the FPGA devices with 4-input LUTs that were targeted years ago.
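As a rough illustration of the radix-4 Booth recoding step mentioned above, the following Python sketch recodes a 24-bit binary32 significand (hidden bit included) into signed digits and forms the partial products. It is a software model under stated assumptions, not the hardware design described here: the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and a plain `sum` stands in for the Wallace tree and final adder.

```python
def booth_radix4_digits(multiplier: int, width: int = 24) -> list[int]:
    """Recode an unsigned `width`-bit multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}. The multiplier is zero-extended by
    at least one bit so that it is interpreted as a non-negative value.
    """
    n = width + 1          # one extra zero bit keeps the value unsigned
    if n % 2:
        n += 1             # radix-4 consumes bits in pairs
    digits = []
    prev = 0               # implicit bit to the right of bit 0
    for i in range(0, n, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        # Standard radix-4 Booth digit: -2*b[2k+1] + b[2k] + b[2k-1]
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, width: int = 24) -> int:
    """Multiply two unsigned significands via radix-4 Booth partial products."""
    digits = booth_radix4_digits(b, width)
    # Digit k has weight 4**k, i.e. a left shift by 2*k bits.
    partial_products = [(d * a) << (2 * k) for k, d in enumerate(digits)]
    # Assumption: a simple sum replaces the Wallace tree and final adder.
    return sum(partial_products)


# Significands of 1.5 and 1.25 with the hidden bit set (24 bits each).
sig_a = 0xC00000
sig_b = 0xA00000
product = booth_multiply(sig_a, sig_b)
```

For a 24-bit significand the recoding yields 13 partial products instead of 24, which is what shortens the reduction tree that the pipeline stages are then placed across.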
