Paper information - A Fully Pipelined Modular Multiple Precision Floating Point Multiplier with Vector Support


The rapid evolution of reconfigurable computing places great demand on Floating Point Multipliers (FPMs) capable of supporting a wide range of application domains, from scientific computing to multimedia applications. While the former needs support for higher-precision formats such as Double Precision (DP) and Extended Precision (EP), the latter needs Single Instruction Multiple Data (SIMD) support in Single Precision (SP) mode. This paper presents the design of an FPM that caters to both needs using a hierarchical design approach. The FPM supports nine parallel SP multiplications every cycle with a latency of two cycles, and one DP/EP multiplication every cycle with a latency of three cycles. The FPM is architected to support all four IEEE rounding modes. Compared to other FPMs that support multiple precision and SIMD processing, our FPM achieves 9x throughput in vectored SP mode without penalising the throughput of the DP/EP modes. This improvement in performance comes at a modest cost of 30 percent more area and 11 percent more power. The modular architecture of the proposed FPM yields a significant power reduction of up to 80 percent in scalar SP mode.
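To make the throughput and latency figures concrete, the following is a minimal timing sketch, not the authors' model. It takes only the lane counts and latencies quoted in the abstract and assumes an idealised pipeline that accepts a new batch every cycle with no stalls. The function name `cycles_to_finish` and the constants are hypothetical names introduced for illustration.

```python
from math import ceil

# Figures quoted in the abstract.
SP_LANES = 9        # parallel SP multiplications issued per cycle
SP_LATENCY = 2      # cycles from issue to result in SP mode
DP_EP_LANES = 1     # DP/EP multiplications issued per cycle
DP_EP_LATENCY = 3   # cycles from issue to result in DP/EP mode


def cycles_to_finish(n_mults: int, lanes: int, latency: int) -> int:
    """Cycles until the last of n_mults results is available.

    Assumption (not from the paper): a batch of up to `lanes` operations
    is issued every cycle without stalls, and each batch completes
    `latency` cycles after it is issued.
    """
    if n_mults <= 0:
        return 0
    issue_cycles = ceil(n_mults / lanes)   # cycles spent issuing batches
    return issue_cycles - 1 + latency      # last batch issues, then drains


if __name__ == "__main__":
    n = 900
    vector_sp = cycles_to_finish(n, SP_LANES, SP_LATENCY)
    scalar_sp = cycles_to_finish(n, 1, SP_LATENCY)
    dp_ep = cycles_to_finish(n, DP_EP_LANES, DP_EP_LATENCY)
    print(f"{n} SP multiplications, vector mode: {vector_sp} cycles")
    print(f"{n} SP multiplications, one per cycle: {scalar_sp} cycles")
    print(f"{n} DP/EP multiplications: {dp_ep} cycles")
```

Under these assumptions, the ratio between the one-per-cycle and nine-per-cycle SP cases approaches 9 for long streams, which matches the 9x throughput claim; for short streams the fixed pipeline latency dominates.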

S. K. Nandy | S. Balakrishnan | Farhad Merchant | Alok Baluni
