Paper information - A floating-point fused dot-product unit

A floating-point fused dot-product unit

A floating-point fused dot-product unit is presented that performs single-precision floating-point multiplication and addition operations on two pairs of data in only 150% of the time required for a conventional floating-point multiplication. When placed and routed in a 45 nm process, the fused dot-product unit occupied about 70% of the area needed to implement a parallel dot-product unit using conventional floating-point adders and multipliers. The fused dot-product unit is 27% faster than the conventional parallel approach. Its numerical result is also more accurate, because it needs only one rounding operation versus at least three for other approaches.
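The accuracy claim can be illustrated with a small software model. The sketch below is not the paper's hardware datapath: it only contrasts rounding once (fused) with rounding three times (two products, then the sum) for a·b + c·d. The function names `round_to_f32`, `fused_dot2`, and `discrete_dot2` are hypothetical, and the rounding helper assumes round-to-nearest-even with a 24-bit significand and an unbounded exponent range (no overflow or subnormal handling).

```python
from fractions import Fraction


def round_to_f32(x: Fraction) -> Fraction:
    """Round an exact rational to the nearest value with a 24-bit significand.

    Assumption: ties round to even and the exponent range is unbounded,
    so overflow and subnormal numbers are not modelled.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Find e such that 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    # Scale so the significand becomes an integer in [2**23, 2**24].
    scale = Fraction(2) ** (23 - e)
    n = round(mag * scale)  # round() on a Fraction rounds half to even
    return sign * Fraction(n) / scale


def discrete_dot2(a, b, c, d):
    """Conventional approach: each product is rounded, then the sum is rounded."""
    return round_to_f32(round_to_f32(a * b) + round_to_f32(c * d))


def fused_dot2(a, b, c, d):
    """Fused approach: the exact value a*b + c*d is rounded a single time."""
    return round_to_f32(a * b + c * d)


# Inputs chosen so that the two products nearly cancel.
a = b = 1 + Fraction(1, 2**12)
c = Fraction(-1)
d = 1 + Fraction(1, 2**11)

fused = fused_dot2(a, b, c, d)        # exactly 2**-24
discrete = discrete_dot2(a, b, c, d)  # exactly 0
```

Here a·b = 1 + 2^-11 + 2^-24 is not representable with 24 significand bits, so the conventional path rounds it to 1 + 2^-11 and the subsequent subtraction cancels to zero. The fused path keeps the exact intermediate value and returns 2^-24.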

Earl E. Swartzlander | Hani H. Saleh
