Improving the Performance of the Divide-Add Fused Operation Using Variable Latency Quotient Generation

Dedicated floating-point units for the divide-add fused operation (division followed by addition or subtraction) can increase the performance of the interval Newton's method. The key design issue for these units is the number of quotient bits generated: more bits give better accuracy but lower performance. The number of quotient bits required is determined by the exponent difference and the number of leading zeros. In this paper, we propose a divide-add fused unit that generates a variable number of quotient bits. This reduces the latency in cases where few quotient bits are needed, without loss of precision. The average performance of the divide-add fused operation thus improves, while the area overhead is around 1%.
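The dependence of latency on the number of quotient bits can be illustrated with a small sketch. It is not the unit proposed in the paper: the function names, the linear bit-count model, the guard-bit count, and the radix-4 divider (two quotient bits per cycle) are all assumptions made for illustration. The exponent difference is assumed here to be the addend exponent minus the quotient exponent.

```python
def quotient_bits_needed(exp_diff, leading_zeros, precision=53, guard=3):
    """Hypothetical estimate of how many quotient bits can affect the sum.

    exp_diff      -- assumed: addend exponent minus quotient exponent
    leading_zeros -- assumed: leading zeros caused by cancellation in the add
    This linear model is an illustration, not the formula from the paper.
    """
    full = precision + guard
    # A larger addend shifts the quotient right, so fewer of its bits
    # fall inside the result; cancellation brings some of them back.
    needed = full - max(exp_diff, 0) + leading_zeros
    # Never fewer than the guard bits, never more than a full quotient.
    return max(guard, min(needed, full))


def divider_cycles(bits, bits_per_cycle=2):
    """Iterations of a digit-recurrence divider (assumed radix 4)."""
    return -(-bits // bits_per_cycle)  # ceiling division


if __name__ == "__main__":
    # Illustrative operand cases only, not measurements.
    for exp_diff, lz in [(0, 0), (40, 0), (40, 5), (100, 0)]:
        bits = quotient_bits_needed(exp_diff, lz)
        print(exp_diff, lz, bits, divider_cycles(bits))
```

Under these assumptions, a fixed-latency unit would always pay for the full 56-bit quotient, whereas a variable-latency unit stops the recurrence early whenever the addend dominates.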
