Design Issues and Implementations for Floating-Point Divide–Add Fused

This brief presents a dedicated unit for the combined operation of floating-point (FP) division followed by addition/subtraction: the divide-add fused (DAF) operation. The goal of this unit is to increase the performance and accuracy of applications in which this combined operation is frequent, such as the interval Newton's method or polynomial approximation. The proposed DAF unit has an architecture similar to that of FP multiply-accumulate units. The main difference is the divider, which is implemented using digit-recurrence algorithms. An important design tradeoff for DAF is the number of quotient bits required. We present the impact of the chosen number of quotient bits on accuracy, cost, and performance. Consequently, two implementations are proposed: one pro-accuracy and one pro-performance. We show that the proposed implementations are more accurate than the solution based on two distinct units, an FP divider and an FP adder. The implementation targeting lower latency offers the best cost-performance tradeoff.
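The accuracy argument and the quotient-bit tradeoff can be illustrated with a small software model. The sketch below is not the hardware design from this brief: it uses exact rational arithmetic to imitate a fused operation that rounds once, and a toy radix-2 restoring recurrence to imitate a quotient truncated to a chosen number of bits. All function names (`unfused_div_add`, `fused_div_add`, `digit_recurrence_quotient`, `daf_truncated`) are hypothetical, and the operands are chosen to force cancellation.

```python
from fractions import Fraction


def unfused_div_add(a: float, b: float, c: float) -> float:
    """Two separate units: the quotient is rounded, then the sum is rounded."""
    return a / b + c


def fused_div_add(a: float, b: float, c: float) -> float:
    """Idealized fused model: exact a/b + c, rounded once to double."""
    exact = Fraction(a) / Fraction(b) + Fraction(c)
    return float(exact)  # Fraction -> float is correctly rounded


def digit_recurrence_quotient(x: int, d: int, n_bits: int) -> Fraction:
    """Toy radix-2 restoring recurrence (assumes 0 <= x < d).

    Produces one quotient bit per iteration and returns the quotient
    truncated to n_bits fractional bits.
    """
    w = Fraction(x)  # partial remainder
    q = Fraction(0)
    for j in range(1, n_bits + 1):
        w *= 2
        if w >= d:  # quotient digit 1: subtract the divisor
            w -= d
            q += Fraction(1, 2**j)
    return q


def daf_truncated(x: int, d: int, c: float, n_bits: int) -> float:
    """Divide-add using only n_bits quotient bits, rounded once at the end."""
    return float(digit_recurrence_quotient(x, d, n_bits) + Fraction(c))


if __name__ == "__main__":
    c = -(1.0 / 3.0)  # addend that nearly cancels the quotient 1/3
    print("unfused:", unfused_div_add(1.0, 3.0, c))
    print("fused:  ", fused_div_add(1.0, 3.0, c))
    for n in (54, 64, 120):
        print(f"{n:3d} quotient bits:", daf_truncated(1, 3, c, n))
```

In this example the unfused result is exactly zero because the rounded quotient cancels against the addend, whereas the single-rounding model keeps the residual of the division. The truncated variant shows why the number of quotient bits matters: with 54 bits the residual is still lost, and it is recovered only as more quotient bits are carried into the addition.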
