Accuracy, cost, and performance tradeoffs for floating-point accumulation

Set-wise floating-point accumulation is a fundamental operation in scientific computing, but it presents design challenges: the data hazard between the output and input of a deeply pipelined floating-point adder, and the numerical accuracy of the results. Streaming reduction architectures on FPGAs generally do not consider floating-point error, which can become significant due to the dynamic nature of reduction architectures and the inherent roundoff error and non-associativity of floating-point addition. In this paper we present two frameworks, based on compensated summation, that extend our existing reduction circuit architecture to improve the accuracy of results. We find that both implementations produce exact results for almost 50% of most datasets, and their relative error is lower than that of the original reduction circuit. These designs require more than twice the resources and operate at a lower frequency than the original reduction circuit.
