Pipelined Mixed Precision Algorithms on FPGAs for Fast and Accurate PDE Solvers from Low Precision Components

FPGAs are becoming increasingly attractive for high precision scientific computations. One of the main obstacles to efficient resource utilization is that the resource usage of multipliers grows quadratically with the operand size. Many research efforts have been devoted to the optimization of individual arithmetic and linear algebra operations. In this paper the authors take a higher level approach and seek to reduce the intermediate computational precision on the algorithmic level by optimizing the accuracy towards the final result of an algorithm. Here this final result is the accurate solution of partial differential equations (PDEs). Using the Poisson problem as a typical PDE example, the authors show that most intermediate operations can be computed in single precision (float) or even smaller formats, and that only very few operations (e.g. 1%) must be performed in double precision to obtain the same accuracy as a full double precision solver. Thus the FPGA can be configured with many parallel float operations rather than a few resource-hungry double operations. To achieve this, the authors adapt the general concept of mixed precision iterative refinement methods to FPGAs and develop a fully pipelined version of the conjugate gradient solver. The authors combine this solver with different iterative refinement schemes and precision combinations to obtain resource-efficient mappings of the pipelined algorithm core onto the FPGA.
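The general idea of mixed precision iterative refinement can be illustrated with a minimal software sketch. It is not the authors' pipelined FPGA design: it assumes a small 1D Poisson matrix tridiag(-1, 2, -1), emulates single precision by rounding every operation through `struct`, and rescales the residual before the low precision solve (a common choice, assumed here). All function names (`f32`, `apply_poisson`, `cg_low`, `mixed_precision_solve`) are hypothetical. Only the residual and the solution update run in double precision; the inner conjugate gradient iterations run entirely in the emulated float format.

```python
import math
import struct

def f32(x):
    """Round a double to the nearest single precision value (software emulation)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def apply_poisson(v, rnd=lambda t: t):
    """y = A v for A = tridiag(-1, 2, -1); rnd rounds each operation."""
    n = len(v)
    y = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        y.append(rnd(rnd(rnd(2.0 * v[i]) - left) - right))
    return y

def dot(u, v, rnd=lambda t: t):
    s = 0.0
    for a, b in zip(u, v):
        s = rnd(s + rnd(a * b))
    return s

def cg_low(r, iters):
    """Conjugate gradients for A c = r with every operation rounded to float."""
    c = [0.0] * len(r)
    res = [f32(t) for t in r]
    p = list(res)
    rr = dot(res, res, f32)
    rr0 = rr
    for _ in range(iters):
        if rr <= 1e-10 * rr0:          # residual norm reduced by about 1e-5
            break
        Ap = apply_poisson(p, f32)
        pAp = dot(p, Ap, f32)
        if pAp <= 0.0:                 # guard against breakdown from rounding
            break
        alpha = f32(rr / pAp)
        c = [f32(ci + f32(alpha * pi)) for ci, pi in zip(c, p)]
        res = [f32(ri - f32(alpha * ai)) for ri, ai in zip(res, Ap)]
        rr_new = dot(res, res, f32)
        beta = f32(rr_new / rr)
        p = [f32(ri + f32(beta * pi)) for ri, pi in zip(res, p)]
        rr = rr_new
    return c

def mixed_precision_solve(b, outer=30, tol=1e-12):
    """Outer refinement loop in double, inner solver in emulated float."""
    x = [0.0] * len(b)
    b_norm = max(abs(t) for t in b)
    for _ in range(outer):
        # Double precision: residual of the current approximation.
        r = [bi - ai for bi, ai in zip(b, apply_poisson(x))]
        r_norm = max(abs(t) for t in r)
        if r_norm <= tol * b_norm:
            break
        # Low precision: solve for a correction of the rescaled residual.
        c = cg_low([t / r_norm for t in r], iters=4 * len(b))
        # Double precision: accumulate the correction.
        x = [xi + r_norm * ci for xi, ci in zip(x, c)]
    return x

def max_err(x, ref):
    return max(abs(a - t) for a, t in zip(x, ref))

n = 32
x_true = [math.sin(3.0 * (i + 1) / (n + 1)) + ((i + 1) / (n + 1)) ** 2
          for i in range(n)]
b = apply_poisson(x_true)                     # right-hand side in double
x_mixed = mixed_precision_solve(b)            # refinement until converged
x_single = mixed_precision_solve(b, outer=1)  # a single low precision solve
print("max error, mixed precision:", max_err(x_mixed, x_true))
print("max error, single pass    :", max_err(x_single, x_true))
```

In this sketch the double precision work per outer step is one matrix-vector product and two vector updates, while every inner iteration runs in the low precision format, which mirrors the abstract's point that only a small fraction of operations needs double precision.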
