Research on the Accuracy of Single Precision on Graphics Processing Units

A single-precision floating-point number consists of two parts, a mantissa and an exponent, each represented by a limited number of binary bits. During single-precision addition, the operand with the smaller exponent must be shifted to align the radix points; if the shifted mantissa exceeds the register width, it is truncated or rounded, which loses precision. On the GTX 200 series, the register holding intermediate results has the same width as the one holding final results, so the accuracy problem becomes especially prominent when the aligned mantissa exceeds the register width during single-precision addition. In this paper, we use the partial-sum algorithm to improve the accuracy of single-precision addition and verify its correctness experimentally through matrix multiplication. Finally, we analyze the effect of the partial-sum algorithm on the GPU's peak compute rate and conclude that it has little influence.
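To make the loss mechanism concrete, the following is a minimal CPU-side C sketch, not the paper's GPU implementation: it sums 2^25 copies of 1.0f, where naive accumulation stalls once the running sum reaches 2^24 (the unit in the last place of the sum becomes 2.0f, so each further 1.0f is rounded away after alignment), while a blocked partial-sum scheme recovers the exact result. The 1024-element block size is an illustrative assumption, not a value taken from the paper.

#include <stdio.h>

enum { N = 1 << 25 };  /* 33,554,432 addends of 1.0f; exact sum is 33554432 */

int main(void) {
    /* Naive accumulation: once the sum reaches 2^24, the ulp of the
     * running sum is 2.0f, so each further 1.0f is rounded away after
     * alignment and the sum stalls at 16777216. */
    float naive = 0.0f;
    for (int i = 0; i < N; ++i)
        naive += 1.0f;

    /* Blocked partial sums (block size 1024 is an illustrative choice,
     * and N is assumed divisible by it): each block sums exactly, and
     * the grand total grows in steps of 1024.0f, which stays exactly
     * representable far longer than steps of 1.0f. */
    float total = 0.0f;
    for (int i = 0; i < N; i += 1024) {
        float block = 0.0f;
        for (int j = 0; j < 1024; ++j)
            block += 1.0f;
        total += block;
    }

    printf("exact   : %d\n", N);        /* 33554432 */
    printf("naive   : %.1f\n", naive);  /* 16777216.0, half the true sum */
    printf("partial : %.1f\n", total);  /* 33554432.0 */
    return 0;
}

The same idea maps naturally onto a GPU reduction: each thread or block accumulates its own partial sum and the partials are combined afterward, so no single accumulator grows large enough to swamp its addends.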
