Extending Summation Precision for Network Reduction Operations

Double-precision summation is at the core of many important algorithms, such as Newton-Krylov methods and other operations involving inner products. Its effectiveness, however, is limited by the accumulation of rounding errors, a problem that grows with the scale of modern HPC systems and data sets. To reduce the impact of precision loss, researchers have proposed increased- and arbitrary-precision libraries that provide reproducible or even bounded error accumulation for large sums. These libraries do not guarantee an exact result and can increase computation time significantly. We propose big integer (BigInt) expansions of double-precision variables that enable arbitrarily large summations without error and provide exact and reproducible results. By adding simple and inexpensive logic to modern NICs, this approach can reach performance comparable to that of double-precision floating-point summation on large-scale systems.
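
The Python sketch below is a software-only illustration of why an integer expansion removes both rounding error and order dependence. It is an assumption chosen for clarity, not the BigInt format or NIC logic proposed here. It relies on the fact that every finite double is an integer multiple of 2^-1074. Scaling by 2^1074 therefore turns each value into an exact big integer. Integer addition never rounds, so the total is rounded to a double only once, at the end. The helpers `to_bigint`, `from_bigint`, `naive_sum`, `exact_sum`, and `tree_reduce` are hypothetical names. `tree_reduce` only mimics the pairwise combining a network reduction might perform.

```python
import random

# Sketch assumption: every finite IEEE 754 double is an integer multiple of
# 2**-1074 (the smallest subnormal), so scaling by 2**1074 maps it exactly
# onto an integer. SCALE_EXP is a hypothetical name, not from the paper.
SCALE_EXP = 1074


def to_bigint(x: float) -> int:
    """Expand a finite double into an exact fixed-point big integer."""
    num, den = x.as_integer_ratio()  # raises on inf/nan; den is a power of two
    return num * ((1 << SCALE_EXP) // den)


def from_bigint(n: int) -> float:
    """Round an exact fixed-point total back to the nearest double."""
    return n / (1 << SCALE_EXP)  # CPython rounds int / int true division correctly


def naive_sum(values):
    """Left-to-right double accumulation, rounding after every addition."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def exact_sum(values):
    """Accumulate in the integer domain; round only once at the end."""
    return from_bigint(sum(to_bigint(v) for v in values))


def tree_reduce(parts):
    """Combine partial results pairwise, like levels of a reduction tree."""
    parts = list(parts)
    while len(parts) > 1:
        paired = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            paired.append(parts[-1])  # odd element passes up unchanged
        parts = paired
    return parts[0]


values = [1e16, 1.0, -1e16]
print(naive_sum(values))              # 0.0 (the 1.0 is absorbed by 1e16)
print(naive_sum([1e16, -1e16, 1.0]))  # 1.0 (same data, different order)
print(exact_sum(values))              # 1.0

# Integer addition is associative, so any ordering or tree shape agrees.
rng = random.Random(0)
data = [rng.uniform(-1, 1) * 10.0 ** rng.randint(-20, 20) for _ in range(1000)]
shuffled = data[:]
rng.shuffle(shuffled)
print(exact_sum(data) == exact_sum(shuffled))  # True
print(tree_reduce(to_bigint(v) for v in shuffled)
      == sum(to_bigint(v) for v in data))      # True
```

Python's unbounded `int` hides the cost of this representation. Finite doubles span roughly 2^-1074 to 2^1024, so a fixed-width accumulator covering all of them would need about 2,100 bits, plus a sign bit and about log2(N) extra carry bits for N addends. The sketch does not model how such a datapath would be laid out in a NIC.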
