Extending Summation Precision for Network Reduction Operations
[1] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[2] Jack Dongarra, et al. HPCG Benchmark Technical Specification, 2013.
[3] Darius Buntinas, et al. A uGNI-Based MPICH2 Nemesis Network Module for the Cray XE, 2011, EuroMPI.
[4] Miron Livny, et al. Data placement for scientific applications in distributed environments, 2007, 2007 8th IEEE/ACM International Conference on Grid Computing.
[5] Hubert Ritzdorf, et al. Collective operations in NEC's high-performance MPI libraries, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[6] James Demmel, et al. Accurate and efficient expression evaluation and linear algebra, 2008, Acta Numerica.
[7] David H. Bailey, et al. High-precision floating-point arithmetic in scientific computation, 2004, Computing in Science & Engineering.
[8] Ulrich W. Kulisch, et al. Very fast and exact accumulation of products, 2011, Computing.
[9] George Varghese, et al. A 22nm IA multi-CPU and GPU System-on-Chip, 2012, 2012 IEEE International Solid-State Circuits Conference.
[10] James Demmel, et al. IEEE Standard for Floating-Point Arithmetic, 2008.
[11] Amith R. Mamidala, et al. Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective, 2007, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07).
[12] Ulrich W. Kulisch, et al. The exact dot product as basic tool for long interval arithmetic, 2011, Computing.
[13] John Shalf, et al. Exascale Computing Technology Challenges, 2010, VECPAR.
[14] Wu-chun Feng, et al. The Quadrics network (QsNet): high-performance clustering technology, 2001, HOT 9 Interconnects. Symposium on High Performance Interconnects.
[15] John Michael McNamee, et al. A comparison of methods for accurate summation, 2004, SIGS.
[16] David S. Gilliam, et al. The impact of finite precision arithmetic and sensitivity on the numerical solution of partial differential equations, 2002.
[17] Randy H. Katz, et al. Contemporary Logic Design, 2004.
[18] Wayne B. Hayes, et al. Algorithm 908, 2010.
[19] Chris H. Q. Ding, et al. Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications, 2000, ICS '00.
[20] Abhinav Vishnu, et al. Evaluating the Potential of Cray Gemini Interconnect for PGAS Communication Runtime Systems, 2011, 2011 IEEE 19th Annual Symposium on High Performance Interconnects.
[21] Vincent Lefèvre, et al. MPFR: A multiple-precision binary floating-point library with correct rounding, 2007, TOMS.
[22] Xiaoye S. Li, et al. ARPREC: An arbitrary precision computation package, 2002.
[23] Jürgen Wolff von Gudenberg, et al. A long accumulator like a carry-save adder, 2011, Computing.
[24] Nicholas J. Higham, et al. The Accuracy of Floating Point Summation, 1993, SIAM J. Sci. Comput.
[25] Jonathan M. Borwein, et al. High-precision computation: Mathematical physics and dynamics, 2010, Appl. Math. Comput.
[26] Jesper Larsson Träff, et al. SKaMPI: a comprehensive benchmark for public benchmarking of MPI, 2002, Sci. Program.
[27] Jeffrey T. Draper, et al. Design trade-offs in floating-point unit implementation for embedded and processing-in-memory systems, 2005, 2005 IEEE International Symposium on Circuits and Systems.
[28] HPCG Technical Specification, 2013, Sandia Report.
[29] Samuel Williams, et al. Hardware/software co-design for energy-efficient seismic modeling, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[30] Torsten Hoefler, et al. Sparse collective operations for MPI, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[31] Torsten Hoefler, et al. Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient Using MPI Datatypes, 2010, EuroMPI.
[32] Dhabaleswar K. Panda, et al. NIC-based reduction algorithms for large-scale clusters, 2006, Int. J. High Perform. Comput. Netw.
[33] ANSI/IEEE. IEEE Standard for Binary Floating Point Arithmetic, 1985.
[34] Henri E. Bal, et al. MPI's Reduction Operations in Clustered Wide Area Systems, 1999.
[35] Greg Astfalk, et al. Why optical data communications and why now?, 2009.
[36] Dan Tsafrir, et al. The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops), 2007, ExpCS '07.
[37] James Demmel, et al. Fast Reproducible Floating-Point Summation, 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.
[38] Stef Graillat, et al. Accurate summation, dot product and polynomial evaluation in complex floating point arithmetic, 2012, Inf. Comput.
[39] James Demmel, et al. On the Complexity of Computing Error Bounds, 2001, Found. Comput. Math.
[40] John Shalf, et al. HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems, 2014.
[41] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.
[42] Viktor K. Prasanna, et al. Analysis of high-performance floating-point arithmetic on FPGAs, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[43] Robert L. Smith, et al. An American National Standard: IEEE Standard for Binary Floating-Point Arithmetic, 1985.
[44] Xia Hong, et al. Analysis and Research of Floating-Point Exceptions, 2010, The 2nd International Conference on Information Science and Engineering.
[45] Michael J. Schulte, et al. Integer Multiplication with Overflow Detection or Saturation, 2000, IEEE Trans. Computers.
[46] Vincent Lefèvre, et al. Why and How to Use Arbitrary Precision, 2010, Comput. Sci. Eng.