Reproducible floating-point atomic addition in data-parallel environment
[1] Ganesh Gopalakrishnan, et al. Determinism and Reproducibility in Large-Scale HPC Systems, 2013.
[2] James Demmel, et al. Parallel Reproducible Summation, 2015, IEEE Transactions on Computers.
[3] Nicholas J. Higham, et al. INVERSE PROBLEMS NEWSLETTER, 1991.
[4] John D. Owens, et al. Efficient Synchronization Primitives for GPUs, 2011, ArXiv.
[5] Ulrich W. Kulisch, et al. Comments on Fast and Exact Accumulation of Products, 2010, PARA.
[6] Wu-chun Feng, et al. To GPU synchronize or not GPU synchronize?, 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.
[7] Stephane Cotin, et al. SOFA—An Open Source Framework for Medical Simulation, 2007, MMVR.
[8] David Defour, et al. Impacting predictability of GPU's, 2014.
[9] David Defour, et al. Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures, 2014.
[10] Jie Cheng, et al. CUDA by Example: An Introduction to General-Purpose GPU Programming, 2010, Scalable Comput. Pract. Exp.
[11] James Demmel, et al. LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs, 2008.
[12] Jason Sanders, et al. CUDA by example: an introduction to general purpose GPU programming, 2010.
[13] Kevin Skadron, et al. Accelerating SQL database operations on a GPU with CUDA, 2010, GPGPU-3.
[14] Jean-Michel Muller, et al. Handbook of Floating-Point Arithmetic (2nd Ed.), 2018.
[15] Wu-chun Feng, et al. Inter-block GPU communication via fast barrier synchronization, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[16] William J. Dally, et al. The GPU Computing Era, 2010, IEEE Micro.