PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs
[1] Slo-Li Chu, et al. OpenCL: Make Ubiquitous Supercomputing Possible, 2010, 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC).
[2] Eduardo Pinheiro, et al. DRAM errors in the wild: a large-scale field study, 2009, SIGMETRICS '09.
[3] Eric Roman. A Survey of Checkpoint/Restart Implementations, 2002.
[4] Pat Hanrahan, et al. Brook for GPUs: stream computing on graphics hardware, 2004, ACM Trans. Graph.
[5] Huiyang Zhou, et al. Understanding software approaches for GPGPU reliability, 2009, GPGPU-2.
[6] Elena Dubrova, et al. Fault Tolerant Design: An Introduction, 2013.
[7] Jens H. Krüger, et al. GPGPU: general purpose computation on graphics hardware, 2004, SIGGRAPH '04.
[8] Satoshi Matsuoka, et al. A high-performance fault-tolerant software framework for memory on commodity GPUs, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[9] David I. August, et al. SWIFT: software implemented fault tolerance, 2005, International Symposium on Code Generation and Optimization.
[10] C. V. Ramamoorthy, et al. Rollback and Recovery Strategies for Computer Programs, 1972, IEEE Transactions on Computers.
[11] Sudhanva Gurumurthi, et al. Towards Transient Fault Tolerance for Heterogeneous Computing Platforms, 2008.
[12] Kevin Skadron, et al. A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors, 2007, GH '07.
[13] Axel W. Krings, et al. Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing, 2009, IEEE Transactions on Dependable and Secure Computing.
[14] Satoshi Matsuoka, et al. Software-Based ECC for GPUs, 2011.
[15] Rudolf Eigenmann, et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[16] L. Borucki, et al. Comparison of accelerated DRAM soft error rates measured at component and system level, 2008, 2008 IEEE International Reliability Physics Symposium.
[17] Ravishankar K. Iyer, et al. Hauberk: Lightweight Silent Data Corruption Error Detector for GPGPU, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[18] Niraj K. Jha, et al. Fault-tolerant computer system design, 1996, IEEE Parallel & Distributed Technology: Systems & Applications.
[19] Joel S. Emer, et al. The soft error problem: an architectural perspective, 2005, 11th International Symposium on High-Performance Computer Architecture.
[20] Jens H. Krüger, et al. A Survey of General-Purpose Computation on Graphics Hardware, 2007, Eurographics.
[21] Massimo Violante, et al. Software-Implemented Hardware Fault Tolerance, 2010.