New-Sum: A Novel Online ABFT Scheme For General Iterative Methods
Shuaiwen Song | Sriram Krishnamoorthy | Dingwen Tao | Zizhong Chen | Darren J. Kerbyson | Xin Liang | Panruo Wu | Eddy Z. Zhang