Detection of soft errors in LU decomposition with partial pivoting using algorithm-based fault tolerance
[1] Franklin T. Luk,et al. A Linear Algebraic Model of Algorithm-Based Fault Tolerance , 1988, IEEE Trans. Computers.
[2] Franck Cappello,et al. Toward Exascale Resilience , 2009, Int. J. High Perform. Comput. Appl.
[3] Zizhong Chen,et al. Online-ABFT: an online algorithm based fault tolerance scheme for soft error detection in iterative methods , 2013, PPoPP '13.
[4] John T. Daly,et al. A higher order estimate of the optimum checkpoint interval for restart dumps , 2006, Future Gener. Comput. Syst.
[5] John W. Young,et al. A first order approximation to the optimum checkpoint interval , 1974, CACM.
[6] Hui Liu,et al. High performance linpack benchmark: a fault tolerant implementation without checkpointing , 2011, ICS '11.
[7] Bronis R. de Supinski,et al. Soft error vulnerability of iterative linear algebra methods , 2007, ICS '08.
[8] Zizhong Chen,et al. Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[9] Jack J. Dongarra,et al. High Performance Dense Linear System Solver with Soft Error Resilience , 2011, 2011 IEEE International Conference on Cluster Computing.
[10] Padma Raghavan,et al. Fault tolerant preconditioned conjugate gradient for sparse linear system solution , 2012, ICS '12.
[11] Robert E. Lyons,et al. The Use of Triple-Modular Redundancy to Improve Computer Reliability , 1962, IBM J. Res. Dev.
[12] Zizhong Chen,et al. Correcting soft errors online in LU factorization , 2013, HPDC '13.
[13] Zizhong Chen,et al. Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing , 2009, IEEE Transactions on Computers.
[14] Zizhong Chen,et al. Algorithm-Based Fault Tolerance for Fail-Stop Failures , 2008, IEEE Transactions on Parallel and Distributed Systems.
[15] Franck Cappello,et al. Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities , 2009, Int. J. High Perform. Comput. Appl.
[16] Julien Langou,et al. The Problem With the Linpack Benchmark 1.0 Matrix Generator , 2009, Int. J. High Perform. Comput. Appl.
[17] Mei Han An,et al. Accuracy and Stability of Numerical Algorithms , 1991.
[18] Hao Ling,et al. Performance evaluation of moment‐method codes on an Intel iPSC/860 hypercube computer , 1993.
[19] Zizhong Chen,et al. Algorithmic Cholesky factorization fault recovery , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[20] Zizhong Chen. Algorithm-based recovery for iterative methods without checkpointing , 2011, HPDC '11.
[21] Jack J. Dongarra,et al. High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors , 2012, ICCS.
[22] Mahmut T. Kandemir,et al. Analyzing the soft error resilience of linear solvers on multicore multiprocessors , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[23] James Demmel,et al. Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout , 2013, SPAA.
[24] Franklin T. Luk,et al. An Analysis of Algorithm-Based Fault Tolerance Techniques , 1988, J. Parallel Distributed Comput.
[25] Jacob A. Abraham,et al. Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.
[26] Martin Schulz,et al. Fault resilience of the algebraic multi-grid solver , 2012, ICS '12.
[27] Kai Li,et al. Diskless Checkpointing , 1998, IEEE Trans. Parallel Distributed Syst.
[28] Thomas Hérault,et al. Algorithm-based fault tolerance for dense matrix factorizations , 2012, PPoPP '12.
[29] Rui Wang,et al. Building algorithmically nonstop fault tolerant MPI programs , 2011, 2011 18th International Conference on High Performance Computing.
[30] Padma Raghavan,et al. Characterizing the impact of soft errors on iterative methods in scientific computing , 2011, ICS '11.
[31] Rui Wang,et al. A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.