Combining Backward and Forward Recovery to Cope with Silent Errors in Iterative Solvers
Yves Robert | Bora Uçar | Massimiliano Fasi
[1] John T. Daly,et al. A higher order estimate of the optimum checkpoint interval for restart dumps , 2006, Future Gener. Comput. Syst..
[2] Zizhong Chen,et al. Online-ABFT: an online algorithm based fault tolerance scheme for soft error detection in iterative methods , 2013, PPoPP '13.
[3] Richard W. Vuduc,et al. Self-stabilizing iterative solvers , 2013, ScalA '13.
[4] Nicholas J. Higham,et al. Functions of matrices - theory and computation , 2008 .
[5] Bianca Schroeder,et al. Cosmic rays don't strike twice: understanding the nature of DRAM errors and the implications for system design , 2012, ASPLOS XVII.
[6] Yves Robert,et al. Checkpointing algorithms and fault prediction , 2014, J. Parallel Distributed Comput..
[7] Laplacian Matrix , 2017, Encyclopedia of Machine Learning and Data Mining.
[8] Austin R. Benson,et al. Silent error detection in numerical time-stepping schemes , 2015, Int. J. High Perform. Comput. Appl..
[9] Yves Robert,et al. Combining Algorithm-based Fault Tolerance and Checkpointing for Iterative Solvers , 2015 .
[10] Kurt B. Ferreira,et al. Fault-tolerant iterative methods via selective reliability. , 2011 .
[11] F. Mueller,et al. Quantifying the Impact of Single Bit Flips on Floating Point Arithmetic , 2013 .
[12] Rolf Riesen,et al. Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing , 2012, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[13] Kurt B. Ferreira,et al. Fault-tolerant linear solvers via selective reliability , 2012, ArXiv.
[14] Jack Dongarra,et al. Algorithm-based fault tolerance for dense matrix factorizations , 2012 .
[15] John W. Young,et al. A first order approximation to the optimum checkpoint interval , 1974, CACM.
[16] Eli Upfal,et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis , 2005 .
[17] Robert E. Lyons,et al. The Use of Triple-Modular Redundancy to Improve Computer Reliability , 1962, IBM J. Res. Dev..
[18] Andrew A. Chien,et al. When is multi-version checkpointing needed? , 2013, FTXS '13.
[19] Yves Robert,et al. Assessing General-Purpose Algorithms to Cope with Fail-Stop and Silent Errors , 2016, TOPC.
[20] Rakesh Kumar,et al. Algorithmic approaches to low overhead fault detection for sparse linear algebra , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[21] Jacob A. Abraham,et al. Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.
[22] George Bosilca,et al. Algorithm-based fault tolerance applied to high performance computing , 2009, J. Parallel Distributed Comput..
[23] Henri Casanova,et al. Checkpointing strategies for parallel jobs , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[24] Bronis R. de Supinski,et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Christian Engelmann,et al. Combining Partial Redundancy and Checkpointing for HPC , 2012, 2012 IEEE 32nd International Conference on Distributed Computing Systems.
[26] Nicholas J. Higham,et al. INVERSE PROBLEMS NEWSLETTER , 1991 .
[27] Thomas Hérault,et al. Algorithm-based fault tolerance for dense matrix factorizations , 2012, PPoPP '12.
[28] Bronis R. de Supinski,et al. Soft error vulnerability of iterative linear algebra methods , 2007, ICS '08.
[29] Franklin T. Luk,et al. A Linear Algebraic Model of Algorithm-Based Fault Tolerance , 1988, IEEE Trans. Computers.
[30] Padma Raghavan,et al. Fault tolerant preconditioned conjugate gradient for sparse linear system solution , 2012, ICS '12.
[31] Bora Uçar,et al. On analysis of partitioning models and metrics in parallel sparse matrix-vector multiplication , 2013 .
[32] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[33] Timothy A. Davis,et al. The university of Florida sparse matrix collection , 2011, TOMS.