Algorithm Level Fault Tolerance for Molecular Dynamic Applications
[1] Heather M. Quinn, et al. Terrestrial-based radiation upsets: a cautionary tale, 2005, 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05).
[2] Franklin T. Luk, et al. A Linear Algebraic Model of Algorithm-Based Fault Tolerance, 1988, IEEE Trans. Computers.
[3] Kurt B. Ferreira, et al. Fault-tolerant iterative methods via selective reliability, 2011.
[4] Rakesh Kumar, et al. Algorithmic approaches to low overhead fault detection for sparse linear algebra, 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[5] Timothy J. Dell, et al. A white paper on the benefits of chipkill-correct ECC for PC server main memory, 1997.
[6] Suku Nair, et al. Algorithm-Based Fault Tolerance on a Hypercube Multiprocessor, 1990, IEEE Trans. Computers.
[7] Dong Li, et al. Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Frank Mueller, et al. Evaluating the Impact of SDC on the GMRES Iterative Solver, 2013, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[9] Gerald M. Masson, et al. Checking the integrity of trees, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[10] Zizhong Chen, et al. Correcting soft errors online in LU factorization, 2013, HPDC '13.
[11] Hui Liu, et al. High performance linpack benchmark: a fault tolerant implementation without checkpointing, 2011, ICS '11.
[12] Padma Raghavan, et al. Fault tolerant preconditioned conjugate gradient for sparse linear system solution, 2012, ICS '12.
[13] C. J. van Rijsbergen, et al. Information Retrieval, 1979, Encyclopedia of GIS.
[14] Franklin T. Luk, et al. An Analysis of Algorithm-Based Fault Tolerance Techniques, 1988, J. Parallel Distributed Comput.
[15] Jacob A. Abraham, et al. Algorithm-Based Fault Tolerance for Matrix Operations, 1984, IEEE Transactions on Computers.
[16] International Conference for High Performance Computing, Networking, Storage and Analysis, SC'13, Denver, CO, USA, November 17-21, 2013, 2013, SC.
[17] Martin Schulz, et al. Fault resilience of the algebraic multi-grid solver, 2012, ICS '12.
[18] Steve Plimpton, et al. Fast parallel algorithms for short-range molecular dynamics, 1993.
[19] Edward J. McCluskey, et al. Software-implemented EDAC protection against SEUs, 2000, IEEE Trans. Reliab.
[20] Rakesh Kumar, et al. An algorithmic approach to error localization and partial recomputation for low-overhead fault tolerance, 2013, 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[21] Jack J. Dongarra, et al. Soft error resilient QR factorization for hybrid system with GPGPU, 2013, J. Comput. Sci.
[22] T. M. Mak, et al. Do we need anything more than single bit error correction (ECC)?, 2004, Records of the 2004 International Workshop on Memory Technology, Design and Testing.
[23] Eduardo Pinheiro, et al. DRAM errors in the wild: a large-scale field study, 2009, SIGMETRICS '09.
[24] Hui Liu, et al. Matrix Multiplication on GPUs with On-Line Fault Tolerance, 2011, 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications.
[25] Jack J. Dongarra, et al. Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing, 1997, J. Parallel Distributed Comput.
[26] Hui Liu, et al. Algorithm-Based Recovery for Newton's Method without Checkpointing, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum.
[27] Shubhendu S. Mukherjee, et al. Perturbation-based Fault Screening, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[28] Mahmut T. Kandemir, et al. A data-centric approach to checksum reuse for array-intensive applications, 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[29] Kai Li, et al. Diskless Checkpointing, 1998, IEEE Trans. Parallel Distributed Syst.
[30] Zizhong Chen. Algorithm-based recovery for iterative methods without checkpointing, 2011, HPDC '11.
[31] Sarita V. Adve, et al. Low-cost program-level detectors for reducing silent data corruptions, 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[32] Sanjay J. Patel, et al. ReStore: symptom based soft error detection in microprocessors, 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[33] Zizhong Chen. Extending algorithm-based fault tolerance to tolerate fail-stop failures in high performance distributed environments, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[34] Padma Raghavan, et al. Characterizing the impact of soft errors on iterative methods in scientific computing, 2011, ICS '11.
[35] Xin Li, et al. A Realistic Evaluation of Memory Hardware Errors and Software System Susceptibility, 2010, USENIX Annual Technical Conference.
[36] David Fiala. Detection and correction of silent data corruption for large-scale high-performance computing, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[37] Thomas Hérault, et al. Algorithm-based fault tolerance for dense matrix factorizations, 2012, PPoPP '12.
[38] Zizhong Chen, et al. Fail-Stop Failure Algorithm-Based Fault Tolerance for Cholesky Decomposition, 2015, IEEE Transactions on Parallel and Distributed Systems.
[39] Gagan Agrawal, et al. DISC: A Domain-Interaction Based Programming Model with Support for Heterogeneous Execution, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[40] Zizhong Chen, et al. FT-ScaLAPACK: correcting soft errors on-line for ScaLAPACK Cholesky, QR, and LU factorization routines, 2014, HPDC '14.