A framework for evaluating comprehensive fault resilience mechanisms in numerical programs
Bin Li | Lu Peng | Greg Bronevetsky | Marc Casas | Sui Chen
[1] Nathan DeBardeleben,et al. Extra Bits on SRAM and DRAM Errors - More Data from the Field , 2014.
[2] Engin Ipek,et al. Utilizing Dynamically Coupled Cores to Form a Resilient Chip Multiprocessor , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[3] Jacob A. Abraham,et al. Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.
[4] Martin Schulz,et al. Fault resilience of the algebraic multi-grid solver , 2012, ICS '12.
[5] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[6] Jinsuk Chung,et al. Containment domains: A scalable, efficient, and flexible resilience scheme for exascale systems , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Bronis R. de Supinski,et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Jack J. Dongarra,et al. High Performance Dense Linear System Solver with Soft Error Resilience , 2011, 2011 IEEE International Conference on Cluster Computing.
[9] Xin Li,et al. A Realistic Evaluation of Memory Hardware Errors and Software System Susceptibility , 2010, USENIX Annual Technical Conference.
[10] Hua Li,et al. Thermally-induced soft errors in nanoscale CMOS circuits , 2007, 2007 IEEE International Symposium on Nanoscale Architectures.
[11] Jack J. Dongarra,et al. High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors , 2012, ICCS.
[12] R.C. Baumann,et al. Radiation-induced soft errors in advanced semiconductor technologies , 2005, IEEE Transactions on Device and Materials Reliability.
[13] Sarita V. Adve,et al. Accurate microarchitecture-level fault modeling for studying hardware faults , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[14] Pedro J. Gil,et al. Fault Injection into VHDL Models: Experimental Validation of a Fault Tolerant Microcomputer System , 1999, EDCC.
[15] Rakesh Kumar,et al. A numerical optimization-based methodology for application robustification: Transforming applications for error tolerance , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[16] Bronis R. de Supinski,et al. Soft error vulnerability of iterative linear algebra methods , 2008, ICS '08.
[17] Jinsuk Chung,et al. Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems , 2012, HiPC 2012.
[18] N. Hengartner,et al. Predicting the number of fatal soft errors in Los Alamos national laboratory's ASC Q supercomputer , 2005, IEEE Transactions on Device and Materials Reliability.
[19] M. L. Alles,et al. Technology scaling and soft error reliability , 2012, 2012 IEEE International Reliability Physics Symposium (IRPS).
[20] Rakesh Kumar,et al. Algorithmic approaches to low overhead fault detection for sparse linear algebra , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[21] Stephen P. Boyd,et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn.
[22] James H. Laros,et al. Evaluating the viability of process replication reliability for exascale systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[23] Charng-Da Lu,et al. Assessing Fault Sensitivity in MPI Applications , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[24] Shubhendu S. Mukherjee,et al. Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[25] Matthew Wrobel. DRC (Digital Room Correction) , 2011.
[26] Ziming Zhang,et al. Experimental Framework for Injecting Logic Errors in a Virtual Machine to Profile Applications for Soft Error Resilience , 2011, Euro-Par Workshops.