Assessing Fault Sensitivity in MPI Applications
[1] Cristian Constantinescu, et al. Impact of deep submicron technology on dependability of VLSI circuits, 2002, Proceedings International Conference on Dependable Systems and Networks.
[2] James F. Ziegler, et al. Terrestrial cosmic rays, 1996, IBM J. Res. Dev.
[3] Jacob A. Abraham, et al. Algorithm-Based Fault Tolerance for Matrix Operations, 1984, IEEE Transactions on Computers.
[4] Ronald Minnich, et al. A Network-Failure-Tolerant Message-Passing System for Terascale Clusters, 2002, ICS '02.
[5] Anthony Skjellum, et al. A high-performance, portable implementation of the MPI message passing interface standard, 1996.
[6] Craig Partridge, et al. When the CRC and TCP checksum disagree, 2000, SIGCOMM.
[7] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[8] Manuel Blum, et al. Software reliability via run-time result-checking, 1997, JACM.
[9] Cristian Constantinescu, et al. Teraflops Supercomputer: Architecture and Validation of the Fault Tolerance Mechanisms, 2000, IEEE Trans. Computers.
[10] A. Winsor. Sampling techniques, 2000, Nursing Times.
[11] Daniel P. Siewiorek, et al. Error log analysis: statistical modeling and heuristic trend analysis, 1990.
[12] Gérard M. Baudet, et al. Asynchronous Iterative Methods for Multiprocessors, 1978, JACM.
[13] Christian Engelmann, et al. Development of Naturally Fault Tolerant Algorithms for Computing on 100,000 Processors, 2002.
[14] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual, 2004.
[15] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[16] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[17] Laxmikant V. Kalé, et al. CHARM++: a portable concurrent object oriented system based on C++, 1993, OOPSLA '93.
[18] Timothy J. Dell, et al. A white paper on the benefits of chipkill-correct ECC for PC server main memory, 1997.
[19] Laxmikant V. Kalé, et al. NAMD2: Greater Scalability for Parallel Molecular Dynamics, 1999.
[20] Henrique Madeira, et al. Assessing the effects of communication faults on parallel applications, 1995, Proceedings of 1995 IEEE International Computer Performance and Dependability Symposium.
[21] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.
[22] Henrique Madeira, et al. Experimental assessment of parallel systems, 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.
[23] Ronald Minnich, et al. A Network-Failure-Tolerant Message-Passing System for Terascale Clusters, 2002, ICS '02.
[24] Edward J. McCluskey, et al. Control-flow checking by software signatures, 2002, IEEE Trans. Reliab.
[25] Nicholas Nethercote, et al. Valgrind: A Program Supervision Framework, 2003, RV@CAV.
[26] Irith Pomeranz, et al. Transient-fault recovery using simultaneous multithreading, 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[27] G. Allen, et al. The Cactus Code: a problem solving environment for the grid, 2000, Proceedings of the Ninth International Symposium on High-Performance Distributed Computing.
[28] Alan Bundy, et al. Constructing Induction Rules for Deductive Synthesis Proofs, 2006, CLASE.
[29] P. L. Springer. Analysis of application behavior during fault injection, 2001.
[30] Greg Burns, et al. LAM: An Open Cluster Environment for MPI, 2002.