Filtering failure logs for a BlueGene/L prototype
Anand Sivasubramaniam | José E. Moreira | Yanyong Zhang | Ramendra K. Sahoo | Manish Gupta | Yinglung Liang
[1] Luiz C. Alves, et al. Reliability, availability, and serviceability (RAS) of the IBM eServer z990, 2004, IBM J. Res. Dev.
[2] Mark S. Squillante, et al. Failure data analysis of a large-scale heterogeneous server environment, 2004, International Conference on Dependable Systems and Networks, 2004.
[3] Daniel P. Siewiorek, et al. A comparative analysis of event tupling schemes, 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.
[4] José E. Moreira, et al. Job Scheduling for the BlueGene/L System, 2002, JSSPP.
[5] Erik Riedel, et al. More Than an Interface - SCSI vs. ATA, 2003, FAST.
[6] James F. Ziegler, et al. Terrestrial cosmic rays, 1996, IBM J. Res. Dev.
[7] Ronald Minnich, et al. A Network-Failure-Tolerant Message-Passing System for Terascale Clusters, 2002, ICS '02.
[8] Margaret Martonosi, et al. Dynamic thermal management for high-performance microprocessors, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[9] Daniel P. Siewiorek, et al. VAX/VMS event monitoring and analysis, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[10] Anand Sivasubramaniam, et al. A complexity-effective approach to ALU bandwidth enhancement for instruction-level temporal redundancy, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[11] Lorenzo Alvisi, et al. Modeling the effect of technology trends on the soft error rate of combinational logic, 2002, Proceedings International Conference on Dependable Systems and Networks.
[12] Sarita V. Adve, et al. The impact of technology scaling on lifetime reliability, 2004, International Conference on Dependable Systems and Networks, 2004.
[13] Ravishankar K. Iyer, et al. Analysis and Modeling of Correlated Failures in Multicomputer Systems, 1992, IEEE Trans. Computers.
[14] Todd M. Austin, et al. A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor, 2003, MICRO.
[15] Kevin Skadron, et al. Temperature-aware microarchitecture, 2003, ISCA '03.
[16] Ravishankar K. Iyer, et al. Impact of Correlated Failures on Dependability in a VAXcluster System, 1992.
[17] Daniel P. Siewiorek, et al. Error log analysis: statistical modeling and heuristic trend analysis, 1990.
[18] Ravishankar K. Iyer, et al. Networked Windows NT system field failure data analysis, 1999, Proceedings 1999 Pacific Rim International Symposium on Dependable Computing.
[19] David F. Heidel, et al. An Overview of the BlueGene/L Supercomputer, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[20] Ravishankar K. Iyer, et al. Failure analysis and modeling of a VAXcluster system, 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.
[21] Kishor S. Trivedi, et al. Analysis and implementation of software rejuvenation in cluster systems, 2001, SIGMETRICS '01.