Assessing time coalescence techniques for the analysis of supercomputer logs
[1] Daniel P. Siewiorek,et al. A comparative analysis of event tupling schemes , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.
[2] William H. Sanders,et al. The Möbius Framework and Its Implementation , 2002, IEEE Trans. Software Eng..
[3] Ravishankar K. Iyer,et al. Analysis and Modeling of Correlated Failures in Multicomputer Systems , 1992, IEEE Trans. Computers.
[4] Risto Vaarandi,et al. Mining event logs with SLCT and LogHound , 2008, NOMS 2008 - 2008 IEEE Network Operations and Management Symposium.
[5] Navjot Singh,et al. A log mining approach to failure analysis of enterprise telephony systems , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[6] Jon Stearley,et al. What Supercomputers Say: A Study of Five System Logs , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[7] Anand Sivasubramaniam,et al. Filtering failure logs for a BlueGene/L prototype , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[8] Cheng-Zhong Xu,et al. Exploring event correlation for failure prediction in coalitions of clusters , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[9] Ravishankar K. Iyer,et al. Failure data analysis of a LAN of Windows NT based computers , 1999, Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems.
[10] Xin Li,et al. A Memory Soft Error Measurement on Production Systems , 2007, USENIX Annual Technical Conference.
[11] Bianca Schroeder,et al. A Large-Scale Study of Failures in High-Performance Computing Systems , 2010, IEEE Trans. Dependable Secur. Comput..
[12] Domenico Cotroneo,et al. A framework for assessing the dependability of supercomputers via automated log analysis , 2008 .
[13] Bianca Schroeder,et al. Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you? , 2007, TOS.
[14] Daniel P. Siewiorek,et al. Error log analysis: statistical modeling and heuristic trend analysis , 1990 .
[15] Ravishankar K. Iyer,et al. Software Dependability in the Tandem GUARDIAN System , 1995, IEEE Trans. Software Eng..
[16] Francesco Palmieri,et al. A Fault Avoidance Strategy Improving the Reliability of the EGI Production Grid Infrastructure , 2010, OPODIS.
[17] Ravishankar K. Iyer,et al. Networked Windows NT system field failure data analysis , 1999, Proceedings 1999 Pacific Rim International Symposium on Dependable Computing.
[18] Jon Stearley,et al. Bad Words: Finding Faults in Spirit's Syslogs , 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[19] Richard P. Martin,et al. Improving cluster availability using workstation validation , 2002, SIGMETRICS '02.
[20] Narayan Desai,et al. Co-analysis of RAS Log and Job Log on Blue Gene/P , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[21] Christopher D. Carothers,et al. An analysis of clustered failures on large supercomputing systems , 2009, J. Parallel Distributed Comput..
[22] Richard Wolski,et al. Automatic methods for predicting machine availability in desktop Grid and peer-to-peer systems , 2004, IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004..
[23] Domenico Cotroneo,et al. Improving Log-based Field Failure Data Analysis of multi-node computing systems , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).
[24] Daniel P. Siewiorek,et al. Workload, Performance, and Reliability of Digital Computing Systems , 1980 .
[25] Archana Ganapathi,et al. Crash data collection: a Windows case study , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[26] Ravishankar K. Iyer,et al. Analyze-NOW-an environment for collection and analysis of failures in a network of workstations , 1996, IEEE Trans. Reliab..
[27] Gwan S. Choi,et al. Error and failure analysis of a UNIX server , 1998, Proceedings Third IEEE International High-Assurance Systems Engineering Symposium (Cat. No.98EX231).
[28] Daniel P. Siewiorek,et al. Models for time coalescence in event logs , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
[29] Francesco Palmieri,et al. Towards a federated Metropolitan Area Grid environment: The SCoPE network-aware infrastructure , 2010, Future Gener. Comput. Syst..
[30] Mark S. Squillante,et al. Failure data analysis of a large-scale heterogeneous server environment , 2004, International Conference on Dependable Systems and Networks, 2004.
[31] Daniel P. Siewiorek,et al. Workload, Performance, and Reliability of Digital Computing Systems , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, 'Highlights from Twenty-Five Years'.
[32] Mohamed Kaâniche,et al. Event log based dependability analysis of Windows NT and 2K systems , 2002, 2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings..
[33] Thomas J. Hacker,et al. Using queue structures to improve job reliability , 2007, HPDC '07.
[34] Anand Sivasubramaniam,et al. BlueGene/L Failure Analysis and Prediction Models , 2006, International Conference on Dependable Systems and Networks (DSN'06).
[35] Zhiling Lan,et al. System log pre-processing to improve failure prediction , 2009, 2009 IEEE/IFIP International Conference on Dependable Systems & Networks.