One Size Does Not Fit All: Clustering Supercomputer Failures Using a Multiple Time Window Approach
[1] Jesper Larsson Träff, et al. Euro-Par 2010 Parallel Processing Workshops - HeteroPar, HPCC, HiBB, CoreGrid, UCHPC, HPCF, PROPER, CCPI, VHPC, Ischia, Italy, August 31-September 3, 2010, Revised Selected Papers, 2011, Euro-Par Workshops.
[2] Luisa Carracciuolo, et al. Modelling the Behaviour of an Adaptive Scheduling Controller, 2012, 2012 Sixth International Conference on Complex, Intelligent, and Software Intensive Systems.
[3] Domenico Cotroneo, et al. How Do Mobile Phones Fail? A Failure Data Analysis of Symbian OS Smart Phones, 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[4] Anand Sivasubramaniam, et al. Filtering failure logs for a BlueGene/L prototype, 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[5] Cheng-Zhong Xu, et al. Exploring event correlation for failure prediction in coalitions of clusters, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[6] Jon Stearley, et al. Bad Words: Finding Faults in Spirit's Syslogs, 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[7] Franck Cappello, et al. Fault prediction under the microscope: A closer look into HPC systems, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Francesco Palmieri, et al. A Fault Avoidance Strategy Improving the Reliability of the EGI Production Grid Infrastructure, 2010, OPODIS.
[9] Valentina Casola, et al. Security and Performance Trade-off in PerfCloud, 2010, Euro-Par Workshops.
[10] Mark S. Squillante, et al. Failure data analysis of a large-scale heterogeneous server environment, 2004, International Conference on Dependable Systems and Networks, 2004.
[11] Jon Stearley, et al. What Supercomputers Say: A Study of Five System Logs, 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[12] Daniel P. Siewiorek, et al. Models for time coalescence in event logs, 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
[13] Gwan S. Choi, et al. Error and failure analysis of a UNIX server, 1998, Proceedings Third IEEE International High-Assurance Systems Engineering Symposium (Cat. No.98EX231).
[14] Franck Cappello, et al. HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[15] Daniel P. Siewiorek, et al. A comparative analysis of event tupling schemes, 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.
[16] Mohamed Kaâniche, et al. Event log based dependability analysis of Windows NT and 2K systems, 2002, 2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.
[17] Ravishankar K. Iyer, et al. Failure data analysis of a LAN of Windows NT based computers, 1999, Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems.
[18] Domenico Cotroneo, et al. Assessing time coalescence techniques for the analysis of supercomputer logs, 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[19] Anand Sivasubramaniam, et al. BlueGene/L Failure Analysis and Prediction Models, 2006, International Conference on Dependable Systems and Networks (DSN'06).
[20] Zhiling Lan, et al. System log pre-processing to improve failure prediction, 2009, 2009 IEEE/IFIP International Conference on Dependable Systems & Networks.
[21] Risto Vaarandi, et al. Mining event logs with SLCT and LogHound, 2008, NOMS 2008 - 2008 IEEE Network Operations and Management Symposium.
[22] Domenico Cotroneo, et al. Improving Log-based Field Failure Data Analysis of multi-node computing systems, 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).
[23] Ravishankar K. Iyer, et al. Analyze-NOW-an environment for collection and analysis of failures in a network of workstations, 1996, IEEE Trans. Reliab.