Failure analysis of distributed scientific workflows executing in the cloud
Ewa Deelman | Gideon Juve | Dan Gunter | Karan Vahi | Fabio Silva | Taghrid Samak | Monte Goode