Trua: Efficient Task Replication for Flexible User-defined Availability in Scientific Grids
[1] Rajkumar Buyya, et al. Failure-aware resource provisioning for hybrid Cloud infrastructure, 2012, J. Parallel Distributed Comput.
[2] Richard Wolski, et al. Modeling Machine Availability in Enterprise and Wide-Area Distributed Computing Environments, 2005, Euro-Par.
[3] Zhe Zhang, et al. Discovering Job Preemptions in the Open Science Grid, 2018, PEARC.
[4] Shantenu Jha, et al. A Comprehensive Perspective on Pilot-Job Systems, 2015, ACM Comput. Surv.
[5] Jemal H. Abawajy, et al. Fault-tolerant scheduling policy for grid computing systems, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[6] M. Amoon. Design of a Fault-Tolerant Scheduling System for Grid Computing, 2011, 2011 Second International Conference on Networking and Distributed Computing.
[7] Alexandru Iosup, et al. Analysis and modeling of time-correlated failures in large-scale distributed systems, 2010, 2010 11th IEEE/ACM International Conference on Grid Computing.
[8] Sudipto Guha, et al. Robust Random Cut Forest Based Anomaly Detection on Streams, 2016, ICML.
[9] Indranil Gupta, et al. On Availability of Intermediate Data in Cloud Computations, 2009, HotOS.
[10] Haryadi S. Gunawi, et al. Why Does the Cloud Stop Computing?: Lessons from Hundreds of Service Outages, 2016, SoCC.
[11] K. G. Srinivasa, et al. Fault-Tolerant Middleware for Grid Computing, 2010, 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC).
[12] Zuoning Chen, et al. A Large-Scale Study of Failures on Petascale Supercomputers, 2018, Journal of Computer Science and Technology.
[13] Jie Xu, et al. An Analysis of Failure-Related Energy Waste in a Large-Scale Cloud Environment, 2014, IEEE Transactions on Emerging Topics in Computing.
[14] Alexandru Iosup, et al. The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[15] Kenli Li, et al. An effective reliability-driven technique of allocating tasks on heterogeneous cluster systems, 2014, Cluster Computing.
[16] Dimitrios Skoutas, et al. Efficient task replication and management for adaptive fault tolerance in Mobile Grid environments, 2007, Future Gener. Comput. Syst.
[17] Alessandro Cilardo, et al. Enabling HPC for QoS-sensitive applications: The MANGO approach, 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[18] Alexandru Iosup, et al. On the dynamic resource availability in grids, 2007, 2007 8th IEEE/ACM International Conference on Grid Computing.
[19] Song Fu, et al. Adaptive Anomaly Identification by Exploring Metric Subspace in Cloud Computing Infrastructures, 2013, 2013 IEEE 32nd International Symposium on Reliable Distributed Systems.
[20] Franck Cappello, et al. Exploring Properties and Correlations of Fatal Events in a Large-Scale HPC System, 2019, IEEE Transactions on Parallel and Distributed Systems.
[21] Igor Sfiligoi, et al. glideinWMS - A generic pilot-based Workload Management System, 2008.