Complexity Analysis of Checkpoint Scheduling with Variable Costs
Denis Trystram | Frédéric Wagner | Mohamed-Slim Bouguerra
[1] Denis Trystram, et al. Analyzing scheduling with transient failures, 2009, Inf. Process. Lett.
[2] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.
[3] Rudolf Eigenmann, et al. Failure-aware checkpointing in fine-grained cycle sharing systems, 2007, HPDC '07.
[4] Kishor S. Trivedi, et al. Minimizing completion time of a program by checkpointing and rejuvenation, 1996, SIGMETRICS '96.
[5] Thierry Gautier, et al. Optimised Recovery with a Coordinated Checkpoint/Rollback Protocol for Domain Decomposition Applications, 2008, MCO.
[6] Tadashi Dohi, et al. Distribution-free checkpoint placement algorithms based on min-max principle, 2006, IEEE Transactions on Dependable and Secure Computing.
[7] Thomas Hérault, et al. MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI, 2006, Int. J. High Perform. Comput. Appl.
[8] Tadashi Dohi, et al. Numerical computation algorithms for sequential checkpoint placement, 2009, Perform. Evaluation.
[9] Peter H. Beckman, et al. Understanding Checkpointing Overheads on Massive-Scale Systems: Analysis of the IBM Blue Gene/P System, 2010, Int. J. High Perform. Comput. Appl.
[10] Zizhong Chen, et al. Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing, 2009, IEEE Transactions on Computers.
[11] Tadashi Dohi, et al. Optimal Checkpoint Placement with Equality Constraints, 2006, 2006 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing.
[12] Bianca Schroeder, et al. Understanding failures in petascale computers, 2007.
[13] E. N. Elnozahy, et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery, 2004, IEEE Transactions on Dependable and Secure Computing.
[14] Franck Cappello, et al. Distributed Diskless Checkpoint for Large Scale Systems, 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[15] Jean-Marc Vincent, et al. A Flexible Checkpoint/Restart Model in Distributed Systems, 2009, PPAM.
[16] Nitin H. Vaidya, et al. Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme, 1997, IEEE Trans. Computers.
[17] Henri Casanova, et al. Checkpointing strategies for parallel jobs, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[18] John W. Young, et al. A first order approximation to the optimum checkpoint interval, 1974, CACM.
[19] Bianca Schroeder, et al. A Large-Scale Study of Failures in High-Performance Computing Systems, 2006, IEEE Transactions on Dependable and Secure Computing.
[20] Franck Cappello, et al. Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities, 2009, Int. J. High Perform. Comput. Appl.
[21] Inna K. Shingareva, et al. Numerical Analysis and Scientific Computing, 2006.
[22] Franck Cappello, et al. Modeling and tolerating heterogeneous failures in large parallel systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[23] D. Nurmi. Model-Based Checkpoint Scheduling for Volatile Resource Environments, 2004.
[24] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.
[25] Xiaola Lin, et al. A Variational Calculus Approach to Optimal Checkpoint Placement, 2001, IEEE Trans. Computers.
[26] Andrew Lumsdaine, et al. The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[27] James S. Plank, et al. Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems, 2001, J. Parallel Distributed Comput.
[28] Xian-He Sun, et al. Optimizing HPC Fault-Tolerant Environment: An Analytical Approach, 2010, 2010 39th International Conference on Parallel Processing.
[29] John T. Daly, et al. A higher order estimate of the optimum checkpoint interval for restart dumps, 2006, Future Gener. Comput. Syst.
[30] Özalp Babaoglu, et al. On the Optimum Checkpoint Selection Problem, 1984, SIAM J. Comput.
[31] Andrzej Duda, et al. The Effects of Checkpointing on Program Execution Time, 1983, Inf. Process. Lett.
[32] Jason Duell, et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing, 2005, Int. J. High Perform. Comput. Appl.