Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems
[1] Erol Gelenbe,et al. On the Optimum Checkpoint Interval , 1979, JACM.
[2] John W. Young,et al. A first order approximation to the optimum checkpoint interval , 1974, CACM.
[3] S. Yajnik,et al. Checkpointing in CosMiC: a user-level process migration environment , 1997, Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems.
[4] John G. Kemeny,et al. Finite Markov Chains , 1960 .
[5] Nozer D. Singpurwalla,et al. An Empirically Developed Fourier Series Model for Describing Software Failures , 1984, IEEE Transactions on Reliability.
[6] J. Griffiths. The Theory of Stochastic Processes , 1967 .
[7] Willy Zwaenepoel,et al. On the use and implementation of message logging , 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.
[8] Nitin H. Vaidya,et al. A case for two-level distributed recovery schemes , 1995, SIGMETRICS '95/PERFORMANCE '95.
[9] N. U. Prabhu. Review: D. R. Cox, H. D. Miller, The Theory of Stochastic Processes , 1966 .
[10] James S. Plank,et al. Experimental assessment of workstation failures and their impact on checkpointing systems , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[11] Kai Li,et al. CLIP: A Checkpointing Tool for Message Passing Parallel Programs , 1997, ACM/IEEE SC 1997 Conference (SC'97).
[12] L. Alvisi,et al. A Survey of Rollback-Recovery Protocols , 2002 .
[13] Jack Dongarra,et al. ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers , 1992, [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation.
[14] Jonathan Walpole,et al. MIST: PVM with Transparent Migration and Checkpointing , 1995 .
[15] Nitin H. Vaidya,et al. Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme , 1997, IEEE Trans. Computers.
[16] Willy Zwaenepoel,et al. The performance of consistent checkpointing , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[17] James S. Plank,et al. Improving the performance of coordinated checkpointers on networks of workstations using RAID techniques , 1996, Proceedings 15th Symposium on Reliable Distributed Systems.
[18] Miron Livny,et al. Condor - a hunter of idle workstations , 1988, [1988] Proceedings. The 8th International Conference on Distributed.
[19] John G. Kemeny,et al. Finite Markov chains , 1960 .
[20] Emanuel Parzen,et al. Stochastic Processes , 1962 .
[21] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[22] Miron Livny,et al. Managing Checkpoints for Parallel Programs , 1996, JSSPP.
[23] Darrell D. E. Long,et al. A longitudinal survey of Internet host reliability , 1995, Proceedings. 14th Symposium on Reliable Distributed Systems.
[24] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[25] William H. Sanders,et al. Performance analysis of two time-based coordinated checkpointing protocols , 1997, Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems.
[26] Mark A. Franklin,et al. Checkpointing in Distributed Computing Systems , 1996, J. Parallel Distributed Comput.
[27] Georg Stellner. Consistent Checkpoints of PVM Applications , 1994 .
[28] A. Barbour,et al. Poisson Approximation , 1992 .
[29] Roy Friedman,et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).
[30] Kai Li,et al. ickp: a consistent checkpointer for multicomputers , 1994, IEEE Parallel & Distributed Technology: Systems & Applications.
[31] Peter Steenkiste,et al. Fail-Safe PVM: A Portable Package for Distributed Programming with Transparent Recovery , 1993 .
[32] William Feller,et al. An Introduction to Probability Theory and Its Applications , 1951 .
[33] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[34] Kim Buckner. Timings and memory usage for the NAS Parallel Benchmarks on a network of Sun Ultra Workstations , 1998 .
[35] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[36] James S. Plank,et al. The average availability of parallel checkpointing systems and its importance in selecting runtime parameters , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[37] William Feller,et al. An Introduction to Probability Theory and Its Applications , 1967 .
[38] Kai Li,et al. Diskless Checkpointing , 1998, IEEE Trans. Parallel Distributed Syst.
[39] John W. Young. A first order approximation to the optimum checkpoint interval , 1974 .
[40] Yi-Min Wang,et al. Checkpointing and its applications , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.