Time-Based Coordinated Checkpointing
[1] Frank B. Schmuck,et al. Agreeing on Processor Group Membership in Timed Asynchronous Distributed Systems , 1995 .
[2] Nuno Neves,et al. Coordinated checkpointing without direct coordination , 1998, Proceedings. IEEE International Computer Performance and Dependability Symposium. IPDS'98 (Cat. No.98TB100248).
[3] David B. Johnson,et al. Sender-Based Message Logging , 1987 .
[4] Luís Moura Silva,et al. Global checkpointing for distributed programs , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[5] Taesoon Park,et al. Checkpointing and rollback-recovery in distributed systems , 1989 .
[6] Flaviu Cristian,et al. Agreeing on who is present and who is absent in a synchronous distributed system , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[7] W. Kent Fuchs,et al. Progressive retry for software error recovery in distributed systems , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[8] Sudhakar M. Reddy,et al. A Diagnosis Algorithm for Distributed Computing Systems with Dynamic Failure and Repair , 1984, IEEE Transactions on Computers.
[9] Willy Zwaenepoel,et al. Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit , 1992, IEEE Trans. Computers.
[10] Takashi Nanya,et al. Hierarchical adaptive distributed system-level diagnosis applied for SNMP-based network fault management , 1996, Proceedings 15th Symposium on Reliable Distributed Systems.
[11] Nitin H. Vaidya,et al. Another Two-Level Failure Recovery Scheme: Performance Impact of Checkpoint Placement and Checkpoint Latency , 1994 .
[12] B. R. Badrinath,et al. Checkpointing distributed applications on mobile computers , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.
[13] Peter Steenkiste,et al. Fail-Safe PVM: A Portable Package for Distributed Programming with Transparent Recovery , 1993 .
[14] Leslie Lamport,et al. Distributed snapshots: determining global states of distributed systems , 1985, TOCS.
[15] David B. Johnson,et al. Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing , 1988, J. Algorithms.
[16] Augusto Ciuffoletti,et al. A Distributed Domino-Effect free recovery Algorithm , 1984, Symposium on Reliability in Distributed Software and Database Systems.
[17] Sampath Rangarajan,et al. A Distributed System-Level Diagnosis Algorithm for Arbitrary Network Topologies , 1995, IEEE Trans. Computers.
[18] Junguk L. Kim,et al. An Efficient Protocol for Checkpointing Recovery in Distributed Systems , 1993, IEEE Trans. Parallel Distributed Syst..
[19] Charles E. Perkins,et al. IP Mobility Support , 1996, RFC.
[20] Henrique Madeira,et al. Experimental assessment of parallel systems , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.
[21] Martin A. W. Nemzow. Implementing Wireless Networks , 1995 .
[22] Mukesh Singhal,et al. Using logging and asynchronous checkpointing to implement recoverable distributed shared memory , 1993, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[23] Yair Amir,et al. Transis: A Communication Sub-system for High Availability , 1992 .
[24] Nitin H. Vaidya,et al. A case for two-level distributed recovery schemes , 1995, SIGMETRICS '95/PERFORMANCE '95.
[25] Patrick H. Worley,et al. Parallel community climate model: Description and user's guide , 1996 .
[26] Nuno Neves,et al. A study of a non-linear optimization problem using a distributed genetic algorithm , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[27] Wei-Tek Tsai,et al. A low overhead checkpointing and rollback recovery scheme for distributed systems , 1989, Proceedings of the Eighth Symposium on Reliable Distributed Systems.
[28] W. Kent Fuchs,et al. Scheduling message processing for reducing rollback propagation , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
[29] W. Kent Fuchs,et al. Fault detection using hints from the socket layer , 1997, Proceedings of SRDS'97: 16th IEEE Symposium on Reliable Distributed Systems.
[30] Sean W. Smith,et al. Completely asynchronous optimistic recovery with minimal rollbacks , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[31] Ronald P. Bianchini,et al. The Adapt2 on-line diagnosis algorithm for general topology networks , 1992, [Conference Record] GLOBECOM '92 - Communications for Global Users: IEEE.
[32] Vaduvur Bharghavan,et al. Challenges and Solutions to Adaptive Computing and Seamless Mobility over Heterogeneous Wireless Networks , 1997, Wirel. Pers. Commun..
[33] Willy Zwaenepoel,et al. On the use and implementation of message logging , 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.
[34] Peter A. Barrett,et al. Using passive replicates in Delta-4 to provide dependable distributed computing , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[35] John Zahorjan,et al. The challenges of mobile computing , 1994, Computer.
[36] Kenneth P. Birman,et al. Consistent Failure Reporting in Reliable Communication Systems , 1993 .
[37] Luke Lin,et al. Checkpointing and rollback-recovery in distributed object based systems , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.
[38] A. Fleischmann. Distributed Systems , 1994, Springer Berlin Heidelberg.
[39] Jian Xu,et al. Adaptive message logging for incremental program replay , 1993, IEEE Parallel & Distributed Technology: Systems & Applications.
[40] José Rufino,et al. A low-level processor group membership protocol for LANs , 1993, [1993] Proceedings. The 13th International Conference on Distributed Computing Systems.
[41] Daniel S. Nydick,et al. Practical application and implementation of distributed system-level diagnosis theory , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.
[42] D. Manivannan,et al. A low-overhead recovery technique using quasi-synchronous checkpointing , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.
[43] Roberto Baldoni,et al. An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems , 1999, IEEE Trans. Parallel Distributed Syst..
[44] Wolfgang Graetsch,et al. Fault tolerance under UNIX , 1989, TOCS.
[45] David F. Bacon,et al. Volatile logging in n-fault-tolerant distributed systems , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[46] Flaviu Cristian,et al. Probabilistic internal clock synchronization , 1994, Proceedings of IEEE 13th Symposium on Reliable Distributed Systems.
[47] Ten-Hwang Lai,et al. On Distributed Snapshots , 1987, Inf. Process. Lett..
[48] Yuval Tamir,et al. Error Recovery in Multicomputers Using Global Checkpoints , 1984 .
[49] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[50] Matti A. Hiltunen. Membership and system diagnosis , 1995, Proceedings. 14th Symposium on Reliable Distributed Systems.
[51] Nuno Neves,et al. RENEW: a tool for fast and efficient implementation of checkpoint protocols , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[52] Hon Fung Li,et al. Optimal Checkpointing and Local Recording for Domino-Free Rollback Recovery , 1987, Inf. Process. Lett..
[53] James S. Plank,et al. Improving the performance of coordinated checkpointers on networks of workstations using RAID techniques , 1996, Proceedings 15th Symposium on Reliable Distributed Systems.
[54] Shivakant Mishra,et al. Consul: a communication substrate for fault-tolerant distributed programs , 1993, Distributed Syst. Eng..
[55] Kun-Lung Wu,et al. Recoverable Distributed Shared Virtual Memory , 1990, IEEE Trans. Computers.
[56] Nuno Neves,et al. Using time to improve the performance of coordinated checkpointing , 1996, Proceedings of IEEE International Computer Performance and Dependability Symposium.
[57] Bharat K. Bhargava,et al. A model for concurrent checkpointing and recovery using transactions , 1989, [1989] Proceedings. The 9th International Conference on Distributed Computing Systems.
[58] Flaviu Cristian,et al. A timestamp-based checkpointing protocol for long-lived distributed computations , 1991, [1991] Proceedings Tenth Symposium on Reliable Distributed Systems.
[59] W. Kent Fuchs,et al. Optimistic message logging for independent checkpointing in message-passing systems , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[60] Gernot Metze,et al. On the Connection Assignment Problem of Diagnosable Systems , 1967, IEEE Trans. Electron. Comput..
[61] James S. Plank. Efficient checkpointing on MIMD architectures , 1993 .
[62] Hermann Kopetz,et al. Distributed fault-tolerant real-time systems: the Mars approach , 1989, IEEE Micro.
[63] Nuno Neves,et al. Adaptive recovery for mobile environments , 1997, CACM.
[64] Arthur P. Goldberg. Transparent Recovery of Mach Applications , 1990, USENIX MACH Symposium.
[65] Robert E. Strom,et al. Optimistic recovery in distributed systems , 1985, TOCS.
[66] W. Kent Fuchs,et al. Reduced overhead logging for rollback recovery in distributed shared memory , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[67] A. Prasad Sistla,et al. Efficient distributed recovery using message logging , 1989, PODC '89.
[68] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[69] C. R. Kime,et al. System diagnosis , 1986 .
[70] Anne-Marie Kermarrec,et al. A recoverable distributed shared memory integrating coherence and recoverability , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[71] Parameswaran Ramanathan,et al. Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System , 1993, IEEE Trans. Software Eng..
[72] Bharat K. Bhargava,et al. Independent checkpointing and concurrent rollback for recovery in distributed systems-an optimistic approach , 1988, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems.
[73] P. Reynier,et al. Active replication in Delta-4 , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
[74] Zbigniew M. Wójcik,et al. Fault tolerant distributed computing using atomic send-receive checkpoints , 1990, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing 1990.
[75] J. R. Kenevan,et al. A non-FIFO checkpointing protocol for distributed systems , 1991, [Proceedings] 1991 Symposium on Applied Computing.
[76] Miguel Castro,et al. Lightweight logging for lazy release consistent distributed shared memory , 1996, OSDI '96.
[77] Janak H. Patel,et al. Error Recovery in Shared Memory Multiprocessors Using Private Caches , 1990, IEEE Trans. Parallel Distributed Syst..
[78] Ragunathan Rajkumar,et al. Processor group membership protocols: specification, design and implementation , 1993, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[79] Ravishankar K. Iyer,et al. An object-oriented testbed for the evaluation of checkpointing and recovery systems , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[80] Phil Kearns,et al. Rollback based on vector time , 1993, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[81] David L. Russell,et al. State Restoration in Systems of Communicating Processes , 1980, IEEE Transactions on Software Engineering.
[82] Miguel Castro,et al. A checkpoint protocol for an entry consistent shared memory system , 1994, PODC '94.
[83] David L. Presotto,et al. Publishing: a reliable broadcast communication mechanism , 1983, SOSP '83.
[84] W. Kent Fuchs,et al. Lazy checkpoint coordination for bounding rollback propagation , 1992, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[85] K. H. Kim,et al. An efficient decentralized approach to processor-group membership maintenance in real-time LAN systems: the PRHB/ED scheme , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[86] Ravishankar K. Iyer,et al. DEPEND: A Simulation-Based Environment for System Level Dependability Analysis , 1997, IEEE Trans. Computers.
[87] Jonathan Walpole,et al. MIST: PVM with Transparent Migration and Checkpointing , 1995 .
[88] Roberto Baldoni,et al. An index-based checkpointing algorithm for autonomous distributed systems , 1997, Proceedings of SRDS'97: 16th IEEE Symposium on Reliable Distributed Systems.
[89] Willy Zwaenepoel,et al. The performance of consistent checkpointing , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[90] Anita Borg,et al. A message system supporting fault tolerance , 1983, SOSP '83.
[91] Brian Randell,et al. System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.
[92] Nitin H. Vaidya,et al. On Checkpoint Latency , 1995 .
[93] Ronald P. Bianchini,et al. An Adaptive Distributed System-Level Diagnosis Algorithm and Its Implementation , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, 'Highlights from Twenty-Five Years'.
[94] W. Richard Stevens,et al. Unix network programming , 1990, CCRV.
[95] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[96] Raj Jain,et al. The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.
[97] Bharat K. Bhargava,et al. Concurrent robust checkpointing and recovery in distributed systems , 1988, Proceedings. Fourth International Conference on Data Engineering.
[98] Sang Hyuk Son,et al. Distributed Checkpointing for Globally Consistent States of Databases , 1989, IEEE Transactions on Software Engineering.
[99] W. Kent Fuchs,et al. Relaxing consistency in recoverable distributed shared memory , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[100] Yuval Tamir,et al. Application-transparent process-level error recovery for multicomputers , 1989, [1989] Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences. Volume 1: Architecture Track.
[101] Lorenzo Alvisi,et al. Nonblocking and orphan-free message logging protocols , 1992, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[102] Randy H. Katz,et al. The Bay Area Research Wireless Access Network (BARWAN) , 1996, COMPCON '96. Technologies for the Information Superhighway Digest of Papers.