Paper information - Message fragment based causal message logging

In distributed computing systems, message logging is widely used to make nodes recoverable. To reduce the piggyback overhead of traditional causal message logging, this paper presents a zoning causal message logging approach. The crux of the approach is to control the propagation of dependency information: the nodes in the system are divided into zones and, through a message fragment mechanism, a node's dependency information is visible only within its own zone. Simulation results show that the proposed approach incurs lower piggyback overhead than traditional causal message logging.
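
The idea of limiting dependency information to a zone can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not detail the message fragment mechanism, so the sketch only models the filtering effect. The names `Determinant`, `Node`, `piggyback`, and `zone_of` are hypothetical. The "traditional" branch is also simplified to attach every determinant the sender holds, whereas real causal protocols attach only those not yet known to be stable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """Hypothetical record of one message delivery event."""
    receiver: str  # node that delivered the message
    seq: int       # delivery order at that receiver


@dataclass
class Node:
    name: str
    zone: str
    known: set = field(default_factory=set)  # determinants this node holds


def piggyback(sender, dest, zone_of, zoned=True):
    """Return the determinants the sender attaches to a message for dest.

    zoned=False: simplified traditional behaviour, attach everything held.
    zoned=True:  assumed zoning rule, attach only determinants that
                 describe nodes in the destination's zone.
    """
    if not zoned:
        return set(sender.known)
    return {d for d in sender.known if zone_of[d.receiver] == dest.zone}


# Two zones: a and b share zone z1, c is alone in zone z2.
zone_of = {"a": "z1", "b": "z1", "c": "z2"}
a = Node("a", "z1", {Determinant("a", 1), Determinant("b", 1)})
b = Node("b", "z1")
c = Node("c", "z2")

same_zone = piggyback(a, b, zone_of)               # stays inside z1
cross_zone = piggyback(a, c, zone_of)              # leaves z1
traditional = piggyback(a, c, zone_of, zoned=False)
print(len(same_zone), len(cross_zone), len(traditional))
```

Under these assumptions, a message that crosses a zone boundary carries none of the sender zone's determinants, which is where the reduction in piggyback overhead would come from.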
