An efficient logging scheme for recoverable distributed shared memory systems

The paper presents a new logging scheme for recoverable distributed shared memory systems. In previous schemes, logging is performed whenever a process accesses or writes a new data item. In the proposed scheme, by contrast, a data item is logged only if it has been accessed by multiple processes, and only at the point where it is invalidated by an overwrite. Moreover, the logging is performed by the single process responsible for that data item, unlike other schemes in which every process accessing the data item performs the logging. As a result, both the amount and the frequency of logging can be significantly reduced. The performance of the proposed scheme is analyzed through an extensive simulation study, in which the new logging scheme shows superior performance across various system environments.
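The logging rule described above can be illustrated with a minimal sketch. This is not the paper's protocol: the class names `DataItem` and `OwnerLoggingDSM`, the `accessed_by` set, and the choice to log the overwritten `(name, version, value)` triple are all assumptions made for illustration. The sketch only shows the two properties the abstract states: a log entry is written only when an item that other processes have accessed is invalidated by an overwrite, and the entry goes to the log of the one process responsible for the item.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A shared data item with one responsible (owner) process. Hypothetical model."""
    name: str
    owner: int
    value: object = None
    version: int = 0
    # Processes that have accessed the current version (assumed bookkeeping).
    accessed_by: set = field(default_factory=set)


class OwnerLoggingDSM:
    """Toy single-address-space model of the logging rule, not a real DSM."""

    def __init__(self):
        self.items = {}
        self.logs = {}  # owner pid -> list of (name, version, value)

    def create(self, name, owner, value):
        self.items[name] = DataItem(name, owner, value)
        self.logs.setdefault(owner, [])

    def read(self, pid, name):
        item = self.items[name]
        item.accessed_by.add(pid)
        return item.value

    def write(self, pid, name, value):
        item = self.items[name]
        # Log only if some process other than the writer accessed this version,
        # i.e. the overwrite invalidates a copy that is actually shared.
        shared_with_others = item.accessed_by - {pid}
        if shared_with_others:
            # Only the owner's log receives the entry (assumed log contents).
            self.logs[item.owner].append((item.name, item.version, item.value))
        item.value = value
        item.version += 1
        item.accessed_by = {pid}
```

In this model, a process that repeatedly writes an item nobody else has read produces no log entries at all, while an overwrite of an item that another process has read produces exactly one entry at the owner, which is the source of the reduced logging volume and frequency the abstract claims.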
