Stable object storage for multiprocessors with distributed shared memory

We present the design of stable object storage for a parallel computer with distributed shared memory. A prototype has been built for our experimental multiprocessor MEMSY. The stable storage is implemented in RAM to keep access times short. The stored objects are primarily checkpoints for rollback recovery.
