Recoverable Distributed Shared Virtual Memory

The problem of rollback recovery is examined for distributed shared virtual memory environments, in which the shared memory is implemented in software on a loosely coupled distributed multicomputer system. A user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique are presented for implementing recoverable distributed shared virtual memory. The checkpointing scheme can be integrated with the memory coherence protocol that manages the shared virtual memory. The twin-page disk design allows checkpointing to proceed incrementally and requires no explicit undo at recovery time. With recoverable distributed shared virtual memory, the system can restart computation from a checkpoint without a global restart.
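
As an illustration of why a twin-page layout can avoid an explicit undo, the following is a minimal sketch and not the design presented in the paper. It assumes each logical page has two disk slots, that updates always go to the slot that does not hold the last checkpointed copy, and that a checkpoint simply switches which slot counts as committed. The names `TwinPage`, `TwinPageStore`, `checkpoint`, and `recover` are hypothetical, and a real system would also need to make the switch atomic across all pages and nodes, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TwinPage:
    """Two disk slots for one logical page (hypothetical layout)."""
    slots: List[Optional[bytes]] = field(default_factory=lambda: [None, None])
    committed: int = 0   # index of the slot holding the last checkpointed copy
    dirty: bool = False  # True if the other slot holds uncommitted data


class TwinPageStore:
    """In-memory stand-in for a twin-page disk (illustrative only)."""

    def __init__(self) -> None:
        self.pages: Dict[int, TwinPage] = {}

    def write(self, page_no: int, data: bytes) -> None:
        # Always write to the non-committed twin, so the checkpointed
        # copy is never overwritten between checkpoints.
        page = self.pages.setdefault(page_no, TwinPage())
        page.slots[1 - page.committed] = data
        page.dirty = True

    def read(self, page_no: int) -> Optional[bytes]:
        # Return the newest version: the uncommitted twin if it exists,
        # otherwise the checkpointed copy.
        page = self.pages[page_no]
        slot = 1 - page.committed if page.dirty else page.committed
        return page.slots[slot]

    def checkpoint(self) -> None:
        # Incremental: only pages modified since the last checkpoint change.
        # Assumption: a real system would make this switch atomic.
        for page in self.pages.values():
            if page.dirty:
                page.committed = 1 - page.committed
                page.dirty = False

    def recover(self) -> None:
        # No undo pass: uncommitted twins are simply ignored, and the
        # checkpointed copies are already intact on the other slot.
        for page in self.pages.values():
            page.dirty = False


if __name__ == "__main__":
    store = TwinPageStore()
    store.write(7, b"v1")
    store.checkpoint()
    store.write(7, b"v2")   # not yet checkpointed
    store.recover()         # simulated failure and restart
    print(store.read(7))
```

In this sketch, recovery does no copying at all: because a write never touches the committed slot, rolling back amounts to forgetting which twins were dirty.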
