Demonic memory for process histories

Demonic memory is a form of reconstructive memory for process histories. As a process executes, its states are checkpointed at regular intervals, generating a history of the process at low time resolution. Once this history exists, any prior state of the process can be reconstructed by starting from a checkpointed state and re-executing the process up through the desired state, thereby exploiting the redundancy between the states of a process and the description of that process (i.e., a computer program).

The reconstruction of states is automatic and transparent. The history of a process may be examined as though it were a large two-dimensional array, or address space-time, with a normal address space as one axis and steps of process time as the other. An attempt to examine a state that is not physically stored triggers a “demon”, which reconstructs that memory state before access is allowed.

Regeneration requires an exact description of the original execution of the process. If the original execution depends on non-deterministic events (e.g., user input), these events are recorded in an exception list and are replayed at the proper points during re-execution.

While more efficient than explicitly storing all state changes, such a checkpointing system is still prohibitively expensive for many applications: each copy (or snapshot) of the system's state may be very large, and many snapshots may be required. Demonic memory saves both space and time by using a virtual copy mechanism. (Virtual copies share unchanging data with the objects that they are copies of, storing only differences from a prototype or original [MiBK86].) In demonic memory, the snapshot at each checkpoint is a virtual copy of the preceding checkpoint's snapshot; hence it is called a virtual snapshot.
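The checkpoint-and-replay scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `History`, `step`, `state_at`, the dictionary-based state, and the checkpoint interval are all assumptions made for illustration. Checkpoints here are full copies; the sketch only shows how sparse checkpoints plus a recorded exception list allow any intermediate state to be regenerated on demand.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def step(state: dict, event: Optional[int]) -> dict:
    """Hypothetical deterministic transition: count steps, add any external input."""
    new = dict(state)
    new["counter"] += 1
    if event is not None:
        new["total"] += event
    return new


@dataclass
class History:
    """Toy process history: sparse checkpoints plus a log of external events."""
    step_fn: Callable[[dict, Optional[int]], dict]
    interval: int                                         # steps between checkpoints
    checkpoints: Dict[int, dict] = field(default_factory=dict)  # step -> saved state
    exceptions: Dict[int, int] = field(default_factory=dict)    # step -> recorded event

    def run(self, state: dict, steps: int, inputs: Dict[int, int]) -> dict:
        """Execute the process, checkpointing periodically and logging inputs."""
        self.checkpoints[0] = dict(state)
        for t in range(1, steps + 1):
            event = inputs.get(t)
            if event is not None:
                self.exceptions[t] = event     # record the non-deterministic event
            state = self.step_fn(state, event)
            if t % self.interval == 0:
                self.checkpoints[t] = dict(state)
        return state

    def state_at(self, t: int) -> dict:
        """The 'demon': rebuild the state after step t from the nearest checkpoint."""
        base = max(s for s in self.checkpoints if s <= t)
        state = dict(self.checkpoints[base])
        for s in range(base + 1, t + 1):
            # Replay recorded events at exactly the steps where they occurred.
            state = self.step_fn(state, self.exceptions.get(s))
        return state


inputs = {3: 5, 7: 2}                      # external events, keyed by step
history = History(step, interval=4)
final = history.run({"counter": 0, "total": 0}, steps=10, inputs=inputs)
```

With these inputs, only steps 0, 4, and 8 are stored, yet `history.state_at(7)` can be answered by restoring the step-4 checkpoint and re-executing three steps, replaying the event recorded at step 7.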
To make the virtual snapshot mechanism efficient, state information is initially saved in relatively large units of space and time, on the order of pages and seconds, with single-word/single-step regeneration undertaken only as needed. This permits the costs of indexing and lookup operations to be amortized over many locations.
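One way to picture page-granularity virtual snapshots is the following Python sketch. It is an illustrative assumption, not the data structure used in the paper: the names `Memory`, `VirtualSnapshot`, and `PAGE_SIZE`, the parent-chain lookup, and the tiny page size are all hypothetical. Each snapshot stores only the pages written since the previous checkpoint and shares every other page with its predecessor.

```python
PAGE_SIZE = 4  # words per page; deliberately tiny for illustration


class VirtualSnapshot:
    """A snapshot that stores only the pages changed since its parent."""

    def __init__(self, parent, changed_pages):
        self.parent = parent          # preceding snapshot, or None for the first
        self.pages = changed_pages    # page number -> immutable tuple of words

    def read(self, addr):
        """Look up a word, falling back to older snapshots for unchanged pages."""
        page_no, offset = divmod(addr, PAGE_SIZE)
        snap = self
        while snap is not None:
            if page_no in snap.pages:
                return snap.pages[page_no][offset]
            snap = snap.parent
        raise KeyError(addr)


class Memory:
    """A toy address space that tracks which pages are dirty between checkpoints."""

    def __init__(self, n_pages):
        self.pages = [[0] * PAGE_SIZE for _ in range(n_pages)]
        self.dirty = set(range(n_pages))   # the first snapshot must store every page
        self.latest = None

    def write(self, addr, value):
        page_no, offset = divmod(addr, PAGE_SIZE)
        self.pages[page_no][offset] = value
        self.dirty.add(page_no)            # bookkeeping is per page, not per word

    def checkpoint(self):
        """Take a virtual snapshot: copy dirty pages only, share the rest."""
        changed = {p: tuple(self.pages[p]) for p in self.dirty}
        self.latest = VirtualSnapshot(self.latest, changed)
        self.dirty.clear()
        return self.latest


mem = Memory(n_pages=4)
s0 = mem.checkpoint()      # full copy: all 4 pages
mem.write(5, 42)
s1 = mem.checkpoint()      # stores only page 1
mem.write(5, 7)
mem.write(12, 9)
s2 = mem.checkpoint()      # stores pages 1 and 3
```

Because dirtiness is tracked per page, a single set entry covers every word on that page, which is the amortization of indexing cost that the text describes. The trade-off in this sketch is that reads of long-unchanged pages walk back through the snapshot chain.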

[1]  David R. Jefferson,et al.  Virtual time , 1985, ICPP.

[2]  Jeffrey Scott Vitter,et al.  US&R: A new framework for redoing (Extended Abstract) , 1984, SDE 1.

[3]  William D. Clinger,et al.  Revised³ report on the algorithmic language Scheme , 1986, SIGP.

[4]  Robert E. Strom,et al.  Optimistic recovery in distributed systems , 1985, TOCS.

[5]  David L. Black,et al.  Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures , 1987, IEEE Trans. Computers.

[6]  Gordon V. Cormack,et al.  Structured Program Lookahead , 1987, Comput. Lang.

[7]  R.E. Strom,et al.  A recoverable object store , 1988, Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences. Volume II: Software track.

[8]  Thomas F. Knight An architecture for mostly functional languages , 1986, LFP '86.

[9]  Paul Klint,et al.  Towards monolingual programming environments , 1985, TOPL.

[10]  Jong-Deok Choi,et al.  A mechanism for efficient debugging of parallel programs , 1988, PADD '88.

[11]  Douglas W. Clark,et al.  An empirical study of list structure in Lisp , 1977, CACM.

[12]  Thomas G. Moher,et al.  PROVIDE: A Process Visualization and Debugging Environment , 1988, IEEE Trans. Software Eng.

[13]  Ralph E. Griswold,et al.  The Icon programming language , 1983 .

[14]  Robert H. Halstead,et al.  Parallel Symbolic Computing , 1986, Computer.

[15]  Gerald Jay Sussman,et al.  An Interpreter for Extended Lambda Calculus , 1975 .

[16]  Robert Allen Shaw,et al.  Empirical analysis of a LISP system , 1988 .

[17]  M. Katz PARATRAN: A TRANSPARENT, TRANSACTION BASED RUNTIME MECHANISM FOR PARALLEL EXECUTION OF SCHEME , 1989 .

[18]  Richard P. Gabriel,et al.  Performance and evaluation of Lisp systems , 1985 .

[19]  Anita Borg,et al.  A message system supporting fault tolerance , 1983, SOSP '83.

[20]  J. T. Robinson,et al.  On optimistic methods for concurrency control , 1979, TODS.

[21]  Warren Teitelman Automated programmering: the programmer's assistant , 1972, AFIPS '72 (Fall, part II).

[22]  David A. Moon,et al.  Garbage collection in a large LISP system , 1984, LFP '84.

[23]  Daniel P. Friedman,et al.  Programming with Continuations , 1984 .

[24]  Henry Lieberman,et al.  A real-time garbage collector based on the lifetimes of objects , 1983, CACM.

[25]  Brian Randell,et al.  System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.

[26]  George B. Leeman A formal approach to undo operations in programming languages , 1986, TOPL.

[27]  Daniel G. Bobrow,et al.  Virtual Copies - At the Boundary Between Classes and Instances , 1986, OOPSLA.

[28]  Brian Randell System structure for software fault tolerance , 1975 .

[29]  Paul R. Wilson,et al.  A “card-marking” scheme for controlling intergenerational references in generation-based garbage collection on stock hardware , 1989, SIGP.

[30]  Brian Beckman,et al.  Time warp operating system , 1987, SOSP '87.

[31]  David Ungar,et al.  The design and evaluation of a high performance Smalltalk system , 1987 .

[32]  David M. Ungar,et al.  Generation Scavenging: A non-disruptive high performance storage reclamation algorithm , 1984, SDE 1.

[33]  Mitchell Wand,et al.  Continuations and coroutines , 1984, LFP '84.

[34]  Paul R. Wilson Opportunistic garbage collection , 1988, SIGP.

[35]  Mark A. Linton,et al.  Supporting reverse execution for parallel programs , 1988, PADD '88.

[36]  Paul R. Wilson A simple bucket-brigade advancement mechanism for generation-based garbage collection , 1989, SIGP.

[37]  David L. Presotto,et al.  Publishing: a reliable broadcast communication mechanism , 1983, SOSP '83.

[38]  Jeffrey Scott Vitter US&R: A new framework for redoing (Extended Abstract) , 1984 .

[39]  David Ungar Generation scavenging: a nondisruptive high performance storage reclamation algorithm , 1984 .

[40]  Stuart I. Feldman,et al.  IGOR: a system for program debugging via reversible execution , 1988, PADD '88.

[41]  Robert Balzer,et al.  EXDAMS: extendable debugging and monitoring system , 1969, AFIPS '69 (Spring).

[42]  Thomas Reps,et al.  Programming Techniques and Data Structures , 1981 .

[43]  Fred B. Schneider,et al.  User Recovery and Reversal in Interactive Systems , 1984, TOPL.

[44]  Phil Hontalas,et al.  Distributed Simulation and the Time Warp Operating System , 1987, SOSP 1987.

[45]  Jonathan M. Smith,et al.  Transparent concurrent execution of mutually exclusive alternatives , 1989, Proceedings of the 9th International Conference on Distributed Computing Systems.