Rewind, repair, replay: three R's to dependability

Motivated by the growth of Web and infrastructure services and their susceptibility to failures caused by human operators, we introduce system-level undo as a recovery mechanism designed to improve service dependability. Undo lets system operators recover from their inevitable mistakes, and it also enables retroactive repair of problems that were not fixed quickly enough to prevent damage. We present the "three R's" (rewind, repair, replay), a model of undo that matches the needs of human error recovery and retroactive repair; discuss several of the issues this undo model raises; and introduce an initial architectural framework for undoable systems, using an undoable e-mail service as the running example.
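As an illustration only, and not the architecture the paper proposes, the following Python sketch applies the three R's to a toy mail store. Everything in it is a hypothetical simplification: the `Op` record, the `UndoableMailStore` class, the keyword-based delivery filter standing in for an operator's repair, and the choice to rewind by re-running the logged history from an empty initial state rather than restoring a storage snapshot. Rewind restores the state from just before time `t`, repair installs a corrected filter, and replay re-executes the logged user operations from `t` onward on the repaired system.

```python
from typing import Callable, Dict, List, Set


class Op:
    """Hypothetical record of one user-level mail operation."""

    def __init__(self, time: int, kind: str, msg_id: str, body: str = "") -> None:
        self.time, self.kind, self.msg_id, self.body = time, kind, msg_id, body


class UndoableMailStore:
    """Toy mail store with rewind / repair / replay (illustrative assumption)."""

    def __init__(self) -> None:
        self.mailbox: Dict[str, str] = {}
        self.accept: Callable[[str], bool] = lambda body: True  # delivery filter
        self.log: List[Op] = []  # user-level history, appended in time order

    def _apply(self, op: Op) -> None:
        if op.kind == "deliver":
            if self.accept(op.body):
                self.mailbox[op.msg_id] = op.body
        elif op.kind == "delete":
            self.mailbox.pop(op.msg_id, None)
        else:
            raise ValueError(f"unknown operation: {op.kind}")

    def do(self, op: Op) -> None:
        """Normal operation: record the user's intent, then execute it."""
        self.log.append(op)
        self._apply(op)

    def rewind(self, t: int) -> None:
        """Restore the state as of just before time t.

        Simplification: rebuild from an empty mailbox by re-running logged
        operations with time < t under the current (not yet repaired) filter.
        """
        self.mailbox = {}
        for op in self.log:
            if op.time < t:
                self._apply(op)

    def repair(self, new_filter: Callable[[str], bool]) -> None:
        """Operator's fix; here, installing a corrected delivery filter."""
        self.accept = new_filter

    def replay(self, t: int) -> None:
        """Re-execute the logged operations from time t on the repaired system."""
        for op in self.log:
            if op.time >= t:
                self._apply(op)

    def undo(self, t: int, new_filter: Callable[[str], bool]) -> Set[str]:
        """Rewind, repair, replay; return the ids whose visibility changed."""
        before = set(self.mailbox)
        self.rewind(t)
        self.repair(new_filter)
        self.replay(t)
        return before ^ set(self.mailbox)


# Usage: the operator realizes a filter should have been active from time 2.
svc = UndoableMailStore()
svc.do(Op(1, "deliver", "m1", "hello"))
svc.do(Op(2, "deliver", "m2", "open this virus"))
svc.do(Op(3, "deliver", "m3", "meeting at 3"))
changed = svc.undo(2, lambda body: "virus" not in body)
print(sorted(svc.mailbox))  # ['m1', 'm3']
print(changed)              # {'m2'}
```

Returning the set of messages whose visibility changed hints at one kind of issue such a model raises: a retroactive repair can alter state that users may already have seen.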
