Recovery management in QuickSilver

This paper describes QuickSilver, a system developed at the IBM Almaden Research Center that uses atomic transactions as a unified failure recovery mechanism for a client-server structured distributed system. Transactions provide failure atomicity for related activities at a single server or at a number of independent servers. Rather than bundling transaction management into a dedicated language or recoverable object manager, QuickSilver exposes the basic commit protocol and log recovery primitives, allowing clients and servers to tailor their recovery techniques to their specific needs. Servers can implement their own log recovery protocols rather than being required to use a system-defined protocol. These decisions allow servers to make their own choices to balance simplicity, efficiency, and recoverability.
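As a rough illustration of the idea that servers share one commit protocol but choose their own recovery technique, the following minimal Python sketch may help. It is not QuickSilver's interface: the class names (`Server`, `LoggingServer`, `RefusingServer`), the method names, and the simplified two-phase vote in `run_transaction` are all assumptions made for this example, and the in-memory list only stands in for a durable log.

```python
class Server:
    """Hypothetical participant with no log: it only discards tentative state on abort."""

    def __init__(self, name):
        self.name = name
        self.state = {}    # committed state
        self.pending = {}  # transaction id -> tentative updates

    def update(self, tid, key, value):
        self.pending.setdefault(tid, {})[key] = value

    def prepare(self, tid):
        return True  # vote to commit

    def commit(self, tid):
        self.state.update(self.pending.pop(tid, {}))

    def abort(self, tid):
        self.pending.pop(tid, None)


class LoggingServer(Server):
    """Hypothetical participant that chooses to record its own recovery log."""

    def __init__(self, name):
        super().__init__(name)
        self.log = []  # stand-in for a durable log (assumption: in memory only)

    def prepare(self, tid):
        self.log.append(("prepared", tid, dict(self.pending.get(tid, {}))))
        return True

    def commit(self, tid):
        self.log.append(("committed", tid))
        super().commit(tid)

    def abort(self, tid):
        self.log.append(("aborted", tid))
        super().abort(tid)


class RefusingServer(Server):
    """Hypothetical participant that always votes to abort, to show failure atomicity."""

    def prepare(self, tid):
        return False


def run_transaction(tid, servers):
    """Simplified two-phase vote: commit everywhere only if every server agrees."""
    votes = [server.prepare(tid) for server in servers]
    if all(votes):
        for server in servers:
            server.commit(tid)
        return "committed"
    for server in servers:
        server.abort(tid)
    return "aborted"


# Usage: one logging server and one volatile server in the same transaction.
files = LoggingServer("files")
windows = Server("windows")
files.update("t1", "report.txt", "draft")
windows.update("t1", "title", "report.txt")
outcome_ok = run_transaction("t1", [files, windows])

# A refusing participant makes every server undo its tentative updates.
files.update("t2", "report.txt", "final")
outcome_bad = run_transaction("t2", [files, RefusingServer("quota")])
```

In this sketch the coordinator never inspects how a server recovers. It only collects votes and announces the outcome, which is the kind of separation between commit protocol and recovery technique that the abstract describes.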
