Recovery management in QuickSilver

This paper describes QuickSilver, developed at the IBM Almaden Research Center, which uses atomic transactions as a unified failure recovery mechanism for a distributed system structured around clients and servers. Transactions provide failure atomicity for related activities at a single server or at a number of independent servers. Rather than bundling transaction management into a dedicated language or recoverable object manager, QuickSilver exposes the basic commit protocol and log recovery primitives, allowing clients and servers to tailor their recovery techniques to their specific needs. Servers can implement their own log recovery protocols rather than being required to use a system-defined protocol. These decisions let each server strike its own balance between simplicity, efficiency, and recoverability.
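The idea of exposing a commit protocol while leaving recovery to each server can be illustrated with a minimal sketch. It is not QuickSilver's actual interface: it assumes a textbook two-phase commit, and every name in it (`Participant`, `LoggingServer`, `VolatileServer`, `run_commit`) is hypothetical. One server keeps its own log, while the other keeps no log at all and only needs to be told the outcome.

```python
from typing import Protocol


class Participant(Protocol):
    """Hypothetical interface a server implements to join a transaction."""

    def prepare(self, tid: int) -> bool: ...
    def commit(self, tid: int) -> None: ...
    def abort(self, tid: int) -> None: ...


class LoggingServer:
    """Assumed example: a server that keeps its own log for recovery."""

    def __init__(self) -> None:
        self.log: list[tuple[int, str]] = []   # server-private log records
        self.pending: dict[int, str] = {}      # uncommitted updates
        self.state: dict[int, str] = {}        # committed updates

    def write(self, tid: int, value: str) -> None:
        self.pending[tid] = value

    def prepare(self, tid: int) -> bool:
        self.log.append((tid, "prepared"))
        return tid in self.pending

    def commit(self, tid: int) -> None:
        self.log.append((tid, "committed"))
        self.state[tid] = self.pending.pop(tid)

    def abort(self, tid: int) -> None:
        self.log.append((tid, "aborted"))
        self.pending.pop(tid, None)


class VolatileServer:
    """Assumed example: a server with no log that only cleans up on outcome."""

    def __init__(self, healthy: bool = True) -> None:
        self.healthy = healthy
        self.open: set[int] = set()            # transactions holding resources

    def begin(self, tid: int) -> None:
        self.open.add(tid)

    def prepare(self, tid: int) -> bool:
        return self.healthy

    def commit(self, tid: int) -> None:
        self.open.discard(tid)

    def abort(self, tid: int) -> None:
        self.open.discard(tid)


def run_commit(tid: int, participants: list[Participant]) -> bool:
    """Textbook two-phase commit: collect votes, then broadcast the outcome."""
    votes = [p.prepare(tid) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(tid)
        return True
    for p in participants:
        p.abort(tid)
    return False
```

In this sketch the coordinator knows nothing about how a participant recovers; it only drives the vote and the outcome. That separation is the point the abstract makes, shown here under the stated assumptions rather than as the system's real design.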
