Some ideas on support for fault tolerance in COMANDOS, an object-oriented distributed system

The distributed systems group at Trinity has been concerned with fault tolerance for a number of years, and we are now turning our attention to the topic with renewed interest (and urgency). Specifically, we aim to provide mechanisms for fault tolerance in the Oisin kernel, the local implementation of the COMANDOS object-oriented distributed system. This short position paper outlines the expertise we have gained to date, some of the lessons we have learned, and the avenues we are currently investigating in order to make Oisin reliable.