Abstractions for Constructing Dependable Distributed Systems

Shivakant Mishra and Richard D. Schlichting

[1]  Philip A. Bernstein,et al.  Concurrency Control in Distributed Database Systems , 1986, CSUR.

[2]  Flaviu Cristian,et al.  Handshake Protocols , 1987, ICDCS.

[3]  David B. Johnson,et al.  Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing , 1988, J. Algorithms.

[4]  Hirokazu Ihara,et al.  Autonomous Decentralized Computer Control Systems , 1984, Computer.

[5]  Adi Shamir,et al.  A method for obtaining digital signatures and public-key cryptosystems , 1978, CACM.

[6]  Jeannette M. Wing,et al.  Inheritance of synchronization and recovery properties in Avalon/C++ , 1988 .

[7]  Brian Randell,et al.  Operating Systems, An Advanced Course , 1978 .

[8]  Flaviu Cristian,et al.  Fault-tolerance in the advanced automation system , 1990, EW 4.

[9]  Philip A. Bernstein,et al.  Formal Aspects of Serializability in Database Concurrency Control , 1979, IEEE Transactions on Software Engineering.

[10]  Barbara Liskov,et al.  The Argus Language and System , 1984, Advanced Course: Distributed Systems.

[11]  Daniel A. Menascé,et al.  Locking and Deadlock Detection in Distributed Data Bases , 1979, IEEE Transactions on Software Engineering.

[12]  Paulo Veríssimo,et al.  xAMp: a multi-primitive group communications service , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.

[13]  Hermann Kopetz,et al.  Fault-Tolerant Membership Service in a Synchronous Distributed Real-Time System , 1991 .

[14]  Eric C. Cooper Replicated distributed programs , 1985, SOSP '85.

[15]  Bruce G. Lindsay,et al.  Transaction management in the R* distributed database management system , 1986, TODS.

[16]  Henri E. Bal,et al.  An efficient reliable broadcast protocol , 1989, OPSR.

[17]  Santosh K. Shrivastava,et al.  An overview of the Arjuna distributed programming system , 1991, IEEE Software.

[18]  W. E. Weihl Using transactions in distributed applications , 1990 .

[19]  Danny Dolev,et al.  On the minimal synchronism needed for distributed consensus , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[20]  Michael Hammer,et al.  Reliability mechanisms for SDD-1 , 1980 .

[21]  Parameswaran Ramanathan,et al.  Fault-tolerant clock synchronization in distributed systems , 1990, Computer.

[22]  David L. Russell,et al.  State Restoration in Systems of Communicating Processes , 1980, IEEE Transactions on Software Engineering.

[23]  Danny Dolev,et al.  Authenticated Algorithms for Byzantine Agreement , 1983, SIAM J. Comput.

[24]  Milan Milenkovic Update synchronization in multiaccess database systems , 1979 .

[25]  Parameswaran Ramanathan,et al.  Checkpointing and rollback recovery in a distributed system using common time base , 1988, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems.

[26]  Shivakant Mishra,et al.  Implementing fault-tolerant replicated objects using Psync , 1989, Proceedings of the Eighth Symposium on Reliable Distributed Systems.

[27]  Philip P. Macri,et al.  Deadlock detection and resolution in a CODASYL based data management system , 1976, SIGMOD '76.

[28]  Danny Dolev,et al.  Fault-tolerant clock synchronization , 1984, PODC '84.

[29]  Larry L. Peterson,et al.  The x-Kernel: An Architecture for Implementing Network Protocols , 1991, IEEE Trans. Software Eng..

[30]  Brian N. Bershad,et al.  Lightweight remote procedure call , 1989, TOCS.

[31]  Paulo Veríssimo,et al.  AMp: a highly parallel atomic multicast protocol , 1989, SIGCOMM '89.

[32]  John P. Warne,et al.  A model for interface groups , 1991, [1991] Proceedings Tenth Symposium on Reliable Distributed Systems.

[33]  Paulo Veríssimo,et al.  A posteriori agreement for fault-tolerant clock synchronization on broadcast networks , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.

[34]  Michael O. Rabin,et al.  Randomized byzantine generals , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[35]  S. T. Chanson,et al.  Failure Transparency in Remote Procedure Calls , 1989, IEEE Trans. Computers.

[36]  Bharat K. Bhargava,et al.  Independent checkpointing and concurrent rollback for recovery in distributed systems-an optimistic approach , 1988, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems.

[37]  P.M. Melliar-Smith,et al.  Fault-tolerant distributed systems based on broadcast communication , 1989, [1989] Proceedings. The 9th International Conference on Distributed Computing Systems.

[38]  Robert H. Thomas,et al.  A Majority consensus approach to concurrency control for multiple copy databases , 1979, ACM Trans. Database Syst..

[39]  Kenneth J. Perry Randomized Byzantine Agreement , 1985, IEEE Transactions on Software Engineering.

[40]  Brian Randell,et al.  Reliability Issues in Computing System Design , 1978, CSUR.

[41]  Fred B. Schneider,et al.  Synchronization in Distributed Programs , 1982, TOPL.

[42]  Richard D. Schlichting,et al.  Preserving and using context information in interprocess communication , 1989, TOCS.

[43]  Paulo Veríssimo Real-time data management with clock-less reliable broadcast protocols , 1990, [1990] Proceedings. Workshop on the Management of Replicated Data.

[44]  Arie Shoshani,et al.  System Deadlocks , 1971, CSUR.

[45]  David B. Johnson,et al.  Sender-Based Message Logging , 1987 .

[46]  Bruce Jay Nelson Remote procedure call , 1981 .

[47]  Virgil D. Gligor,et al.  On Deadlock Detection in Distributed Systems , 1980, IEEE Transactions on Software Engineering.

[48]  Dale Skeen,et al.  A Quorum-Based Commit Protocol , 1982, Berkeley Workshop.

[49]  Santosh K. Shrivastava,et al.  On the Duality of Fault Tolerant System Structures , 1987, Experiences with Distributed Systems.

[50]  Daniel J. Rosenkrantz,et al.  System level concurrency control for distributed database systems , 1978, TODS.

[51]  Nancy A. Lynch,et al.  A new fault-tolerant algorithm for clock synchronization , 1984, PODC '84.

[52]  Peter G. Neumann Illustrative risks to the public in the use of computer systems and related technology , 1992, SOEN.

[53]  Miroslaw Malek,et al.  The consensus problem in fault-tolerant computing , 1993, CSUR.

[54]  Colin J. Fidge,et al.  Timestamps in Message-Passing Systems That Preserve the Partial Ordering , 1988 .

[55]  Andrea J. Borr Transaction Monitoring in ENCOMPASS: Reliable Distributed Transaction Processing , 1981, VLDB.

[56]  Jo-Mei Chang,et al.  Reliable broadcast protocols , 1984, TOCS.

[57]  Marco A. Casanova,et al.  The Concurrency Control Problem for Database Systems , 1981, Lecture Notes in Computer Science.

[58]  Leslie Lamport,et al.  The Byzantine Generals Problem , 1982, TOPL.

[59]  Michael Burrows,et al.  Performance of Firefly RPC , 1990, ACM Trans. Comput. Syst..

[60]  Hector Garcia-Molina,et al.  Ordered and reliable multicast communication , 1991, TOCS.

[61]  Willem P. de Roever,et al.  A Proof System for Communicating Sequential Processes , 1980, ACM Trans. Program. Lang. Syst..

[62]  Brian Randell,et al.  System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.

[63]  Tony P. Ng,et al.  Replicated transactions , 1989, [1989] Proceedings. The 9th International Conference on Distributed Computing Systems.

[64]  Liba Svobodova Resilient Distributed Computing , 1984, IEEE Transactions on Software Engineering.

[65]  R. A. Benel,et al.  Advanced Automation Systems design , 1989 .

[66]  E. B. Moss,et al.  Nested Transactions: An Approach to Reliable Distributed Computing , 1985 .

[67]  Fred B. Schneider,et al.  Understanding Protocols for Byzantine Clock Synchronization , 1987 .

[68]  Andrew Birrell,et al.  Implementing Remote procedure calls , 1983, SOSP '83.

[69]  Pui Ng,et al.  A commit protocol for checkpointing transactions , 1988, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems.

[70]  Edsger W. Dijkstra,et al.  The structure of the “THE”-multiprogramming system , 1968, CACM.

[71]  J. T. Robinson,et al.  On optimistic methods for concurrency control , 1979, TODS.

[72]  J. D. Day,et al.  A principle for resilient sharing of distributed resources , 1976, ICSE '76.

[73]  A. Prasad Sistla,et al.  Efficient distributed recovery using message logging , 1989, PODC '89.

[74]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1985, JACM.

[75]  P. M. Melliar-Smith,et al.  Synchronizing clocks in the presence of faults , 1985, JACM.

[76]  Hermann Kopetz,et al.  Dependability: Basic Concepts and Terminology , 1992 .

[77]  Butler W. Lampson,et al.  Distributed Systems — Architecture and Implementation , 1982, Lecture Notes in Computer Science.

[78]  Flaviu Cristian,et al.  Understanding fault-tolerant distributed systems , 1991, CACM.

[79]  Shivakant Mishra,et al.  Consul: a communication substrate for fault-tolerant distributed programs , 1993, Distributed Syst. Eng..

[80]  John B. Goodenough,et al.  Exception handling: issues and a proposed notation , 1975, CACM.

[81]  Richard Koo,et al.  Checkpointing and Rollback-Recovery for Distributed Systems , 1986, IEEE Transactions on Software Engineering.

[82]  Rogério de Lemos,et al.  A robust group membership algorithm for distributed real-time systems , 1990, [1990] Proceedings 11th Real-Time Systems Symposium.

[83]  Insup Lee,et al.  A protocol for timed atomic commitment , 1989, [1989] Proceedings. The 9th International Conference on Distributed Computing Systems.

[84]  Christos H. Papadimitriou,et al.  The serializability of concurrent database updates , 1979, JACM.

[85]  Narain H. Gehani,et al.  Fault tolerant concurrent C: a tool for writing fault tolerant distributed programs , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[86]  Bharat K. Bhargava,et al.  A model for concurrent checkpointing and recovery using transactions , 1989, [1989] Proceedings. The 9th International Conference on Distributed Computing Systems.

[87]  Barbara Liskov,et al.  Guardians and Actions: Linguistic Support for Robust, Distributed Programs , 1983, TOPL.

[88]  Richard J. LeBlanc,et al.  System Programming with Objects and Actions , 1985, ICDCS.

[89]  Robert E. Strom,et al.  Optimistic recovery in distributed systems , 1985, TOCS.

[90]  Flaviu Cristian,et al.  A low-cost atomic commit protocol , 1990, Proceedings Ninth Symposium on Reliable Distributed Systems.

[91]  Flaviu Cristian,et al.  Correct and Robust Programs , 1984, IEEE Transactions on Software Engineering.

[92]  André Schiper,et al.  Lightweight causal and atomic group multicast , 1991, TOCS.

[93]  Samuel T. Chanson,et al.  Reliable group communication in distributed systems , 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[94]  Hermann Kopetz,et al.  Distributed fault-tolerant real-time systems: the Mars approach , 1989, IEEE Micro.

[95]  R. Shapiro,et al.  Reliability and fault recovery in distributed processing , 1977 .

[96]  Irving L. Traiger,et al.  The notions of consistency and predicate locks in a database system , 1976, CACM.

[97]  Hermann Kopetz,et al.  Clock Synchronization in Distributed Real-Time Systems , 1987, IEEE Transactions on Computers.

[98]  Richard D. Schlichting,et al.  Using message passing for distributed programming: proof rules and disciplines , 1984, TOPL.

[99]  Kwei-Jay Lin,et al.  Atomic Remote Procedure Call , 1985, IEEE Transactions on Software Engineering.

[100]  Santosh K. Shrivastava,et al.  Rajdoot: A Remote Procedure Call Mechanism Supporting Orphan Detection and Killing , 1988, IEEE Trans. Software Eng..

[101]  Kang G. Shin,et al.  Evaluation of Error Recovery Blocks Used for Cooperating Processes , 1984, IEEE Transactions on Software Engineering.

[102]  Dale Skeen,et al.  Nonblocking commit protocols , 1981, SIGMOD '81.

[103]  Sam Toueg,et al.  Optimal clock synchronization , 1985, PODC '85.

[104]  Walter H. Kohler,et al.  A Survey of Techniques for Synchronization and Recovery in Decentralized Computer Systems , 1981, CSUR.

[105]  Philip A. Bernstein,et al.  Analyzing Concurrency Control Algorithms When User and System Operations Differ , 1983, IEEE Transactions on Software Engineering.

[106]  Kenneth P. Birman,et al.  Using process groups to implement failure detection in asynchronous environments , 1991, PODC '91.

[107]  Kenneth P. Birman,et al.  Reliable communication in the presence of failures , 1987, TOCS.

[108]  Richard C. Holt,et al.  Some Deadlock Properties of Computer Systems , 1972, CSUR.

[109]  Kenneth P. Birman,et al.  The ISIS project: real experience with a fault tolerant programming system , 1990, EW 4.

[110]  Danny Dolev,et al.  On the possibility and impossibility of achieving clock synchronization , 1984, STOC '84.

[111]  Jim Gray,et al.  An approach to decentralized computer systems , 1986, IEEE Transactions on Software Engineering.

[112]  Keith Marzullo,et al.  Maintaining the time in a distributed system , 1985, OPSR.

[113]  Jean-Charles Fabre,et al.  Some Fault-Tolerant Aspects of the Chorus Distributed System , 1985, ICDCS.

[114]  Paulo Veríssimo,et al.  Reliable broadcast for fault-tolerance on local computer networks , 1990, Proceedings Ninth Symposium on Reliable Distributed Systems.

[115]  Yuval Tamir,et al.  Error Recovery in Multicomputers Using Global Checkpoints , 1984 .

[116]  Brian A. Coan,et al.  A Simple and Efficient Randomized Byzantine Agreement Algorithm , 1985, IEEE Transactions on Software Engineering.

[117]  Vassos Hadzilacos An algorithm for minimizing roll back cost , 1982, PODS '82.

[118]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[119]  Barbara Liskov,et al.  Distributed programming in Argus , 1988, CACM.

[120]  Shivakant Mishra,et al.  A Membership Protocol Based on Partial Order , 1992 .

[121]  Philip A. Bernstein,et al.  Concurrency control in a system for distributed databases (SDD-1) , 1980, TODS.

[122]  Richard D. Schlichting,et al.  Fail-stop processors: an approach to designing fault-tolerant computing systems , 1983, TOCS.

[123]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[124]  Butler W. Lampson,et al.  Crash Recovery in a Distributed Data Storage System , 1981 .

[125]  Paulo Veríssimo,et al.  The Delta-4 approach to dependability in open distributed computing systems , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[126]  Thanasis Hadzilacos,et al.  Deleting completed transactions , 1985, PODS '86.

[127]  Dushan Z. Badal Correctness of concurrency control and implications in distributed databases , 1979, COMPSAC.

[128]  Flaviu Cristian,et al.  Agreeing on who is present and who is absent in a synchronous distributed system , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.