Using program analysis to identify and compensate for nondeterminism in fault-tolerant, replicated systems

Fault-tolerant replicated applications are typically assumed to be deterministic, so that replicas exhibit reproducible, consistent behavior and state across a distributed system. However, real applications often contain nondeterministic features that cannot be eliminated. Through a novel application of program analysis to distributed CORBA applications, we decompose an application into its constituent structures and discover the kinds of nondeterminism present within it. We target the instances of nondeterminism that can be compensated for automatically, and highlight to the application programmer those instances that must be rectified manually. We demonstrate our approach by compensating for specific forms of nondeterminism and quantifying the associated performance overheads. The resulting code growth is typically limited to one extra line per instance of nondeterminism, and the runtime overhead is minimal compared to that of a fault-tolerant application with no compensation for nondeterminism.
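The abstract does not show the analysis itself. As a rough illustration only, the Python sketch below shows the two halves of the idea under stated assumptions: a naive line-by-line scan that flags calls to a hypothetical catalogue of nondeterministic functions and labels each as automatically compensable or needing manual attention, and a generic record/replay wrapper in which the primary replica records each nondeterministic value and backups replay it. The catalogue and its "auto"/"manual" labels, the names `scan`, `Finding` and `NondeterminismLog`, and the regex-based scanning are all assumptions for illustration, not the paper's method; a real analysis of CORBA C++ code would work on a parsed representation rather than on raw text.

```python
import re
import time
from dataclasses import dataclass

# Hypothetical catalogue (an assumption, not taken from the paper): library calls
# treated as nondeterministic, and whether this sketch treats each one as
# automatically compensable ("auto") or as needing the programmer ("manual").
NONDETERMINISTIC_CALLS = {
    "gettimeofday": "auto",      # wall-clock time differs across replicas
    "time": "auto",
    "rand": "auto",              # pseudo-random values differ unless seeds match
    "getpid": "auto",            # process ids differ across hosts
    "pthread_create": "manual",  # thread interleavings are left to the programmer here
}

# Matches an identifier immediately followed by "(" (optionally with whitespace).
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


@dataclass
class Finding:
    line: int
    call: str
    action: str  # "auto" or "manual"


def scan(source: str) -> list[Finding]:
    """Naive textual scan of C/C++ source; a real tool would use a parsed AST."""
    findings = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        # Drop "//" line comments; block comments and "//" inside strings are not handled.
        code = text.split("//", 1)[0]
        for match in CALL_RE.finditer(code):
            name = match.group(1)
            action = NONDETERMINISTIC_CALLS.get(name)
            if action is not None:
                findings.append(Finding(lineno, name, action))
    return findings


class NondeterminismLog:
    """Generic record/replay: the primary records each value, backups replay it."""

    def __init__(self, is_primary: bool, entries=None):
        self.is_primary = is_primary
        self.entries = list(entries or [])
        self._cursor = 0

    def resolve(self, fn, *args):
        if self.is_primary:
            value = fn(*args)           # execute the nondeterministic call for real
            self.entries.append(value)  # in a real system this would be sent to backups
            return value
        value = self.entries[self._cursor]  # backups reuse the primary's value
        self._cursor += 1
        return value


if __name__ == "__main__":
    sample = """int r = rand();
gettimeofday(&tv, 0);  // record me
pthread_create(&t, 0, worker, 0);"""
    for f in scan(sample):
        print(f.line, f.call, f.action)

    primary = NondeterminismLog(is_primary=True)
    t_primary = primary.resolve(time.time)
    backup = NondeterminismLog(is_primary=False, entries=primary.entries)
    t_backup = backup.resolve(time.time)  # replayed, not re-executed
    print(t_primary == t_backup)  # True
```

With a wrapper of this kind, compensating a call such as `now = time.time()` becomes `now = log.resolve(time.time)`, a one-line change per call site in the spirit of the code-growth figure above. How the recorded values reach the backups, and how the authors actually perform the compensation, are not described in the abstract.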
