Static Analysis Meets Distributed Fault-Tolerance: Enabling State-Machine Replication with Nondeterminism

Midas is an interdisciplinary approach to supporting state-machine replication for nondeterministic distributed applications. The approach exploits compile-time static analysis to identify both first-hand and second-hand sources of nondeterminism. At runtime, Midas compensates for this nondeterminism, either by transferring nondeterministic checkpoints or by re-executing inserted code, and thereby restores consistency among replicas before each new client request is processed. The approach avoids the need for lock-step synchronization and leverages application-level insight to address only the nondeterminism that matters. Our preliminary evaluation demonstrates the feasibility of Midas and reports its current performance overheads.
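The abstract does not describe the analysis or the runtime in detail, so the following is only a rough Python illustration of the two ideas, not the Midas implementation. It assumes a hypothetical toy model in which a request handler is a straight-line list of assignments (`Stmt`, with a hypothetical `nondet` flag marking calls such as `random.random` or `time.time_ns`). A simple forward taint pass (`analyze`) separates first-hand nondeterminism (values written by a nondeterministic call) from second-hand nondeterminism (values derived from them). The sketch then contrasts the two compensation strategies: shipping a checkpoint of the nondeterministic state, versus shipping only the first-hand values and re-executing the dependent code on a backup. All names are hypothetical; a real analysis would also have to handle control flow, aliasing, and heap state.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Stmt:
    """One assignment `target = compute(*sources)` in a toy straight-line program."""
    target: str
    sources: List[str]
    compute: Callable[..., object]
    nondet: bool = False  # hypothetical flag: the statement calls a nondeterministic API


def analyze(program: List[Stmt]) -> Dict[str, Set[str]]:
    """Forward taint pass: first-hand = written by a nondeterministic call,
    second-hand = computed from a (transitively) nondeterministic variable."""
    firsthand: Set[str] = set()
    secondhand: Set[str] = set()
    for s in program:
        # Check sources before killing the target, so `x = x + 1` sees x's old status.
        tainted = any(src in firsthand or src in secondhand for src in s.sources)
        firsthand.discard(s.target)
        secondhand.discard(s.target)
        if s.nondet:
            firsthand.add(s.target)
        elif tainted:
            secondhand.add(s.target)
    return {"firsthand": firsthand, "secondhand": secondhand}


def execute(program: List[Stmt], state: Dict[str, object],
            replay: Optional[List[object]] = None) -> List[object]:
    """Run the program on `state`. A primary passes replay=None and gets back the
    first-hand values it produced; a backup passes that list to replay them in order."""
    produced: List[object] = []
    pending = iter(replay) if replay is not None else None
    for s in program:
        if s.nondet and pending is not None:
            value = next(pending)  # substitute the primary's nondeterministic result
        else:
            value = s.compute(*(state[src] for src in s.sources))
        if s.nondet:
            produced.append(value)
        state[s.target] = value
    return produced


def checkpoint(state: Dict[str, object], analysis: Dict[str, Set[str]]) -> Dict[str, object]:
    """Capture only the nondeterministic (first-hand and second-hand) part of the state."""
    names = analysis["firsthand"] | analysis["secondhand"]
    return {n: state[n] for n in names}


program = [
    Stmt("seed", [], random.random, nondet=True),
    Stmt("ts", [], time.time_ns, nondet=True),
    Stmt("count", ["count"], lambda c: c + 1),
    Stmt("token", ["seed", "count"], lambda s, c: f"{s:.6f}-{c}"),
    Stmt("log_key", ["ts"], lambda t: t // 1000),
]

analysis = analyze(program)
primary, backup_a, backup_b = {"count": 0}, {"count": 0}, {"count": 0}
firsthand_values = execute(program, primary)

# Strategy 1: the backup runs the request, then overwrites the nondeterministic state.
execute(program, backup_a)
backup_a.update(checkpoint(primary, analysis))

# Strategy 2: the backup re-executes with the primary's first-hand values substituted.
execute(program, backup_b, replay=firsthand_values)

print(sorted(analysis["firsthand"]), sorted(analysis["secondhand"]))
print(primary == backup_a == backup_b)
```

In this toy run the analysis reports `seed` and `ts` as first-hand and `token` and `log_key` as second-hand, while `count` stays deterministic, and all three replicas end in the same state. Shipping only first-hand values keeps messages small but requires the backup to recompute the dependent state, whereas a checkpoint carries the derived values directly; the abstract does not say how Midas chooses between the two.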
