Deterministic scheduling for transactional multithreaded replicas

One way to implement a fault-tolerant service is to replicate it at sites that fail independently. In active replication, one such technique, every replica executes each request, so the effects of failures can be completely masked and service availability increases. To stay consistent, replicas must behave deterministically, which has traditionally been achieved by restricting them to a single thread. This restriction is not viable in some settings, such as transactional systems, where processing transactions sequentially is not admissible. The authors present a deterministic scheduling algorithm for multithreaded replicas in a transactional framework. To ensure replica determinism, requests are submitted to the replicated servers by reliable, totally ordered multicast. Internally, a deterministic scheduler schedules all threads in the same way at every replica, which guarantees replica consistency.
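The idea that identical request order plus identical scheduling yields identical replica state can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm: threads are modelled as Python generators that switch only at explicit `yield` points, the scheduler is a plain FIFO round-robin, and total-order multicast is assumed to have already produced the same request list for every replica. The names `Replica`, `deliver`, `transfer`, `deposit`, and `run_replica` are hypothetical, and transactions, locking, and recovery are left out entirely.

```python
from collections import deque


class Replica:
    """Hypothetical replica that runs request handlers as cooperative threads."""

    def __init__(self):
        self.state = {}        # replicated data
        self.trace = []        # order in which scheduling steps were executed
        self.ready = deque()   # FIFO run queue of (request id, thread)

    def deliver(self, req_id, handler, *args):
        # Assumed to be called in the same order at every replica
        # (the job of reliable, totally ordered multicast).
        self.ready.append((req_id, handler(self.state, *args)))

    def run(self):
        # One thread runs at a time and is switched out only when it yields,
        # so the interleaving depends on delivery order and handler code,
        # never on timing.
        while self.ready:
            req_id, thread = self.ready.popleft()
            try:
                step = next(thread)
            except StopIteration:
                self.trace.append((req_id, "done"))
            else:
                self.trace.append((req_id, step))
                self.ready.append((req_id, thread))


def transfer(state, src, dst, amount):
    # Two-step handler; each yield is a deterministic scheduling point.
    state[src] = state.get(src, 0) - amount
    yield "debited"
    state[dst] = state.get(dst, 0) + amount
    yield "credited"


def deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount
    yield "deposited"


def run_replica(requests):
    replica = Replica()
    for req_id, (handler, args) in enumerate(requests):
        replica.deliver(req_id, handler, *args)
    replica.run()
    return replica.state, replica.trace


# The same totally ordered request list is fed to two independent replicas.
requests = [
    (deposit, ("a", 100)),
    (transfer, ("a", "b", 30)),
    (deposit, ("b", 5)),
]
state1, trace1 = run_replica(requests)
state2, trace2 = run_replica(requests)
print(state1 == state2, trace1 == trace2)
```

Both replicas interleave the three requests in the same way and end in the same state. With preemptive operating-system threads, the interleaving could differ between replicas, which is the nondeterminism a deterministic scheduler has to rule out.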
