A new algorithm for increasing fault-tolerance of distributed systems

Existing solutions for improving the availability and fault tolerance of distributed systems add complexity and overhead to those systems. Some of these solutions rely on object replication and therefore incur the overhead of controlling object replicas. This paper presents a solution that also uses replication, but introduces a novel mechanism for the pessimistic control of object replicas, namely the primary server approach. The message overhead of this approach is considerably lower than that of related solutions, which improves system fault tolerance and availability.
