Experiences, strategies, and challenges in building fault-tolerant CORBA systems

It has been almost a decade since the earliest reliable CORBA implementation and, despite the adoption of the fault-tolerant CORBA (FT-CORBA) standard by the Object Management Group, CORBA is still not considered the preferred platform for building dependable distributed applications. Among the obstacles to FT-CORBA's widespread deployment are the complexity of the new standard, a lack of understanding of how to implement and deploy reliable CORBA applications, and the fact that current FT-CORBA implementations do not lend themselves readily to complex, real-world applications. We candidly share our independent experiences as developers of two distinct reliable CORBA infrastructures (OGS and Eternal) and as contributors to the FT-CORBA standardization process. Our objective is to reveal the intricacies, challenges, and strategies involved in developing fault-tolerant CORBA systems, including our own. Starting with an overview of the new FT-CORBA standard, we discuss its limitations, along with techniques for exploiting it most effectively. We reflect on the difficulties that we have encountered in building dependable CORBA systems, the solutions that we developed to address these challenges, and the lessons that we learned. Finally, we highlight some of the open issues, such as nondeterminism and partitioning, that remain to be resolved.
