Towards context-aware adaptive fault tolerance in SOA applications

In mission-critical applications, software components are expected to be highly dependable, particularly with respect to reliability and timeliness. Redundancy-based fault-tolerance strategies have long been used to prevent disruption of the service provided by a system despite failures in its underlying components. Adopting these strategies in highly dynamic distributed computing systems, in which components often suffer from long response times or temporary unavailability, does not necessarily result in the anticipated improvement in dependability. Because these strategies are usually predefined statically and cannot be changed at run time, a change in the operational status (context) of any of the components involved may jeopardise the overall effectiveness of the scheme. This paper introduces a novel dependability strategy that supports advanced redundancy management and autonomously tunes its internal configuration in response to changes in context. Preliminary experiments indicate that this strategy can achieve an optimal trade-off between service reliability and performance-related factors such as timeliness and the degree of redundancy employed. A prototype service-oriented implementation of the proposed adaptive fault-tolerant strategy is then presented; it leverages WS-* specifications to gather and disseminate contextual information.
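The abstract does not describe the strategy's internals. To illustrate the trade-off it mentions between reliability and the degree of redundancy, the following minimal sketch picks the smallest odd-sized set of replicas whose majority vote meets a target reliability. It treats per-replica reliability estimates as the "context." It assumes independent failures and assumes that each replica either returns a correct result or fails. It also ignores response time. The names `Replica`, `majority_success_probability` and `choose_redundancy`, and the reliability figures in the usage example, are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Replica:
    name: str
    # Hypothetical context metric: estimated probability of a correct reply.
    reliability: float


def majority_success_probability(reliabilities: Sequence[float]) -> float:
    """Probability that strictly more than half of the replicas reply correctly,
    assuming independent failures (Poisson-binomial dynamic programme)."""
    n = len(reliabilities)
    # dist[k] = probability that exactly k of the replicas seen so far are correct.
    dist = [1.0] + [0.0] * n
    for p in reliabilities:
        new = [0.0] * (n + 1)
        # Only indices 0..n-1 can be non-zero before the last replica is folded in.
        for k in range(n):
            new[k] += dist[k] * (1 - p)      # this replica fails
            new[k + 1] += dist[k] * p        # this replica replies correctly
        dist = new
    threshold = n // 2 + 1  # strict majority
    return sum(dist[threshold:])


def choose_redundancy(replicas: Sequence[Replica], target: float) -> Optional[List[Replica]]:
    """Smallest odd-sized set of the most reliable replicas whose majority vote
    meets `target`; None if no such set exists with the current context."""
    ranked = sorted(replicas, key=lambda r: r.reliability, reverse=True)
    for size in range(1, len(ranked) + 1, 2):
        chosen = ranked[:size]
        if majority_success_probability([r.reliability for r in chosen]) >= target:
            return chosen
    return None


if __name__ == "__main__":
    pool = [Replica("a", 0.95), Replica("b", 0.9), Replica("c", 0.9),
            Replica("d", 0.8), Replica("e", 0.7)]
    for target in (0.95, 0.97, 0.999):
        chosen = choose_redundancy(pool, target)
        print(target, None if chosen is None else [r.name for r in chosen])
```

Re-running `choose_redundancy` whenever the reliability estimates change makes the degree of redundancy follow the context. A fuller scheme would also weigh timeliness and cost, which this sketch deliberately leaves out.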
