High-available grid services through the use of virtualized clustering

Grid applications comprise several components and web services, which makes them highly prone to transient software failures and software aging. Such failures often result in degraded performance and unexpected partial crashes. In this paper we present a technique that provides high availability for Grid services, based on virtualization, clustering and software rejuvenation. To show the effectiveness of our approach, we conducted experiments with the OGSA-DAI middleware. One of the implementations of OGSA-DAI makes use of Apache Axis V1.2.1, a SOAP implementation that suffers from severe memory leaks. Without modifying the middleware layer, we were able to anticipate most of the problems caused by those leaks and to increase the overall availability of the OGSA-DAI Application Server. Although these results are closely tied to this middleware, our technique is middleware-neutral and can be applied to any other Grid service that is required to be highly available.
