A performability-oriented software rejuvenation framework for distributed applications

While the inherent resource redundancy of distributed applications facilitates gracefully degradable services, methods to enhance their dependability may have subtle yet significant performance implications, especially when the applications are stateful. In this paper, we present a performability-oriented framework that enables software rejuvenation in stateful distributed applications. The framework is built on three building blocks: a rejuvenation algorithm, a set of performability metrics, and a performability model. Through model-based evaluation, we demonstrate that the framework enables distributed applications prone to error accumulation to deliver services at the best possible performance level, even in environments where the system is highly vulnerable to failures.
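The abstract does not spell out the performability model. What follows is only a minimal sketch of the general idea, assuming a generic four-state continuous-time Markov chain with per-state rewards. This is a common textbook way to quantify the trade-off between proactive rejuvenation and reactive repair, and it is not the paper's own model. The state names (`ROBUST`, `DEGRADED`, `FAILED`, `REJUVENATING`), the function names, the rates, and the reward values are all hypothetical. Performability is computed here as the steady-state expected reward rate, which is one common performability measure. The paper's own metrics may differ.

```python
"""Hypothetical four-state Markov reward model for software rejuvenation.

States, rates, and rewards are illustrative assumptions, not values or
structure taken from the framework described in the abstract.
"""

from typing import List, Sequence

# Hypothetical states: full service, error-accumulated (degraded),
# failed (under repair), and rejuvenating (partial service via redundancy).
ROBUST, DEGRADED, FAILED, REJUVENATING = range(4)


def rejuvenation_generator(aging: float, failure: float, repair: float,
                           trigger: float, rejuv_done: float) -> List[List[float]]:
    """Build the CTMC generator Q; all rates are per hour (assumed units)."""
    q = [[0.0] * 4 for _ in range(4)]
    q[ROBUST][DEGRADED] = aging           # errors accumulate
    q[DEGRADED][FAILED] = failure         # accumulated errors cause a failure
    q[DEGRADED][REJUVENATING] = trigger   # proactive rejuvenation is started
    q[FAILED][ROBUST] = repair            # reactive repair after a failure
    q[REJUVENATING][ROBUST] = rejuv_done  # rejuvenation restores a clean state
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def steady_state(q: Sequence[Sequence[float]]) -> List[float]:
    """Solve pi Q = 0, sum(pi) = 1 by Gaussian elimination (unique solution assumed)."""
    n = len(q)
    # Row i of Q^T pi = 0 is the balance equation of state i; one of them is
    # redundant, so the last is replaced by the normalization condition.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    a[-1] = [1.0] * n
    b[-1] = 1.0
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(a[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - acc) / a[r][r]
    return pi


def expected_reward_rate(pi: Sequence[float], rewards: Sequence[float]) -> float:
    """Steady-state performability: the probability-weighted performance level."""
    return sum(p * r for p, r in zip(pi, rewards))


if __name__ == "__main__":
    rewards = [1.0, 0.8, 0.0, 0.5]  # hypothetical normalized performance per state
    for trigger in (0.0, 0.1):      # 0.0 means rejuvenation is never triggered
        pi = steady_state(rejuvenation_generator(0.01, 0.02, 0.5, trigger, 2.0))
        print(f"trigger={trigger}: reward rate {expected_reward_rate(pi, rewards):.4f}")
```

With these made-up numbers, triggering rejuvenation raises the expected reward rate from about 0.92 to about 0.98. The gain comes from rejuvenation shortening the time spent in the degraded and failed states. The nonzero reward for `REJUVENATING` stands in for redundancy that keeps a distributed application partially serving during rejuvenation. A lower value, or a costlier rejuvenation, can shrink or reverse the gain, and exposing that kind of trade-off is what a performability model is for.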
