Paper information - Optimizing service strategy for systems with deferred repair

Optimizing service strategy for systems with deferred repair

Evaluating interval availability and related metrics is recognized as important when modeling computer systems that use a deferred repair service strategy. In such systems, service is either triggered when redundancy falls below a threshold (which includes system failure events) or initiated by a periodic service schedule, and the system may not reach steady state within the interval between two successive services. This paper describes an approach that uses hierarchical Markov modeling of interval availability, performability, and service cost to optimize the deferred repair service strategy, subject to achieving the required system availability or performability level. Two parameters of the strategy can be tuned to minimize service cost: the time interval between prescheduled periodic services and the redundancy threshold that generates unexpected service calls. Two examples illustrate the approach: a wireless Web services system with an availability constraint and a massive, horizontally scaling blade server system with a performability constraint.
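The trade-off described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's hierarchical Markov method; it is a simulation under assumed simplifications: `n` identical units fail independently at rate `lam`, the system is up while at least `k` units work, every service visit restores all units instantly, an unscheduled visit arrives a fixed `delay` after the working count drops below the threshold `m`, and a scheduled visit cancels any pending call. All function names, parameter names, and numeric values are hypothetical.

```python
import random

INF = float("inf")

def simulate(n, k, lam, T, m, delay, horizon, c_sched, c_call, seed=0):
    """Return (interval availability, cost per unit time) over [0, horizon]."""
    rng = random.Random(seed)
    t, up_time, cost = 0.0, 0.0, 0.0
    working = n          # units currently working
    next_sched = T       # time of the next prescheduled periodic visit
    call_due = None      # arrival time of a pending unscheduled visit, if any
    while t < horizon:
        # Exponential lifetimes are memoryless, so the time to the next
        # failure can be redrawn after every event.
        t_fail = t + rng.expovariate(working * lam) if working > 0 else INF
        t_call = call_due if call_due is not None else INF
        t_visit = min(next_sched, t_call)
        t_next = min(t_fail, t_visit, horizon)
        if working >= k:             # system is up during [t, t_next)
            up_time += t_next - t
        t = t_next
        if t >= horizon:
            break
        if t_visit <= t_fail:        # a service visit happens first
            if t_call <= next_sched:
                cost += c_call       # unscheduled (threshold-triggered) visit
            else:
                cost += c_sched      # prescheduled periodic visit
                next_sched += T
            working = n              # assumption: a visit restores every unit
            call_due = None          # assumption: a visit cancels a pending call
        else:                        # a unit fails first
            working -= 1
            if working < m and call_due is None:
                call_due = t + delay
    return up_time / horizon, cost / horizon

def optimize(n, k, lam, delay, horizon, c_sched, c_call, target,
             intervals, thresholds):
    """Grid-search (T, m) for minimum cost rate subject to availability >= target."""
    best = None
    for T in intervals:
        for m in thresholds:
            avail, cost = simulate(n, k, lam, T, m, delay, horizon,
                                   c_sched, c_call)
            if avail >= target and (best is None or cost < best[0]):
                best = (cost, T, m, avail)
    return best  # None if no candidate meets the availability target

# Hypothetical numbers, chosen only to exercise the search.
best = optimize(n=8, k=5, lam=0.001, delay=24.0, horizon=1e5,
                c_sched=1.0, c_call=5.0, target=0.99,
                intervals=[250.0, 500.0, 1000.0, 2000.0],
                thresholds=[5, 6, 7, 8])
print(best)
```

A single simulated run per candidate keeps the sketch short; a real study would average many runs or, as the paper does, compute the measures analytically from Markov models.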
