Optimizing service strategy for systems with deferred repair

The importance of evaluating interval availability and related metrics when modeling computer systems with a deferred repair service strategy is now recognized. In systems with deferred repair, service is either triggered when redundancy falls below a threshold (system failure events included) or initiated by a periodic service schedule, and the system may not reach steady state within the interval between two successive services. This paper describes an approach that uses hierarchical Markov modeling of interval availability, performability, and service cost to optimize the deferred repair service strategy, subject to achieving required system availability or performability levels. Two parameters of the strategy can be tuned to minimize service cost: the time interval between prescheduled periodic services and the redundancy threshold that generates unscheduled service calls. Two examples illustrate the approach: a wireless Web services system with an availability constraint and a massive, horizontally scaling blade server system with a performability constraint.
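To make the two tuning parameters concrete, the following is a minimal sketch and not the paper's hierarchical model. It assumes a single flat Markov chain whose state is the number of failed units, exponential failure and service-response times, units that keep failing while the system is down, and an instantaneous periodic service that restores all units. The function names, rates, and cost figures are all hypothetical. Interval availability over one service period is obtained by integrating the transient state probabilities, and a grid search picks the cheapest (interval, threshold) pair that meets the availability target.

```python
def transient_metrics(n_units, min_up, threshold, lam, mu, horizon, steps=2000):
    """Return (interval availability, expected unscheduled services) on [0, horizon].

    State j is the number of failed units; the system starts with j = 0.
    An unscheduled service (rate mu) is pending whenever j >= threshold.
    Uses fixed-step RK4, so horizon / steps must be small relative to 1 / mu.
    """
    n_states = n_units + 1

    def deriv(p):
        # Kolmogorov forward equations for the assumed chain.
        dp = [0.0] * n_states
        for j in range(n_states):
            if j < n_units:                      # one more unit fails
                flow = (n_units - j) * lam * p[j]
                dp[j] -= flow
                dp[j + 1] += flow
            if j >= threshold:                   # service restores all units
                flow = mu * p[j]
                dp[j] -= flow
                dp[0] += flow
        return dp

    def up_prob(p):
        # System is up while at least min_up units are working.
        return sum(p[: n_units - min_up + 1])

    def call_rate(p):
        # Instantaneous rate at which unscheduled services complete.
        return mu * sum(p[threshold:])

    p = [1.0] + [0.0] * n_units
    h = horizon / steps
    up_time = 0.0
    calls = 0.0
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * h * d for x, d in zip(p, k1)])
        k3 = deriv([x + 0.5 * h * d for x, d in zip(p, k2)])
        k4 = deriv([x + h * d for x, d in zip(p, k3)])
        q = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
        # Trapezoidal accumulation of up time and expected service count.
        up_time += 0.5 * h * (up_prob(p) + up_prob(q))
        calls += 0.5 * h * (call_rate(p) + call_rate(q))
        p = q
    return up_time / horizon, calls


def optimize(n_units, min_up, lam, mu, intervals, target, c_sched, c_unsched):
    """Grid search: cheapest (cost_rate, interval, threshold, availability), or None."""
    best = None
    max_threshold = n_units - min_up + 1         # service only on system failure
    for interval in intervals:
        for threshold in range(1, max_threshold + 1):
            avail, calls = transient_metrics(
                n_units, min_up, threshold, lam, mu, interval)
            if avail < target:
                continue                          # constraint violated
            cost_rate = (c_sched + c_unsched * calls) / interval
            if best is None or cost_rate < best[0]:
                best = (cost_rate, interval, threshold, avail)
    return best


if __name__ == "__main__":
    # Hypothetical numbers: 4 units, 2 needed, rates per hour.
    print(optimize(4, 2, 1e-4, 1 / 24, [720, 2160, 4380, 8760],
                   target=0.9999, c_sched=500.0, c_unsched=2000.0))
```

A grid search is used here only because the parameter space in this sketch is tiny. The transient solution matters because, as noted above, the system may not reach steady state between two services, so a steady-state availability formula would not be a safe substitute.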