A Measurement-Based Reliability/Performability Model for a Multiprocessor System

This paper uses a semi-Markov model to evaluate the resource-usage/error/recovery process in a large mainframe system. The model is based on low-level error and resource-usage data collected from an IBM 3081 system during normal operation. Both the normal and the erroneous behavior of the system are modeled. The results provide an understanding of the different types of errors and recovery processes. A sensitivity analysis investigates the significance of using a semi-Markov process, as opposed to a Markov process, to model the measured system. In addition, a measurement-based performability model built on the collected error data is proposed. A reward function, based on the service rate and the error rate in each state, is defined to estimate the performability of the system and to depict the cost of different error types and recovery procedures.
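
As a rough illustration of how a per-state reward can be combined with a semi-Markov model, the sketch below computes a steady-state expected reward rate. It is only a minimal sketch under stated assumptions: the three states, the transition probabilities, the mean holding times, and the rates are invented for illustration and are not measurements from the IBM 3081 system, and the reward form s/(s + e) is one plausible choice built from a service rate s and an error rate e, not necessarily the definition used in the paper. The embedded chain is assumed to be irreducible.

```python
def embedded_stationary(P, iters=10_000, tol=1e-12):
    """Stationary distribution of the embedded (jump) chain with matrix P.

    Iterates the "lazy" chain (I + P) / 2, which has the same stationary
    distribution as P but is aperiodic, so the iteration also converges
    for periodic chains. Assumes P is irreducible.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        nxt = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def occupancy(P, mean_holding):
    """Long-run fraction of time spent in each state of a semi-Markov process.

    Weights the embedded-chain stationary probabilities by the mean
    holding time of each state and normalizes.
    """
    pi = embedded_stationary(P)
    weighted = [p * h for p, h in zip(pi, mean_holding)]
    total = sum(weighted)
    return [w / total for w in weighted]


def reward(service_rate, error_rate):
    """Assumed reward form: share of activity that is useful service."""
    total = service_rate + error_rate
    return service_rate / total if total > 0 else 0.0


def expected_reward_rate(P, mean_holding, rewards):
    """Steady-state expected reward per unit time."""
    return sum(p * r for p, r in zip(occupancy(P, mean_holding), rewards))


# Hypothetical three-state example: 0 = normal, 1 = error, 2 = recovery.
P = [
    [0.0, 1.0, 0.0],  # normal operation always ends in an error
    [0.2, 0.0, 0.8],  # an error either clears or triggers recovery
    [1.0, 0.0, 0.0],  # recovery returns to normal operation
]
mean_holding = [100.0, 1.0, 4.0]            # illustrative mean sojourn times
rates = [(10.0, 0.0), (5.0, 5.0), (0.0, 0.0)]  # (service rate, error rate)
rewards = [reward(s, e) for s, e in rates]

print(expected_reward_rate(P, mean_holding, rewards))
```

Because only the mean holding times enter the steady-state occupancy, this quantity would be the same for a Markov model with matching means; differences between Markov and semi-Markov models appear in time-dependent measures, which this sketch does not cover.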