Effect of System Workload on Operating System Reliability: A Study on IBM 3081

This paper presents an analysis of operating system failures on an IBM 3081 running VM/SP. Software failures fall into three broad categories: error handling (ERH), program control or logic (CTL), and hardware related (HS). More than 25 percent of software failures occur at the hardware/software interface. The measurements show that software reliability results cannot be considered representative unless the system workload is taken into account. For example, the risk of a software failure increases nonlinearly with the amount of interactive processing, as measured by parameters such as the paging rate and the amount of overhead (operating system CPU time). The overall CPU execution rate, although measured to be close to 100 percent most of the time, does not correlate strongly with the occurrence of failures. The paper discusses possible reasons for the observed workload/failure dependency, based on detailed investigation of the failure data.
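
As a rough illustration of what relating failure risk to workload can look like in practice, the sketch below bins observation intervals by a workload measure (for instance, paging rate) and computes the fraction of intervals in each bin that contained a software failure. This is not the paper's method: the function name `failure_rate_by_load`, the bin width, and the sample data are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict


def failure_rate_by_load(samples, bin_width):
    """Estimate failure frequency per workload bin.

    samples: iterable of (load_value, failed) pairs, one per observation
             interval; `failed` is True if a software failure occurred.
    Returns a dict mapping each bin's lower edge to the fraction of
    intervals in that bin that contained a failure.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for load, failed in samples:
        edge = int(load // bin_width) * bin_width  # lower edge of the bin
        totals[edge] += 1
        if failed:
            failures[edge] += 1
    return {edge: failures[edge] / totals[edge] for edge in sorted(totals)}


# Made-up data for illustration only (not measurements from the paper):
# each pair is (paging rate in arbitrary units, failure observed?).
samples = [
    (5, False), (8, False),
    (12, False), (14, False), (18, True),
    (22, False), (27, True), (28, True),
]
rates = failure_rate_by_load(samples, bin_width=10)
for edge, rate in rates.items():
    print(f"load in [{edge}, {edge + 10}): failure fraction = {rate:.2f}")
```

With real measurements, plotting these per-bin fractions against the workload measure would be one simple way to check whether the dependency looks linear or nonlinear.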
