A Study of Software Failures and Recovery in the MVS Operating System

This paper describes an analysis of system-detected software errors on the MVS operating system at the Center for Information Technology (CIT), Stanford University. The analysis procedure demonstrates a methodology by which systems with automatic recovery features can be evaluated. The most common error categories are determined and related to the program in execution at the time of the error. The severity of each error is measured by evaluating the criticality of that program for continued system operation. The system recovery and error correction features are then analyzed, and an estimate is made of the system's fault tolerance to errors of different levels of severity.
