Analysis of software halts in the Tandem GUARDIAN operating system

A systematic methodology is presented for investigating the dependability of operational software. The methodology combines several techniques. Time series analysis is used to characterize the occurrence of software failures. Markov reward modeling is used to determine the loss of service due to failures of software components and to identify the major bottlenecks. The effectiveness of built-in fault tolerance is also evaluated. The methodology is illustrated using software halt data from the Tandem GUARDIAN operating system. The results show that the occurrences of software halts are not correlated with each other in time. Interrupt handling and memory management are found to be the major bottlenecks in the measured system. The fault tolerance in the measured system is shown to reduce the service loss by nearly 90%.
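As a rough illustration of how a Markov reward model can express service loss and the benefit of fault tolerance, the following minimal sketch may help. It is not the model used in the paper: the three-state structure, the function names `steady_state` and `service_loss`, and the rates `lam`, `mu`, and coverage `c` are all hypothetical values chosen only for illustration. State 0 is normal operation (reward 1), state 1 is a halt masked by fault tolerance (reward 1), and state 2 is a halt that causes loss of service (reward 0).

```python
from fractions import Fraction


def steady_state(Q):
    """Return pi with pi * Q = 0 and sum(pi) = 1 for a small CTMC generator Q."""
    n = len(Q)
    # Transpose Q so the balance equations read A * pi = 0.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # Replace the last (redundant) balance equation with the normalization.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def service_loss(lam, mu, c):
    """Steady-state service loss for the hypothetical three-state model.

    lam: halt rate, mu: recovery rate, c: fraction of halts that are masked.
    """
    Q = [
        [-lam, c * lam, (1 - c) * lam],  # normal -> masked halt / service-losing halt
        [mu, -mu, 0],                    # masked halt -> normal
        [mu, 0, -mu],                    # service-losing halt -> normal
    ]
    rewards = [1, 1, 0]  # service delivered per unit time in each state
    pi = steady_state(Q)
    expected_reward = sum(p * r for p, r in zip(pi, rewards))
    return 1 - expected_reward


# Hypothetical parameters, not measured values from the paper.
lam = Fraction(1, 1000)  # one halt per 1000 hours (assumed)
mu = Fraction(1)         # one-hour mean recovery time (assumed)
c = Fraction(3, 4)       # assumed fraction of halts masked by fault tolerance

loss_without_ft = service_loss(lam, mu, Fraction(0))
loss_with_ft = service_loss(lam, mu, c)
reduction = 1 - loss_with_ft / loss_without_ft
print(loss_without_ft, loss_with_ft, reduction)
```

In this simplified model the relative reduction in service loss equals the coverage `c`, because masked halts carry full reward. A measurement-based study would instead estimate the states, transition rates, and rewards from the recorded halt data.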
