A comparative analysis of hardware and software fault tolerance: Impact on software reliability engineering

Today's digital systems are growing increasingly complex and are being used in increasingly critical functions. Greater complexity makes them more likely to contain faults, and greater criticality makes their failures less tolerable. This widening gap highlights the need for fault-tolerant techniques, which provide for reliable operation of digital systems despite the presence and occasional manifestation of faults. In this paper we present a brief comparative survey of fault tolerance as it arises in hardware systems and software systems. We discuss logical models as well as statistical models of fault tolerance, and use these models to analyze design tradeoffs of fault-tolerant systems.
