Reflections on industry trends and experimental research in dependability

Experimental research in dependability has evolved over the past 30 years, accompanied by dramatic changes in the computing industry. To understand the magnitude and nature of this evolution, this paper analyzes three industrial trends: 1) shifting error sources, 2) explosive complexity, and 3) global volume. Under each of these trends, the paper explores research technologies that apply either to the finished product (the artifact) or to the processes used to produce it. The study provides a framework not only for reflecting on the research of the past, but also for projecting the needs of the future.
