System Dependability

The paper starts with a brief account of how and why, at about the time of the birth of what is now INRIA, the author and his colleagues became interested in the subject now known as system dependability. The main body of the paper summarizes the work done over the last three years by the ESPRIT Basic Research project on Predictably Dependable Computing Systems (PDCS). This is a long-term collaborative research activity centred on the problems (i) of producing quantitative methods for measuring and predicting the dependability of complex software/hardware systems, (ii) of incorporating such methods into the design process, and (iii) of developing appropriate architectures and components as bases for designing predictably dependable systems. A further section of the paper then describes, in somewhat more detail, one of the current activities within PDCS: work being carried out by the author in collaboration with an INRIA colleague, Dr. Jean-Charles Fabre, on a unified approach to providing both reliability and security, termed Object-Oriented Fragmented Data Processing (OOFDP).
