Impact of hardware and software faults on ARQ schemes: an experimental study

Understanding the impact of hardware and software faults (HSFs) on an application is critical to designing efficient software fault tolerance techniques. This paper considers HSFs and uses a fault injection experiment to study their impact on an automatic repeat request (ARQ) scheme in terms of throughput degradation. ARQ schemes are used for error control in computer networks and are implemented in the data link layer. Our study shows that HSFs can degrade throughput even when the communication channel itself is free of errors. Furthermore, we identify certain critical variables as locations for fault injection, thereby elevating the fault models to a higher level of abstraction. These variables lie in the active path of the program and help accelerate the failure process, so fewer runs are needed to conduct the fault injection experiment. Since the accelerated failure process represents the worst-case scenario for the fault models considered, the results should help the fault tolerance engineer design or choose fault tolerance mechanisms.
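To make the central claim concrete, the following is a minimal sketch of how a fault in a critical variable can lower ARQ throughput on an error-free channel. It is not the experimental setup of the paper: the choice of stop-and-wait ARQ, the single-bit sequence number as the injection target, and the names `run_stop_and_wait`, `throughput`, and `faulty_transmissions` are all assumptions made for illustration.

```python
def run_stop_and_wait(num_frames, faulty_transmissions=frozenset()):
    """Simulate stop-and-wait ARQ over an error-free channel.

    faulty_transmissions holds 0-based transmission indices at which a
    transient bit flip (the injected fault, an illustrative assumption)
    corrupts the sequence number copied into the outgoing frame.
    Returns (delivered_frames, total_transmissions).
    """
    sender_seq = 0          # critical variable on the sender side
    receiver_expected = 0   # sequence number the receiver will accept next
    delivered = []
    transmissions = 0
    next_frame = 0

    while next_frame < num_frames:
        # Sender builds the frame; the fault flips the 1-bit sequence number.
        frame_seq = sender_seq
        if transmissions in faulty_transmissions:
            frame_seq ^= 1
        transmissions += 1

        # Receiver: accept an in-sequence frame, otherwise re-acknowledge
        # the last frame it accepted.
        if frame_seq == receiver_expected:
            delivered.append(next_frame)
            receiver_expected ^= 1
            ack = frame_seq
        else:
            ack = receiver_expected ^ 1

        # Sender: advance only on the acknowledgement it is waiting for;
        # anything else causes a retransmission on the next iteration.
        if ack == sender_seq:
            sender_seq ^= 1
            next_frame += 1

    return delivered, transmissions


def throughput(num_frames, faulty_transmissions=frozenset()):
    """Frames delivered per transmission; 1.0 means no wasted transmissions."""
    delivered, sent = run_stop_and_wait(num_frames, faulty_transmissions)
    return len(delivered) / sent


print(throughput(10))           # fault-free baseline
print(throughput(10, {2, 5}))   # two injected faults
```

In this sketch every injected fault costs exactly one extra transmission, so ten frames with two faults need twelve transmissions even though the channel never corrupts anything. Targeting a variable that is used on every transmission also means each injection takes effect immediately, which mirrors the abstract's point about variables in the active path accelerating the failure process.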
