Paper information - Definition and analysis of hardware- and software-fault-tolerant architectures

A structured definition of hardware- and software-fault-tolerant architectures is presented. Software-fault-tolerance methods are discussed, resulting in definitions for soft and solid faults. A soft software fault has a negligible likelihood of recurrence and is recoverable, whereas a solid software fault is recurrent under normal operations or cannot be recovered. A set of hardware- and software-fault-tolerant architectures is presented, and three of them are analyzed and evaluated. Architectures tolerating a single fault and architectures tolerating two consecutive faults are discussed separately. A sidebar addresses the cost issues related to software fault tolerance. The approach taken throughout is as general as possible, dealing with specific classes of faults or techniques only when necessary.
