Abstract

The main purpose of this paper is to present a design framework for highly available communication systems. Although we assume that the system's implementation is based on a standardized specification, we note that some standardized specifications lack features needed for high availability. We highlight these features and present general guidelines for writing protocol specifications for highly available systems. This work builds on concepts from fault-tolerant computing and from formal description techniques for communication protocols, as well as on results from communication protocol research. The design framework consists of five phases: it adds a so-called observation phase to the usual four-phase approach of fault-tolerant programming. The framework is based on the black box approach, the remote observation method, the single fault model and formal description techniques; consequently, some kinds of errors cannot be detected because of the inherent limitations of these approaches. We discuss techniques that extend the framework to overcome the limitations of the black box approach, to predict the occurrence of multiple faults, and to predict the existence of unspecified errors when a formal description technique is used. The design framework is illustrated by examples described in the ESTELLE notation. The examples highlight the advantages of the framework and also point out some of the difficulties encountered: incorporating observation points with timing requirements in ESTELLE, and describing the damage assessment phase formally. From these discussions, we derive some requirements on protocol specifications for high availability.
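The five-phase framework described above can be sketched as a small Python observer loop. This is a minimal, hedged illustration under stated assumptions, not the authors' ESTELLE specification. It models the protocol entity as a hypothetical finite-state machine `SPEC` that maps (state, input) to (next state, expected output). It takes the classical four phases of fault-tolerant programming to be error detection, damage assessment, error recovery and fault treatment. The names `Observer`, `run`, `CONreq`, `CR`, `CC`, `CONconf`, `DATreq`, `DT`, `DISreq` and `DR` are all invented for illustration. The observer sees only interface events (black box, remote observation). Under the single fault model, it blames only the transition at which the mismatch appeared.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical FSM specification of one protocol entity:
# (state, input) -> (next_state, expected_output)
Spec = Dict[Tuple[str, str], Tuple[str, str]]
Transition = Tuple[str, str]


@dataclass
class Observer:
    """Remote observer: sees only the input/output events at the entity's
    interfaces (black box) and tracks the state the specification predicts."""
    spec: Spec
    initial: str
    state: str = ""
    suspects: Set[Transition] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.state = self.initial

    def detect(self, inp: str, out: str) -> Optional[str]:
        """Error detection: compare the observed output with the specified one."""
        key = (self.state, inp)
        if key not in self.spec:
            return f"unspecified input {inp} in state {self.state}"
        next_state, expected = self.spec[key]
        if out != expected:
            return f"expected {expected}, observed {out} in state {self.state}"
        self.state = next_state
        return None

    def assess_damage(self, inp: str) -> Transition:
        """Damage assessment under a single fault model: only the transition
        just observed is assumed to be affected."""
        return (self.state, inp)

    def recover(self) -> None:
        """Error recovery (illustrative assumption): reset the observer's
        tracked state to the initial state, standing in for an entity reset."""
        self.state = self.initial

    def treat_fault(self, transition: Transition) -> None:
        """Fault treatment: record the suspect transition for later repair."""
        self.suspects.add(transition)


def run(spec: Spec, initial: str, events: List[Tuple[str, str]]
        ) -> Tuple[List[Tuple[str, Transition]], Observer]:
    obs = Observer(spec, initial)
    reports: List[Tuple[str, Transition]] = []
    for inp, out in events:               # observation phase
        error = obs.detect(inp, out)      # error detection
        if error is None:
            continue
        damaged = obs.assess_damage(inp)  # damage assessment
        obs.recover()                     # error recovery
        obs.treat_fault(damaged)          # fault treatment
        reports.append((error, damaged))
    return reports, obs


# Hypothetical connection-oriented entity, for illustration only.
SPEC: Spec = {
    ("idle", "CONreq"): ("wait", "CR"),
    ("wait", "CC"): ("open", "CONconf"),
    ("open", "DATreq"): ("open", "DT"),
    ("open", "DISreq"): ("idle", "DR"),
}

events = [("CONreq", "CR"), ("CC", "CONconf"), ("DATreq", "DT"), ("DATreq", "DR")]
reports, obs = run(SPEC, "idle", events)
print(reports)
```

With this sample trace, the fourth event produces `DR` where `DT` is specified. `run` therefore reports one error ("expected DT, observed DR in state open"), marks the transition `("open", "DATreq")` as suspect and resets the tracked state to `"idle"`. A fault whose effects never reach the observed interface would go unreported, which mirrors the black-box limitation discussed in the abstract.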