A design framework for highly available communication systems

Abstract: This paper presents a design framework for highly available communication systems. We assume that the system is implemented from a standardized specification, but we note that some standardized specifications lack features needed by highly available systems. We highlight these features and give general guidelines for writing a protocol specification for high availability. The work builds on concepts from fault-tolerant computing, on formal description techniques for communication protocols, and on results from communication protocol studies. The design framework comprises five phases, adding a so-called observation phase to the usual four-phase approach of fault-tolerant programming. The framework is based on the black box approach, the remote observation method, the single fault model, and formal description techniques; consequently, some kinds of error cannot be detected because of the inherent limitations of these approaches. We discuss techniques that improve the framework by overcoming the limitations of the black box approach, by predicting the occurrence of multiple faults, and by predicting the existence of unspecified errors when a formal description technique is used. The design framework is illustrated by examples written in the ESTELLE notation. The examples highlight the advantages of the framework and also point out some of the difficulties encountered: incorporating observation points with timing requirements in the ESTELLE notation, and describing the damage assessment phase formally. From these discussions, we derive some requirements on protocol specifications for high availability.