In a distributed environment, the client-server model is often used because it is easy to implement. In this model, the reliability of the server determines the reliability of the whole system. Providing a backup server improves tolerance of server failures. This method, however, has a long recovery time, because a recovery operation generally includes detecting the failure, switching to the backup server, starting the server process, and re-executing the services that were in progress when the failure occurred. The method therefore cannot be applied in areas that have time constraints. The authors aim to implement a highly reliable and available client-server system that can recover from failures in a very short time. To shorten recovery time, process-level replication is employed. A server process has more than one copy, each on a different host. The copies keep the same status as the original process, so they can replace the original process very quickly when a failure occurs. Several protocols, such as those for updating the status of the copies and for recovering from failures, must be specified in order to implement the proposed system. In this paper, the protocols for updating copies and recovering from failures are described formally in Timed CSP, a process-algebraic language, so that they are defined precisely.
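The idea of process-level replication can be illustrated with a minimal, single-process sketch. It is not the authors' protocol and not a Timed CSP specification: the names `Replica`, `ReplicatedServer`, `handle`, and `recover` are hypothetical, failure detection is reduced to an `alive` flag, and network communication and timing are left out entirely. The sketch only shows why a copy that already holds the current status can take over without a restart or re-execution step.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the server process on a given host (hypothetical name)."""
    host: str
    state: dict = field(default_factory=dict)
    alive: bool = True

    def apply(self, key, value):
        # Update this copy's status.
        self.state[key] = value


class ReplicatedServer:
    """An original process plus copies on other hosts; illustrative only."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self.replicas = [Replica(h) for h in hosts]
        self.primary = self.replicas[0]

    def handle(self, key, value):
        """Serve a request and propagate the update to every live copy."""
        if not self.primary.alive:
            self.recover()
        for replica in self.replicas:
            if replica.alive:
                replica.apply(key, value)
        return self.primary.host

    def recover(self):
        """Promote the first live copy to replace the failed process.

        The copy already holds the current status, so nothing is
        restarted or re-executed here.
        """
        for replica in self.replicas:
            if replica.alive:
                self.primary = replica
                return replica.host
        raise RuntimeError("no live copy remains")
```

For example, after creating `ReplicatedServer(["host-a", "host-b", "host-c"])`, handling a request, and setting `primary.alive = False` to simulate a crash, the next call to `handle` is served by the copy on `host-b`, which already holds the earlier update.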