A fault-tolerant protocol for atomic broadcast

A novel general protocol for atomic broadcast in networks is presented. The protocol tolerates loss, duplication, reordering, and delay of messages, as well as network partitioning, in an arbitrary network of 'fail-stop' sites (i.e., Byzantine site behavior is not tolerated). The protocol is fully decentralized and is based on majority-consensus decisions to commit to a unique ordering of received broadcast messages. Under normal operating conditions, the protocol requires three phases to complete and approximately 4N messages, where N is the number of sites. If more than 4N broadcast messages are exchanged in each protocol execution, this protocol achieves better performance than any of the protocols published to date without assuming specific types of site connectivity, clock synchronization, or knowledge of failed sites and failed communication links. Under abnormal operating conditions, a decentralized termination protocol, also presented, is invoked. A performance analysis of this protocol shows that it commits with high probability under realistic operating conditions without invoking the termination protocol if N is sufficiently large.
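
The majority-consensus rule and the 4N message estimate mentioned above can be illustrated with a small sketch. This is not the protocol itself, whose phases are not specified in the abstract. It only shows, assuming a strict majority of the N sites is required to commit, why at most one side of a network partition can commit an ordering. All function names here are hypothetical and chosen for illustration.

```python
from typing import List


def majority_quorum(n_sites: int) -> int:
    """Smallest number of sites that forms a strict majority of n_sites."""
    if n_sites < 1:
        raise ValueError("n_sites must be positive")
    return n_sites // 2 + 1


def can_commit(acks: int, n_sites: int) -> bool:
    """True if the acknowledging sites form a strict majority (assumed commit rule)."""
    return acks >= majority_quorum(n_sites)


def committing_partitions(partition_sizes: List[int]) -> List[int]:
    """Return the sizes of the partitions large enough to reach a majority.

    Because two disjoint groups cannot both hold more than half of the
    sites, the result contains at most one entry.
    """
    n_sites = sum(partition_sizes)
    quorum = majority_quorum(n_sites)
    return [size for size in partition_sizes if size >= quorum]


def approx_messages_normal_case(n_sites: int) -> int:
    """Approximate message count per execution quoted in the abstract (4N)."""
    return 4 * n_sites
```

For example, with five sites split 3/2 only the three-site group can reach a majority, while a 2/2/1 split leaves no group able to commit, which is the kind of situation in which a termination protocol would be needed.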