Evaluating quorum systems over the Internet (brief announcement)

Quorum systems serve as a basic tool providing a uniform and reliable way to achieve coordination in a distributed system. They are useful for distributed and replicated databases, name servers, mutual exclusion, and distributed access control and signatures. Traditionally, two basic methods have been used to evaluate quorum systems. The first is the analytical approach, which computes the optimal quorum system using some stochastic model. Assumptions such as independent failures and perfect communication are usually made to render the problem tractable. The second approach is simulation, in which a simulation model is constructed and run. While the simulation approach allows for more complex models than the analytical approach, it usually makes stronger assumptions, such as failure distribution, mean time to repair, etc. This paper proposes an empirical approach. We collected 6 months’ worth of connectivity and operability data from a system consisting of 14 real computers using a wide area group communication protocol. The system spanned two geographic sites and three different Internet segments. Each computer recorded to a local log file every change in the membership, i.e., in the set of other machines it was currently connected to. Local recoveries and crashes were recorded to the local log as well. Each log record was timestamped with the local time. We developed a mechanism that merges all the local files into a unified history of the events that took place, ordered according to an imaginary global clock. This non-trivial mechanism had to overcome inconsistent local views, unsynchronized clocks and operator errors. We then developed a tool called the Generic Quorum-system Evaluator (GQE), which evaluates the behavior of any given quorum system over the unified history. The GQE lets us compare the performance of different quorum systems over the same real-life history of events. We compared fourteen dynamic and static quorum systems, some of which
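The abstract does not describe how the GQE works internally. As an illustration only, the idea of evaluating a quorum system over a unified history can be sketched as follows, assuming the history is a time-ordered list of snapshots, each giving the connected components of live machines from that time onward. The names `Snapshot`, `majority_quorum`, and `availability`, the machine labels, and the sample history are all hypothetical, and only a static majority system is shown; this is not the authors' tool or data.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Component = FrozenSet[str]
# Hypothetical history entry: (time this connectivity state begins, components)
Snapshot = Tuple[float, List[Component]]


def majority_quorum(all_machines: FrozenSet[str]) -> Callable[[Component], bool]:
    """Return a predicate: does a component hold a strict majority of machines?"""
    def has_quorum(component: Component) -> bool:
        return 2 * len(component & all_machines) > len(all_machines)
    return has_quorum


def availability(history: Sequence[Snapshot], end_time: float,
                 has_quorum: Callable[[Component], bool]) -> float:
    """Fraction of the observed period during which some connected
    component contains a quorum."""
    if not history:
        raise ValueError("history must contain at least one snapshot")
    total = end_time - history[0][0]
    if total <= 0:
        raise ValueError("end_time must be after the first snapshot")
    available = 0.0
    for i, (start, components) in enumerate(history):
        # A snapshot stays in effect until the next one (or the end of the log).
        stop = history[i + 1][0] if i + 1 < len(history) else end_time
        if any(has_quorum(c) for c in components):
            available += stop - start
    return available / total


# Made-up example: five machines, three connectivity states over 20 time units.
machines = frozenset("abcde")
history = [
    (0.0, [frozenset("abcde")]),                 # fully connected
    (10.0, [frozenset("ab"), frozenset("cd")]),  # e crashed, network split
    (15.0, [frozenset("abc"), frozenset("de")]), # e recovered, partial heal
]
result = availability(history, 20.0, majority_quorum(machines))
```

Because the quorum rule is passed in as a predicate, other static systems could be compared over the same history by swapping the predicate. Dynamic quorum systems would need state carried between snapshots, which this sketch does not model.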

[1] Yair Amir, et al. Evaluating quorum systems over the Internet, 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.