论文信息 - Reaching approximate agreement with multiple fault-modes

Reaching approximate agreement with multiple fault-modes

A distributed system is a collection of autonomous processors which communicate with each other via exchanging messages. Designing reliable distributed systems has always been a major problem specially in critical applications, such as flight control systems, where the cost of system malfunction is very significant. Unfortunately, with finite amount of hardware components it is impossible to design a distributed system which never fails. The best we can hope for is to design a reliable distributed system which will operate properly under its design specification as long as the number of processor failures does not exceed a certain limit. With this criterion in mind, one important issue in distributed computing systems is distributed agreement. One form of distributed agreement is Approximate Agreement in which by executing a voting algorithm non-faulty processors exchange their values. On receipt of a collection of values, each non-faulty processor attempts to vote on a value which is within the range of the correct values received. The objective is to exchange the voted values and vote on them until the voted values are within a specified tolerance $\epsilon$ of each other. In recent work, we analyzed the convergence rate and fault-tolerance for a broad family of convergent voting algorithms called Mean-Subsequence-Reduced (MSR) algorithms. The analysis was done for simultaneous presence of three failure modes: asymmetric, symmetric, and benign. The results were simple expressions that can easily determine the performance of any member of MSR algorithms. In this research, we have extended our analysis to include omission faults, which may be the most common type of faults in large geographically distributed systems. More specifically, we have analyzed the convergence rate and fault-tolerance of MSR voting algorithms for a four-mode fault model comprised of omission, asymmetric, symmetric, and benign faults. The results are similar expressions to that of the three-mode fault model. It will be shown that processors are no longer required, as opposed to previous studies, to carry the same number of data items before voting. This flexibility allows processors to locally filter out self-evident errors, and thus global diagnosis of benign errors becomes superfluous.

Mohammad Hassan Azadmanesh