Reaching Fault Diagnosis Agreement under a Hybrid Fault Model

The goal of the fault diagnosis agreement (FDA) problem is to make every fault-free processor detect and locate a common set of faulty processors. The problem is examined under a mixed fault model (also referred to as a hybrid fault model). An evidence-based fault diagnosis protocol is proposed to solve the FDA problem. The protocol first collects, as evidence, the messages accumulated during execution of the Byzantine agreement protocol. By examining this evidence, each fault-free processor can determine which processors are faulty. The network can then be reconfigured by removing the detected faulty processors, together with the links connected to them. The proposed protocol can detect and locate the maximum number of faulty processors in solving the FDA problem.

Index Terms: Byzantine agreement, fault diagnosis agreement, fault-tolerant distributed system, hybrid fault model, mixed fault model.
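The abstract does not spell out how the evidence is examined, so the following is a minimal Python sketch of one sound possibility, not the paper's protocol. It assumes that a Byzantine agreement run has already given every fault-free processor the same evidence matrix `evidence[i][j]` (the value processor `i` reports receiving from processor `j`, with `None` meaning nothing arrived), that fault-free processors always send a non-`None` value, and that at most `t` of the `n` processors are faulty with n - 1 > 2t. The functions `diagnose` and `reconfigure`, the majority rule, and the "dormant"/"arbitrary" labels are hypothetical illustrations. Because every fault-free processor applies the same deterministic function to the same matrix, all of them obtain the same set of faulty processors, which is the agreement the FDA problem asks for.

```python
from collections import Counter
from typing import Container, Dict, Hashable, Optional, Set

# evidence[i][j]: value processor i reports it received from processor j
# (None = nothing arrived). ASSUMPTION: a prior Byzantine agreement run has
# made this matrix identical at every fault-free processor.
Evidence = Dict[int, Dict[int, Optional[Hashable]]]


def diagnose(evidence: Evidence, n: int, t: int) -> Dict[int, str]:
    """Hypothetical majority rule: return {processor: symptom} for senders
    the evidence proves faulty, assuming at most t faulty and n - 1 > 2t."""
    if n - 1 <= 2 * t:
        raise ValueError("the majority test needs n - 1 > 2t")
    faulty: Dict[int, str] = {}
    for j in range(n):
        # Fixed reporter order, so ties break the same way at every processor.
        reports = [evidence[i][j] for i in range(n) if i != j]
        majority, _ = Counter(reports).most_common(1)[0]
        dissent = sum(1 for r in reports if r != majority)
        if majority is None:
            faulty[j] = "dormant"    # most processors received nothing from j
        elif dissent > t:
            faulty[j] = "arbitrary"  # more disagreement than t faulty reporters could cause
    return faulty


def reconfigure(adjacency: Dict[int, Set[int]],
                faulty: Container[int]) -> Dict[int, Set[int]]:
    """Remove faulty processors and every link incident to them."""
    return {p: {q for q in nbrs if q not in faulty}
            for p, nbrs in adjacency.items() if p not in faulty}


# Example: 7 processors, t = 2; processor 5 has crashed (dormant) and
# processor 6 sends "a" to some processors and "b" to others (arbitrary).
n, t = 7, 2
evidence: Evidence = {i: {} for i in range(n)}
for i in range(n):
    for j in range(n):
        if i == 5 or j == 5:
            evidence[i][j] = None                   # crashed: sends and reports nothing
        elif j == 6:
            evidence[i][j] = "a" if i < 3 else "b"  # inconsistent sender
        elif i == 6 and j == 0:
            evidence[i][j] = "x"                    # faulty reporter lies about 0
        else:
            evidence[i][j] = f"v{j}"                # honest report

found = diagnose(evidence, n, t)
print(found)  # → {5: 'dormant', 6: 'arbitrary'}
complete = {p: {q for q in range(n) if q != p} for p in range(n)}
print(sorted(reconfigure(complete, found)))  # → [0, 1, 2, 3, 4]
```

Under these assumptions the rule is sound but not complete: a fault-free sender's value is reported by a majority of the other n - 1 processors and disputed by at most `t` of them, so it is never accused, while a faulty processor that happens to behave consistently leaves no evidence and is not removed.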
