A binary Particle Swarm Optimization approach to fault diagnosis in parallel and distributed systems

Efficiently diagnosing hardware and software faults in parallel and distributed systems remains a challenge in today's decentralized environments. System-level fault diagnosis aims to identify all faulty components among hundreds (or even thousands) of interconnected units, usually by examining the outcomes of tests that the nodes carry out under a specific test model. This task has non-polynomial complexity and can be posed as a combinatorial optimization problem. Here, we apply a binary version of the Particle Swarm Optimization meta-heuristic to the system-level fault diagnosis problem (BPSO-FD) under the invalidation and comparison diagnosis models. Our method is computationally simpler than those already published in the literature and, according to our empirical results, BPSO-FD quickly and reliably identifies the true set of faulty units and scales well to large parallel and distributed systems.
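To make the idea concrete, the following is a minimal sketch of how a binary PSO could search for a fault set. It is not the BPSO-FD implementation from the paper. Each particle is a bit vector (1 = unit assumed faulty), and velocities are mapped to bit probabilities through a sigmoid, as in the usual binary PSO formulation. The cost function, the inertia weight `w`, the velocity clamp `v_max`, the other parameter values, and the five-unit toy system are all assumptions chosen for illustration. The cost applies an invalidation-style rule: only tests performed by units assumed fault-free are trusted, and among consistent candidates the one with the fewest faults is preferred.

```python
import math
import random


def inconsistencies(candidate, syndrome):
    """Count test outcomes that contradict a candidate fault set.

    Invalidation-style rule (assumed here): a tester assumed fault-free must
    report 1 for a faulty tested unit and 0 for a fault-free one. Outcomes
    produced by testers assumed faulty are unreliable and are ignored.
    """
    count = 0
    for (tester, tested), outcome in syndrome.items():
        if candidate[tester] == 0 and outcome != candidate[tested]:
            count += 1
    return count


def cost(candidate, syndrome):
    # Hypothetical objective: any inconsistency outweighs the fault count,
    # and ties between consistent candidates go to the smaller fault set.
    return inconsistencies(candidate, syndrome) * (len(candidate) + 1) + sum(candidate)


def bpso(syndrome, n_units, n_particles=20, iterations=100,
         w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO sketch: returns (best bit vector found, its cost)."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_units)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_units for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_cost = [cost(p, syndrome) for p in positions]
    g = min(range(n_particles), key=personal_cost.__getitem__)
    global_best, global_cost = personal_best[g][:], personal_cost[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_units):
                v = (w * velocities[i][d]
                     + c1 * rng.random() * (personal_best[i][d] - positions[i][d])
                     + c2 * rng.random() * (global_best[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to keep probabilities away from 0 and 1
                velocities[i][d] = v
                # The sigmoid of the velocity is the probability that the bit is 1.
                positions[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            c = cost(positions[i], syndrome)
            if c < personal_cost[i]:
                personal_best[i], personal_cost[i] = positions[i][:], c
                if c < global_cost:
                    global_best, global_cost = positions[i][:], c
    return global_best, global_cost


# Toy system (assumed): 5 units, unit i tests units i+1 and i+2 (mod 5).
true_faults = [0, 0, 1, 0, 0]
tests = [(i, (i + k) % 5) for i in range(5) for k in (1, 2)]
# Fault-free testers report the truth; the faulty tester here always reports 0.
syndrome = {(u, v): (true_faults[v] if true_faults[u] == 0 else 0) for u, v in tests}

best, best_cost = bpso(syndrome, n_units=5)
print(best, best_cost)
```

The fault-count term matters because a candidate that marks every unit faulty is trivially consistent: with no trusted testers, nothing can contradict it.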
