Fault Diagnosis of Multiprocessor Systems Based on Genetic and Estimation of Distribution Algorithms: A Performance Evaluation

As faults are unavoidable in large-scale multiprocessor systems, it is important to be able to determine which units of the system are working and which are faulty. System-level diagnosis is a long-standing, realistic approach to detecting faults in multiprocessor systems, in which diagnosis is based on the results of tests executed on the system units. In this work we evaluate the performance of evolutionary algorithms applied to the diagnosis problem. Experimental results are presented for both the traditional genetic algorithm (GA) and specialized versions of the GA. We then propose and evaluate specialized versions of Estimation of Distribution Algorithms (EDAs) for system-level diagnosis: the compact GA and Population-Based Incremental Learning both with and without negative examples. The evaluation uses four metrics: the average number of generations needed to find the solution, the average fitness after up to 500 generations, the percentage of runs that reached the optimal solution, and the average time until the solution was found. An analysis of the experimental results shows that the more sophisticated algorithms converge faster to the optimal solution.
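
To make the setting concrete, the following is a minimal sketch of a compact GA searching for a fault set that explains a set of test results. It is an illustration under stated assumptions, not the specialized algorithms evaluated in this work: the ring-shaped test assignment, the PMC-style test semantics, the penalized fitness function, and all names (`make_syndrome`, `fitness`, `compact_ga`, `virtual_pop`, `penalty`) are hypothetical choices made for the example. Only the 500-generation cap is taken from the text above.

```python
import random


def make_syndrome(n, faulty, rng):
    """Build a toy syndrome: unit u tests units u+1 and u+2 (mod n).

    Assumed PMC-style semantics: a fault-free tester reports 1 exactly when
    the tested unit is faulty; a faulty tester reports an arbitrary result.
    """
    syndrome = []
    for u in range(n):
        for step in (1, 2):
            v = (u + step) % n
            if u in faulty:
                outcome = rng.randint(0, 1)  # unreliable result
            else:
                outcome = 1 if v in faulty else 0
            syndrome.append((u, v, outcome))
    return syndrome


def fitness(candidate, syndrome, penalty=0.25):
    """Hypothetical fitness: tests explained by the candidate, minus a
    small penalty per unit marked faulty (candidate[i] == 1)."""
    consistent = sum(
        1 for u, v, r in syndrome
        if candidate[u] == 1 or r == candidate[v]
    )
    return consistent - penalty * sum(candidate)


def compact_ga(n, syndrome, rng, virtual_pop=50, max_generations=500):
    """Compact GA: evolve a probability vector instead of a population."""
    prob = [0.5] * n  # prob[i] = probability that unit i is marked faulty
    step = 1.0 / virtual_pop
    best = None
    generations = 0
    for generations in range(1, max_generations + 1):
        # Sample two candidates and let them compete.
        a = [1 if rng.random() < p else 0 for p in prob]
        b = [1 if rng.random() < p else 0 for p in prob]
        if fitness(a, syndrome) >= fitness(b, syndrome):
            winner, loser = a, b
        else:
            winner, loser = b, a
        if best is None or fitness(winner, syndrome) > fitness(best, syndrome):
            best = winner
        # Shift the model toward the winner where the two candidates differ.
        for i in range(n):
            if winner[i] != loser[i]:
                delta = step if winner[i] == 1 else -step
                prob[i] = min(1.0, max(0.0, prob[i] + delta))
        if all(p in (0.0, 1.0) for p in prob):
            break  # the probability vector has converged
    return best, generations


if __name__ == "__main__":
    rng = random.Random(1)
    true_faults = {2, 5}
    syndrome = make_syndrome(8, true_faults, rng)
    best, generations = compact_ga(8, syndrome, rng)
    print("best candidate:", best, "after", generations, "generations")
```

The compact GA keeps only one probability per unit rather than a full population, which is what makes it an estimation of distribution algorithm. A single run is stochastic and may stop at a suboptimal candidate, which is why metrics such as the percentage of runs that reach the optimal solution are averaged over many runs.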
