Characterizing fault tolerance in genetic programming

Evolutionary algorithms, including genetic programming (GP), are frequently employed to solve difficult real-life problems that can require days or even months of computation. One way to reduce the time-to-solution is to run them in parallel on distributed platforms. Such large platforms are prone to failures, which can be commonplace events rather than rare occurrences, so fault-tolerance and recovery techniques are typically necessary. The aim of this article is to show that parallel GP has an inherent ability to tolerate failures on distributed platforms without using any fault-tolerance technique. This ability is quantified via simulation experiments on two well-known problems, driven by failure traces from real-world distributed platforms, namely desktop grids.
