'Cheap grid': Leveraging system failure using stochastic computation

Traditionally, network and computation failures on a heterogeneous network are viewed as an unfortunate obstacle to reliable, efficient computation. We propose that such noise can instead be incorporated into algorithm design, as part of the source of randomness that stochastic computation requires. This paradigm handles network and computation failure at a high level, in the solution-discovery algorithm itself, rather than attempting to hide and suppress all such noise at the lowest possible levels of the computation tool. The idea enables a network solution system that keeps an extremely small amount of global state. Because so little system state is required, the computation engine scales further and system management consumes fewer resources. Algorithms with a stochastic component are easily adapted to this system; various types of evolutionary computation are particularly well suited to this hybrid paradigm. A specific example using a modified steady-state genetic algorithm is provided to explore the functionality of the resulting composite system. The developed architecture is used to solve a number of problems; in each case it converges on a solution measurably faster than a fully "fault-tolerant" scheme, with lower overhead and shorter execution time.
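The general idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a toy OneMax fitness function, binary tournament selection, and a hypothetical `p_fail` parameter that simulates a worker or network failure. A failed evaluation is simply dropped: there is no retry, no checkpoint, and no record of which tasks are outstanding, so the only shared state is the population itself.

```python
import random


def onemax(bits):
    """Toy fitness (an assumption for illustration): count of 1 bits."""
    return sum(bits)


def lossy_steady_state_ga(n_bits=32, pop_size=40, steps=4000, p_fail=0.2, seed=0):
    """Steady-state GA in which lost evaluations are ignored, not retried.

    p_fail is a hypothetical stand-in for network/computation failure.
    Returns (best fitness found, number of offspring lost to failure).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [onemax(ind) for ind in pop]
    lost = 0

    def pick():
        # Binary tournament: the fitter of two random individuals.
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[a] if fit[a] >= fit[b] else pop[b]

    for _ in range(steps):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, n_bits)       # one-point crossover
        child = p1[:cut] + p2[cut:]
        child[rng.randrange(n_bits)] ^= 1    # flip one random bit

        # Simulated failure: the result never comes back. The loop just
        # moves on, treating the loss as extra selection noise.
        if rng.random() < p_fail:
            lost += 1
            continue

        f = onemax(child)
        worst = min(range(pop_size), key=fit.__getitem__)
        if f >= fit[worst]:                  # replace the current worst
            pop[worst], fit[worst] = child, f

    return max(fit), lost


if __name__ == "__main__":
    best, lost = lossy_steady_state_ga()
    print(f"best fitness: {best}, offspring lost to failure: {lost}")
```

In this sketch a lost offspring costs only the work already spent on it; a retry-based scheme would additionally need to track each outstanding task, which is the kind of global state the abstract argues can be avoided.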
