“Dynamic-fault-prone BSP”: a paradigm for robust computations in changing environments

In this paper we present an efficient general simulation strategy for computations designed for fully operational BSP machines of n ideal processors, running on n-processor dynamic-fault-prone BSP machines. The fault occurrences are fail-stop and fully dynamic, i.e., they are allowed to happen on-line at any point of the computation, subject to the constraint that the total number of faulty processors never exceeds a known fraction. The computational paradigm can be exploited for robust computations over virtual parallel settings with a volatile underlying infrastructure, such as a Network of Workstations (where workstations may be withdrawn from the virtual parallel machine by their owners). Our simulation strategy is Las Vegas, i.e., it never fails, thanks to a backtracking process to robustly stored instances of the computation. It adopts an adaptive scheme for balancing the workload among the currently live processors of the BSP machine. Moreover, the storage schemes adopted in this work achieve space optimality, which is crucial in the BSP cost model, since space overhead translates into communication overhead whenever a fraction of the workload has to migrate to a currently live processor. Our strategy is efficient in the sense that, compared with an optimal off-line adversarial computation under the same sequence of fault occurrences, it performs at most an O((log n · log log n)²) multiplicative factor times the optimal work (i.e., this measure is a competitive ratio in the sense of on-line analysis). In addition, our scheme is modular, integrated, and addresses many implementation issues.

We remark that, to our knowledge, no previous work on robust parallel computations has considered fully dynamic faults in the BSP model, or in distributed-memory systems in general. Furthermore, this is the first time an efficient Las Vegas simulation has been achieved in this area.
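For concreteness, the competitive-ratio claim can be written as a worked bound. The notation below is ours, introduced only for illustration: F is a fault sequence respecting the known fault fraction, W_opt(F) is the work of an optimal off-line strategy that knows F in advance, and W_sim(F) is the expected work of the simulation on the same sequence.

\[
  \mathbb{E}\left[W_{\mathrm{sim}}(F)\right] \;=\; O\big((\log n \cdot \log\log n)^{2}\big)\cdot W_{\mathrm{opt}}(F).
\]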

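The abstract's ingredients, i.e., adaptive rebalancing of the workload over the currently live processors, robust storage of completed work, and recovery of work that faults have invalidated, can be illustrated with a toy, sequential Python sketch of a single superstep. Everything below is our own illustration under simplifying assumptions: the names (Processor, simulate_superstep), the round-robin rebalancing, and the random fault injection are hypothetical, and the sketch collapses the paper's backtracking to robustly stored instances and its space-optimal storage scheme into the single line that "stores" each result.

    import random

    class Processor:
        """Toy stand-in for one processor of the simulating BSP machine."""
        def __init__(self, pid):
            self.pid = pid
            self.alive = True

    def simulate_superstep(work_units, processors, rng,
                           fail_prob=0.1, max_fault_fraction=0.5):
        """Emulate one superstep of an ideal BSP program on a dynamic-fault-prone
        machine: work whose processor fail-stops before its result is (robustly)
        stored is handed to live processors and redone, so the superstep always
        completes (the Las Vegas flavour of the scheme)."""
        remaining = list(work_units)          # ideal-machine work still to perform
        results = {}
        fault_budget = int(max_fault_fraction * len(processors))  # known fault bound
        while remaining:
            live = [p for p in processors if p.alive]
            # adaptive balancing: spread remaining work evenly over live processors
            assignment = [(u, live[i % len(live)]) for i, u in enumerate(remaining)]
            lost = []
            for unit, proc in assignment:
                # the on-line adversary may fail-stop the processor at any point,
                # as long as the total number of faults stays within the known fraction
                if proc.alive and fault_budget > 0 and rng.random() < fail_prob:
                    proc.alive = False
                    fault_budget -= 1
                if proc.alive:
                    results[unit] = unit()    # perform and robustly store the result
                else:
                    lost.append(unit)         # lost work: redone by a live processor
            remaining = lost
        return results

    # Hypothetical usage: 32 toy work units simulated on 8 fault-prone processors.
    procs = [Processor(i) for i in range(8)]
    work = [(lambda i=i: i * i) for i in range(32)]
    print(len(simulate_superstep(work, procs, random.Random(0))))  # always 32

Because the fault budget never exceeds half of the processors in this toy setting, at least one processor is always live, so the while-loop terminates and every work unit is eventually performed; this mirrors, in a very rough way, why the simulation never loses the computation.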