Advanced eager scheduling for Java-based adaptive parallel computing

Javelin 3 is a software system for developing large-scale, fault-tolerant, adaptively parallel applications. When all or part of an application can be cast as a master-worker or branch-and-bound computation, Javelin 3 frees application developers from concerns about inter-processor communication and fault tolerance among networked hosts, allowing them to focus on the underlying application. This paper describes a fault-tolerant task scheduler and analyzes its performance. The scheduler integrates work stealing with an advanced form of eager scheduling. It enables dynamic task decomposition, which improves load balancing among hosts when tasks have non-uniform computational loads that become evident only at execution time. We present speedup measurements of actual performance on up to 1,000 hosts. We also analyze the expected performance degradation due to unresponsive hosts and measure the actual degradation they cause.
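The abstract does not spell out the scheduler's algorithm, so the following is only a rough, hypothetical illustration of two of the ideas it names, not Javelin 3's code. It assumes a single-threaded simulation in which unfinished tasks stay queued and are re-issued to whichever host asks next (eager scheduling), and a host splits a task in half only when it finds the task larger than a threshold (dynamic decomposition). The class and member names (`EagerSchedulerSketch`, `nextTask`, `execute`, `complete`, `splitThreshold`), the sum-of-squares workload, and the "host 0 never reports back" failure model are all assumptions; work stealing and networking are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch, not Javelin 3's implementation: a single-threaded simulation of
 * eager scheduling (unfinished tasks are re-issued to whichever host asks next) combined
 * with dynamic decomposition (a task is split only when a host finds it too large).
 */
public class EagerSchedulerSketch {

    /** A task that sums i*i over the half-open range [lo, hi). */
    static final class Task {
        final long lo, hi;
        final Task parent;
        final List<Task> children = new ArrayList<>(); // non-empty once decomposed
        Long result;                                    // null until the first result arrives

        Task(long lo, long hi, Task parent) { this.lo = lo; this.hi = hi; this.parent = parent; }

        boolean done() { return result != null; }
    }

    private final Deque<Task> unfinished = new ArrayDeque<>();
    private final Task root;
    private final long splitThreshold; // hypothetical cutoff for "too large to run as-is"
    int issues = 0;                    // how many times any task was handed to a host

    EagerSchedulerSketch(long lo, long hi, long splitThreshold) {
        this.splitThreshold = splitThreshold;
        this.root = new Task(lo, hi, null);
        unfinished.addLast(root);
    }

    /** Eager scheduling: hand out the next unfinished leaf task, even if already issued. */
    Task nextTask() {
        for (int n = unfinished.size(); n > 0; n--) {
            Task t = unfinished.pollFirst();
            if (t.done() || !t.children.isEmpty()) continue; // finished or decomposed: drop it
            unfinished.addLast(t); // keep it queued so it is re-issued if its host goes silent
            issues++;
            return t;
        }
        return null;
    }

    /** A responsive host either decomposes an oversized task or computes it directly. */
    void execute(Task t) {
        if (t.done() || !t.children.isEmpty()) return; // stale issue: nothing to do
        if (t.hi - t.lo > splitThreshold) {
            long mid = t.lo + (t.hi - t.lo) / 2;
            Task left = new Task(t.lo, mid, t);
            Task right = new Task(mid, t.hi, t);
            t.children.add(left);
            t.children.add(right);
            unfinished.addFirst(right); // children are issued before older queued tasks
            unfinished.addFirst(left);
        } else {
            long sum = 0;
            for (long i = t.lo; i < t.hi; i++) sum += i * i;
            complete(t, sum);
        }
    }

    /** Records the first result for a task; duplicates from redundant executions are ignored. */
    boolean complete(Task t, long value) {
        if (t.done()) return false;
        t.result = value;
        Task p = t.parent;
        if (p != null && p.children.stream().allMatch(Task::done)) {
            long total = 0;
            for (Task c : p.children) total += c.result;
            complete(p, total); // a parent finishes once all of its children have
        }
        return true;
    }

    /** Round-robins tasks over {@code hosts} hosts (needs hosts >= 2); host 0 never reports back. */
    long run(int hosts) {
        int turn = 0;
        Task t;
        while ((t = nextTask()) != null) {
            int host = turn++ % hosts;
            if (host == 0) continue; // unresponsive host: its copy of t is simply lost
            execute(t);
        }
        return root.result;
    }

    public static void main(String[] args) {
        EagerSchedulerSketch s = new EagerSchedulerSketch(0, 1000, 100);
        System.out.println(s.run(4) + " after " + s.issues + " issues");
    }
}
```

Running `main` yields the sum of squares of 0 through 999, which is 332833500, even though every task handed to host 0 is lost: because nothing is removed from the queue until its result arrives, a silent host only costs extra issues rather than a stalled computation. The split happens at execution time, so a range that turns out to be cheap is never decomposed. With a single host that is also the unresponsive one, the loop would never terminate, which is why `run` assumes at least two hosts.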
