A scalable, robust network for parallel computing

CX, a network-based computational exchange, is presented. The system's design integrates variations of ideas from other researchers, such as work stealing, non-blocking tasks, eager scheduling, and space-based coordination. The object-oriented API is simple and compact, and it cleanly separates application logic from the logic that supports interprocess communication and fault tolerance. Computations run to completion even as computational hosts join and leave the ongoing computation. These hosts, or producers, use task caching and prefetching to overlap computation with interprocessor communication. To break a potential task-server bottleneck, a network of task servers is presented. Although task servers are envisioned as reliable, the self-organizing, scalable network of n servers, described as a sibling-connected fat tree, tolerates a sequence of n − 1 server failures. Tasks are distributed throughout the server network via a simple “diffusion” process. CX is intended as a test bed for research on automated silent auctions, reputation services, authentication services, and bonding services, as well as for algorithm research into network-based parallel computation.
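The abstract does not spell out the diffusion rule. The sketch below assumes a simple pairwise-averaging scheme, a common form of diffusive load balancing. On each link between two servers, the more heavily loaded server hands tasks to its neighbor until their queue lengths differ by at most one. The function `diffuse`, the queue representation, the edge list, and the round count are hypothetical illustrations, not CX's actual API or algorithm.

```python
from collections import deque


def diffuse(queues, edges, rounds):
    """Hypothetical diffusion step (not CX's actual rule).

    queues: list of deques, one task queue per server.
    edges:  list of (a, b) server-index pairs that are linked.
    In each round, for every link, the more loaded server moves
    floor(difference / 2) tasks to the less loaded one, so the two
    queue lengths then differ by at most one.
    """
    for _ in range(rounds):
        for a, b in edges:
            src, dst = (a, b) if len(queues[a]) > len(queues[b]) else (b, a)
            surplus = (len(queues[src]) - len(queues[dst])) // 2
            for _ in range(surplus):
                # Hand off from the tail; by assumption, the server keeps
                # executing from the head of its own queue.
                queues[dst].append(queues[src].pop())
    return queues


if __name__ == "__main__":
    # Hypothetical topology: server 0 linked to servers 1 and 2, which
    # are also linked to each other (a guess at "sibling" links).
    qs = diffuse([deque(range(12)), deque(), deque()],
                 [(0, 1), (0, 2), (1, 2)], rounds=2)
    print([len(q) for q in qs])  # → [4, 4, 4]
```

Each transfer only moves tasks between queues, so no task is lost or duplicated. Repeated rounds spread an initial burst of work from one server across the whole graph.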
