Managing Clusters of Geographically Distributed High-Performance Computers

We present a software system for the management of geographically distributed high-performance computers. It consists of three components:

1. The Computing Center Software (CCS) is a vendor-independent resource management software for local HPC systems. It controls the mapping and scheduling of interactive and batch jobs on massively parallel systems.
2. The Resource and Service Description (RSD) is used by CCS for specifying and mapping hardware and software components of (meta-)computing environments. It has a graphical user interface, a textual representation and an object-oriented API.
3. The Service Coordination Layer (SCL) co-ordinates the co-operative use of resources in autonomous computing sites. It negotiates between the applications' requirements and the available system services.
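The three components split the work between local resource managers, a description of the resources they control, and a layer that negotiates across autonomous sites. The following is a minimal Python sketch of that division of labor under simplifying assumptions. All names (`Node`, `Request`, `Site`, `negotiate`) are hypothetical. Resource descriptions are flat attribute records rather than RSD's actual notation. Local mapping uses first-fit, which is an illustrative stand-in and not CCS's scheduling policy. Negotiation simply takes the first site that can satisfy a request, which is not SCL's actual protocol.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for CCS, RSD and SCL. All names and
# fields are illustrative assumptions, not the systems' real interfaces.


@dataclass
class Node:
    """RSD-like record describing one hardware component (assumed attributes)."""
    name: str
    cpus: int
    memory_mb: int
    software: set = field(default_factory=set)


@dataclass
class Request:
    """An application's requirements as handed to the coordination layer."""
    nodes: int
    min_memory_mb: int
    min_cpus: int = 1
    software: set = field(default_factory=set)

    def accepts(self, node):
        # A node qualifies if it meets every hardware and software requirement.
        return (node.cpus >= self.min_cpus
                and node.memory_mb >= self.min_memory_mb
                and self.software <= node.software)


@dataclass
class Site:
    """An autonomous site whose local manager (CCS-like) owns its nodes."""
    name: str
    nodes: list
    busy: set = field(default_factory=set)

    def allocate(self, request):
        # Local mapping by first-fit over free, qualifying nodes; an
        # illustrative policy, not CCS's actual scheduler.
        fitting = [n for n in self.nodes
                   if n.name not in self.busy and request.accepts(n)]
        if len(fitting) < request.nodes:
            return None
        chosen = [n.name for n in fitting[:request.nodes]]
        self.busy.update(chosen)
        return chosen


def negotiate(request, sites):
    """SCL-like coordination: query each site in order; first success wins."""
    for site in sites:
        allocation = site.allocate(request)
        if allocation is not None:
            return site.name, allocation
    return None
```

For example, a request for three MPI-capable nodes with at least 128 MB each would skip a site whose nodes offer only 64 MB and be placed on the next site that can satisfy it, while each site keeps sole control over its own nodes. A realistic coordination layer would have to weigh more criteria than this first-match rule.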