Performance and Interoperability Issues in Incorporating Cluster Management Systems within a Wide-Area Network-Computing Environment

This paper describes the performance and interoperability issues that arise when integrating cluster management systems into a wide-area network-computing environment, and presents solutions in the context of the Purdue University Network Computing Hubs (PUNCH). The solution gives users a single point of access to resources spread across administrative domains, and an intelligent translation process lets users submit jobs transparently to different types of cluster management systems. The approach requires no modifications to the cluster management software; however, the paper also discusses call-back and caching capabilities that would improve performance and make such systems more interoperable with wide-area computing systems.
