Scheduling large parametric modelling experiments on a distributed meta-computer

Nimrod is a tool that makes it easy to parallelise and distribute large computational experiments based on exploring a range of parameterised scenarios. With Nimrod, a user can specify and generate a parametric experiment and then control the execution of the code across distributed computers. Nimrod has been applied to a range of application areas, including bioinformatics, operations research, electronic CAD, ecological modelling and computer movies. Nimrod has been extremely successful at generating work, but it contains no mechanisms for scheduling the computation on the underlying resources. Consequently, users have no idea when an experiment might complete. We are currently building a new version of Nimrod, called Nimrod/G, which will integrate Nimrod's job generation techniques with Globus, an international project that is building the underlying infrastructure for large meta-computing applications. Using Globus, it will be possible for Nimrod users to specify time and cost constraints on computational experiments. Globus provides mechanisms for estimating execution time and waiting delays when using networked, queued supercomputers. Nimrod/G will use these estimates to schedule the work in a way that meets user-specified deadlines and cost budgets. In this way, multiple Nimrod users can obtain quality of service from the computational network.
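
The idea of generating a parametric experiment and then placing its jobs under a deadline and a cost budget can be illustrated with a minimal sketch. This is not Nimrod/G's scheduler and it does not use the Globus API: the `Resource` record, the `generate_jobs` and `plan` functions, and all numbers are hypothetical, and the per-resource run-time and queue-wait figures stand in for the kind of estimates the abstract attributes to Globus. The sketch assumes identical jobs that run one after another on each resource and a simple cheapest-first greedy policy.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of one compute resource."""
    name: str
    secs_per_job: float  # estimated run time of one job (assumed known)
    queue_wait: float    # estimated delay before the first job starts
    cost_per_job: float  # price charged per job


def generate_jobs(parameters):
    """Expand parameter ranges into one job (a dict) per combination."""
    names = list(parameters)
    return [dict(zip(names, values)) for values in product(*parameters.values())]


def plan(jobs, resources, deadline, budget):
    """Assign job counts to resources, cheapest first.

    Each resource receives as many jobs as it can finish serially before
    the deadline. Returns (allocation, total_cost), or None when the
    deadline or the budget cannot be met under these assumptions.
    """
    remaining = len(jobs)
    allocation = {}
    total_cost = 0.0
    for res in sorted(resources, key=lambda r: r.cost_per_job):
        if remaining == 0:
            break
        usable = deadline - res.queue_wait  # time left once the queue clears
        if usable <= 0:
            continue
        capacity = int(usable // res.secs_per_job)
        take = min(capacity, remaining)
        if take > 0:
            allocation[res.name] = take
            total_cost += take * res.cost_per_job
            remaining -= take
    if remaining > 0 or total_cost > budget:
        return None
    return allocation, total_cost


# Illustrative experiment: 3 angles x 2 speeds = 6 jobs.
jobs = generate_jobs({"angle": [0, 10, 20], "speed": [1, 2]})
resources = [
    Resource("cheap", secs_per_job=100, queue_wait=50, cost_per_job=1.0),
    Resource("fast", secs_per_job=20, queue_wait=0, cost_per_job=5.0),
]
print(plan(jobs, resources, deadline=400, budget=20))
```

Because the jobs are identical, filling the cheapest resources first gives the lowest total cost for a given deadline, so a `None` result means no assignment in this simplified model satisfies both constraints. A real scheduler would also have to cope with estimates that change while the experiment runs.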
