The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in a Grid Environment

The ability to harness heterogeneous, dynamically available grid resources is attractive to typically resource-starved computational scientists and engineers, as in principle it can increase, by significant factors, the number of cycles that can be delivered to applications. However, new adaptive application structures and dynamic runtime system mechanisms are required if applications are to operate effectively in grid environments. To explore some of these issues in a practical setting, the authors are developing an experimental framework, called Cactus, that incorporates both adaptive application structures for dealing with changing resource characteristics and adaptive resource selection mechanisms that allow applications to change their resource allocations (e.g., via migration) when performance falls outside specified limits. The authors describe the adaptive resource selection mechanisms and show how they are used to achieve automatic application migration to “better” resources following performance degradation. The results provide insights into the architectural structures required to support adaptive resource selection. In addition, the authors suggest that the Cactus Worm affords many opportunities for grid computing.
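
The migration trigger described above can be illustrated with a minimal Python sketch. It is not the Cactus implementation: the names `Resource`, `contract_violated`, `select_resource`, and `run_with_migration`, the 20% tolerance, and the use of a single "iterations per second" figure as the performance measure are all assumptions made for illustration. The sketch only shows the general pattern of monitoring performance against a limit, selecting a candidate resource, and migrating when the limit is violated.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    predicted_rate: float  # hypothetical metric: predicted iterations per second


def contract_violated(measured_rate, required_rate, tolerance=0.2):
    """Return True when measured performance falls below the allowed limit.

    The limit is the required rate reduced by an assumed tolerance fraction.
    """
    return measured_rate < required_rate * (1.0 - tolerance)


def select_resource(candidates, measured_rate):
    """Pick the candidate with the highest predicted rate.

    Returns None when no candidate is predicted to beat the current
    measured rate, in which case migrating would not help.
    """
    best = max(candidates, key=lambda r: r.predicted_rate, default=None)
    if best is not None and best.predicted_rate > measured_rate:
        return best
    return None


def run_with_migration(rates, current, candidates, required_rate):
    """Replay measured rates, one per monitoring interval.

    Returns the name of the resource in use at the end of each interval.
    Real migration (checkpoint, transfer, restart) is reduced here to
    reassigning `current`.
    """
    history = []
    for measured in rates:
        if contract_violated(measured, required_rate):
            others = [c for c in candidates if c.name != current.name]
            target = select_resource(others, measured)
            if target is not None:
                current = target  # stand-in for checkpoint and restart elsewhere
        history.append(current.name)
    return history


if __name__ == "__main__":
    start = Resource("site-a", 10.0)
    pool = [Resource("site-b", 8.0), Resource("site-c", 12.0)]
    print(run_with_migration([10.0, 9.0, 5.0, 11.0], start, pool, required_rate=10.0))
```

In this sketch the application stays on its starting resource while the measured rate remains within tolerance, and moves only once the rate drops below the limit and a candidate with a higher predicted rate is available.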
