Application-aware scheduling of a magnetohydrodynamics application in the Legion metasystem

Computational grids have become an important and popular computing platform for both the scientific and commercial distributed computing communities. However, users of such systems typically find that achieving good application execution performance remains challenging. Although grid infrastructures such as Legion and Globus provide basic resource selection, work allocation, and scheduling mechanisms, applications must interpret system performance information in terms of their own requirements in order to develop performance-efficient schedules. We describe a new high-performance scheduler that incorporates dynamic system information, application requirements, and a detailed performance model to create performance-efficient schedules. While the scheduler is designed to improve the performance of a magnetohydrodynamics simulation in the Legion computational grid infrastructure, the design generalizes to other systems and other data-parallel iterative codes. We describe the adaptive performance model, resource selection strategies, and scheduling policies employed by the scheduler. We demonstrate the improvement in application performance that the scheduler achieves in dedicated and shared Legion environments.
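To make the idea of combining dynamic system information with a performance model concrete, the following is a minimal illustrative sketch and not the scheduler described here. It assumes a data-parallel iterative code whose per-iteration cost is proportional to the number of grid rows a host holds, and it ignores communication cost. The function names, the `speeds` values (rows per second on a dedicated host), and the `availability` values (fraction of the CPU currently available, assumed positive) are all hypothetical.

```python
def balance_rows(total_rows, speeds, availability):
    """Split total_rows across hosts in proportion to effective speed.

    speeds[i]       -- hypothetical dedicated rate of host i (rows/second)
    availability[i] -- hypothetical fraction of host i currently free (> 0)
    """
    effective = [s * a for s, a in zip(speeds, availability)]
    total = sum(effective)
    # Proportional share, rounded down so we never over-allocate.
    shares = [int(total_rows * e / total) for e in effective]
    # Hand any leftover rows to the hosts with the highest effective speed.
    leftover = total_rows - sum(shares)
    order = sorted(range(len(effective)), key=lambda i: effective[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def predicted_iteration_time(shares, speeds, availability):
    """Predicted time per iteration: the slowest host sets the pace."""
    return max(rows / (s * a) for rows, s, a in zip(shares, speeds, availability))


if __name__ == "__main__":
    speeds = [100.0, 200.0, 100.0]
    availability = [1.0, 0.5, 0.5]
    shares = balance_rows(1000, speeds, availability)
    print(shares, predicted_iteration_time(shares, speeds, availability))
```

In this sketch a nominally fast but heavily loaded host receives no more work than a slower dedicated one, which is the kind of decision that requires interpreting system load in terms of the application's own cost structure.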
