Local grid scheduling techniques using performance prediction

The use of computational grids to provide an integrated computer platform, composed of differentiated and distributed systems, presents fundamental resource and workload management questions. Key services such as resource discovery, monitoring and scheduling are inherently more complicated in a grid environment where the resource pool is large, dynamic and architecturally diverse. The authors approach the problem of grid workload management through the development of a multi-tiered scheduling architecture (TITAN) that employs a performance prediction system (PACE) and task distribution brokers to meet user-defined deadlines and improve resource usage efficiency. Attention is focused on the lowest tier which is responsible for local scheduling. By coupling application performance data with scheduling heuristics, the architecture is able to balance the processes of minimising run-to-completion time and processor idle time, whilst adhering to service deadlines on a per-task basis.
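As a rough illustration of the trade-off described above, the sketch below scores a candidate task ordering on a small pool of identical processors using predicted run times. It is not the TITAN scheduler or the PACE toolset: the `Task` fields, the `evaluate` function, the greedy earliest-free-processor placement and the weighted cost are all assumptions made for this example, and the predicted run times are hard-coded stand-ins for what a performance model would supply.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    predicted_runtime: float  # hypothetical stand-in for a model prediction
    deadline: float           # user-defined deadline, same time units


def evaluate(order, num_procs, w_makespan=1.0, w_idle=1.0, w_late=1.0):
    """Score one task ordering (assumed cost model, lower is better)."""
    free_at = [0.0] * num_procs  # time at which each processor becomes free
    busy = 0.0                   # total processor time spent running tasks
    lateness = 0.0               # summed deadline overrun across tasks
    for task in order:
        # Place the task on the processor that becomes free earliest.
        p = min(range(num_procs), key=free_at.__getitem__)
        finish = free_at[p] + task.predicted_runtime
        free_at[p] = finish
        busy += task.predicted_runtime
        lateness += max(0.0, finish - task.deadline)
    makespan = max(free_at)
    idle = makespan * num_procs - busy  # unused processor time before makespan
    cost = w_makespan * makespan + w_idle * idle + w_late * lateness
    return {"makespan": makespan, "idle": idle, "lateness": lateness, "cost": cost}


a = Task("A", predicted_runtime=4.0, deadline=10.0)
b = Task("B", predicted_runtime=2.0, deadline=2.0)
c = Task("C", predicted_runtime=2.0, deadline=4.0)

long_first = evaluate([a, b, c], num_procs=2)
short_first = evaluate([b, c, a], num_procs=2)
print(long_first)
print(short_first)
```

In this toy case both orderings meet every deadline, but placing the long task first packs the two processors with no idle time, so it receives the lower cost. A search heuristic could call a function like `evaluate` on many candidate orderings and keep the cheapest.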
