Two-Phase Computation and Data Scheduling Algorithms for Workflows in the Grid

This paper proposes a workflow scheduling approach that consists of two algorithms. A submitted workflow is first partitioned into subgraphs at the global Grid level by the graph partitioning algorithm, according to the features of the workflow itself and the status of the selected available resource clusters. Then, at the resource cluster level, the metatasks in each subgraph are allocated to computational resources by the metatask mapping algorithm. To reduce the total makespan of a workflow, both algorithms take the scheduling of raw input data preloading into account. Because this two-phase approach does not require Grid schedulers at the global Grid level to have detailed resource information or control privileges on every Grid resource, it reduces the dependence on Grid information services and respects the higher priority of local resource management policies.
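
The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithms: the abstract does not give their details, so the greedy partitioning rule, the earliest-finish-time mapping rule, and every name used here (`partition`, `map_metatasks`, `capacity`, `comm_penalty`, `preload_time`) are assumptions made for illustration. Phase 1 sees only a coarse capacity figure per cluster, phase 2 uses per-resource speeds, and raw input data is assumed to start loading at time 0 so that the transfer overlaps with earlier computation. For brevity, phase 2 walks all clusters in one global topological order rather than running separately inside each cluster.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def topo_order(tasks, parents):
    """Return the tasks so that every parent precedes its children."""
    graph = {t: parents.get(t, ()) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


def partition(tasks, parents, capacity, comm_penalty=1.0):
    """Phase 1 (global level): assign each task to a cluster.

    Uses only a coarse capacity number per cluster (an assumption),
    plus a hypothetical penalty for each parent placed elsewhere.
    """
    load = {c: 0.0 for c in capacity}
    assignment = {}
    for t in topo_order(tasks, parents):
        def estimate(c):
            remote = sum(1 for p in parents.get(t, ()) if assignment[p] != c)
            return (load[c] + tasks[t]) / capacity[c] + comm_penalty * remote
        best = min(capacity, key=estimate)
        assignment[t] = best
        load[best] += tasks[t]
    return assignment


def map_metatasks(tasks, parents, assignment, resources, preload_time):
    """Phase 2 (cluster level): map each task to a resource in its cluster.

    Picks the resource with the earliest finish time. Raw input is
    assumed to be preloaded from time 0, so a task waits only for
    whichever is later: its parents or its preload.
    Returns {task: (cluster, resource, start, finish)}.
    """
    free_at = {(c, r): 0.0 for c, rs in resources.items() for r in rs}
    schedule = {}
    for t in topo_order(tasks, parents):
        c = assignment[t]
        ready = max((schedule[p][3] for p in parents.get(t, ())), default=0.0)
        ready = max(ready, preload_time.get(t, 0.0))

        def finish_on(r):
            return max(ready, free_at[(c, r)]) + tasks[t] / resources[c][r]

        best = min(resources[c], key=finish_on)
        start = max(ready, free_at[(c, best)])
        finish = start + tasks[t] / resources[c][best]
        free_at[(c, best)] = finish
        schedule[t] = (c, best, start, finish)
    return schedule


# Hypothetical workflow: task -> compute cost, task -> list of parents.
tasks = {"a": 4.0, "b": 2.0, "c": 2.0, "d": 1.0}
parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
capacity = {"X": 2.0, "Y": 1.0}                       # coarse view, phase 1
resources = {"X": {"x1": 1.0, "x2": 1.0}, "Y": {"y1": 1.0}}  # speeds, phase 2
preload_time = {"a": 1.0}                             # raw input for task a

assignment = partition(tasks, parents, capacity)
schedule = map_metatasks(tasks, parents, assignment, resources, preload_time)
makespan = max(finish for _, _, _, finish in schedule.values())
print(assignment, makespan)
```

The split mirrors the point made in the abstract: `partition` never looks at individual resources, so a global scheduler written this way would need neither detailed resource information nor control over local resources.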
