Task Clustering Heuristics for Efficient Execution Time Reduction in Workflow Scheduling

Many large-scale scientific and engineering applications are constructed as dependent task graphs, also called workflows, which describe the interrelated computation and communication among their constituent software modules or programs. Scheduling workflows efficiently has therefore become an important issue in modern parallel computing environments such as clusters, grids, and clouds. Task clustering is one of the major categories of task graph scheduling approaches and aims to reduce inter-task communication costs. In this paper, we propose three new task clustering approaches, the Critical Path Clustering Heuristic (CPCH), the Larger Edge First Heuristic (LEFH), and the Critical Child First Heuristic (CCFH), which aim to improve task graph execution performance by minimizing the communication costs along execution paths. The proposed schemes were evaluated in a series of simulation experiments and compared with a typical clustering-based task graph scheduling approach from the literature. The experimental results indicate that CPCH, LEFH, and CCFH significantly outperform the typical scheme, improving average makespan by up to 21% for workflows with a large Communication-to-Computation Ratio (CCR).
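To make the notions of CCR and clustering concrete, the following minimal Python sketch computes both for a toy four-task workflow. It is an illustration only, not the CPCH, LEFH, or CCFH heuristics, whose details are not given here. The task names, costs, and function names are hypothetical, and CCR is assumed to follow the common definition of average communication cost divided by average computation cost. The sketch also assumes that an edge costs nothing when both endpoints share a cluster, and it ignores the serialization of tasks within a cluster, so in general it gives only a lower bound on makespan.

```python
from collections import defaultdict, deque


def ccr(comp, comm):
    """Assumed definition: average edge cost over average task cost."""
    avg_comm = sum(comm.values()) / len(comm)
    avg_comp = sum(comp.values()) / len(comp)
    return avg_comm / avg_comp


def critical_path_length(comp, comm, cluster_of):
    """Longest path through the DAG, charging an edge's communication
    cost only when its two endpoints lie in different clusters."""
    succs = defaultdict(list)
    indegree = {task: 0 for task in comp}
    for (u, v), cost in comm.items():
        succs[u].append((v, cost))
        indegree[v] += 1

    start = {task: 0 for task in comp}
    ready = deque(t for t in comp if indegree[t] == 0)
    longest = 0
    while ready:  # process tasks in topological order (Kahn's algorithm)
        u = ready.popleft()
        finish = start[u] + comp[u]
        longest = max(longest, finish)
        for v, cost in succs[u]:
            edge_cost = 0 if cluster_of[u] == cluster_of[v] else cost
            start[v] = max(start[v], finish + edge_cost)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return longest


# Hypothetical workflow: computation cost per task, communication cost per edge.
comp = {"A": 2, "B": 3, "C": 3, "D": 2}
comm = {("A", "B"): 4, ("A", "C"): 1, ("B", "D"): 4, ("C", "D"): 1}

unclustered = {t: t for t in comp}            # every task in its own cluster
clustered = {"A": 0, "B": 0, "D": 0, "C": 1}  # heavy edges A->B, B->D merged

print(ccr(comp, comm))
print(critical_path_length(comp, comm, unclustered))
print(critical_path_length(comp, comm, clustered))
```

In this toy case, grouping A, B, and D removes the two heaviest edges from the longest path, which is the kind of saving that clustering heuristics target when CCR is large.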
