Compaction of Schedules and a Two-Stage Approach for Duplication-Based DAG Scheduling

Many DAG scheduling algorithms generate schedules that require a prohibitively large number of processors. To address this problem, we propose a generic algorithm, SC, to minimize the processor requirement of any given valid schedule. SC preserves the schedule length of the original schedule and reduces the processor count by merging processor schedules and removing redundant duplicate tasks. To the best of our knowledge, this is the first algorithm to address this largely unexplored aspect of DAG scheduling. On average, SC reduced the processor requirement by 91, 82, and 72 percent for schedules generated by the PLW, TCSD, and CPFD algorithms, respectively. The SC algorithm has a low complexity (O(N^3)) compared to most duplication-based algorithms. Moreover, it decouples processor economization from the schedule length minimization problem. To take advantage of these features of SC, we also propose a scheduling algorithm, SDS, that has the same time complexity as SC. Our experiments demonstrate that schedules generated by SDS are only 3 percent longer than those generated by CPFD (O(N^4)), one of the best algorithms in that respect. Together, SDS and SC form a two-stage scheduling algorithm that produces high-quality schedules with a low processor requirement and has lower complexity than comparable algorithms that produce similarly high-quality results.
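
The idea of compacting a schedule without lengthening it can be illustrated with a small sketch. This is not the SC algorithm itself, whose details are not given here; it is a simple first-fit heuristic written under stated assumptions: every task copy keeps its original start time, tasks on the same processor communicate at zero cost, and the communication cost between two distinct processors does not depend on which processors they are. Under those assumptions, merging two processors whose busy intervals do not collide cannot delay any data arrival, and when one processor ends up holding two copies of a task, keeping only the earliest copy is safe. The names `Slot`, `conflicts`, `compact`, and `makespan` are hypothetical.

```python
from typing import List, Tuple

# A scheduled task copy: (task name, start time, finish time).
Slot = Tuple[str, float, float]


def conflicts(a: List[Slot], b: List[Slot]) -> bool:
    """True if some slot of a overlaps in time with a different slot of b.

    Identical copies (same task, same interval) are not counted as a
    conflict, because one of them becomes redundant after merging.
    """
    for slot_a in a:
        for slot_b in b:
            if slot_a == slot_b:
                continue
            if slot_a[1] < slot_b[2] and slot_b[1] < slot_a[2]:
                return True
    return False


def compact(schedule: List[List[Slot]]) -> List[List[Slot]]:
    """Merge processor schedules first-fit, then drop redundant duplicates."""
    merged: List[List[Slot]] = []
    # Assumption: placing the busiest processors first is a reasonable order.
    by_load = sorted(schedule, key=lambda p: -sum(f - s for _, s, f in p))
    for proc in by_load:
        for target in merged:
            if not conflicts(target, proc):
                target.extend(proc)
                break
        else:
            merged.append(list(proc))

    result: List[List[Slot]] = []
    for proc in merged:
        seen = set()
        kept: List[Slot] = []
        # Keep only the earliest copy of each task on a processor.
        for slot in sorted(proc, key=lambda x: x[1]):
            if slot[0] not in seen:
                seen.add(slot[0])
                kept.append(slot)
        result.append(kept)
    return result


def makespan(schedule: List[List[Slot]]) -> float:
    """Schedule length: the latest finish time over all processors."""
    return max(f for proc in schedule for _, _, f in proc)


# Toy schedule with task A duplicated on three of four processors.
example = [
    [("A", 0, 2), ("B", 2, 5)],
    [("A", 0, 2), ("C", 2, 4)],
    [("D", 5, 7)],
    [("A", 0, 2), ("E", 6, 8)],
]
compacted = compact(example)
print(len(example), "->", len(compacted), "processors")
print("schedule length:", makespan(example), "->", makespan(compacted))
```

On this toy input the four processor schedules collapse into two, one redundant copy of task A is removed, and the schedule length is unchanged. A production compaction algorithm would also need to verify precedence and communication constraints explicitly, which this sketch only relies on by assumption.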
