Locality Conscious Processor Allocation and Scheduling for Mixed Parallel Applications

Complex applications can often be viewed as a collection of coarse-grained data-parallel components with precedence constraints. Combining task and data parallelism (mixed parallelism) has been shown to be an effective execution paradigm for such applications. In this paper, we present an algorithm that computes the appropriate mix of task and data parallelism from the scalability characteristics of the tasks and the inter-task data communication costs, so that the parallel completion time (makespan) is minimized. The algorithm iteratively reduces the makespan by increasing the degree of data parallelism of tasks on the critical path that scale well and have a low degree of potential task parallelism. Data communication costs along the critical path are minimized by exploiting parallel transfer mechanisms and by using a locality-conscious backfill scheduler. Evaluation on benchmark task graphs derived from real applications, as well as on synthetic graphs, shows that our algorithm consistently performs better than previous scheduling schemes.
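The iterative idea described above (widen tasks on the critical path while doing so lowers the makespan) can be illustrated with a small sketch. It is not the algorithm of this paper: it ignores communication costs, locality, and backfilling, and it does not weigh potential task parallelism. The Amdahl-style speedup model, the makespan estimate (the larger of the critical-path length and total processor-time divided by the processor count), and all names (`Task`, `critical_path`, `estimate`, `allocate`) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    work: float                 # sequential execution time
    serial_fraction: float      # assumed Amdahl-style non-parallelizable fraction
    preds: list = field(default_factory=list)  # names of predecessor tasks
    procs: int = 1              # current processor allocation

    def time(self, procs=None):
        """Execution time on `procs` processors under the assumed speedup model."""
        p = self.procs if procs is None else procs
        return self.work * (self.serial_fraction + (1.0 - self.serial_fraction) / p)


def critical_path(tasks, order):
    """Return (length, path) of the longest path; `order` must be topological."""
    finish, via = {}, {}
    for name in order:
        task = tasks[name]
        pred = max(task.preds, key=lambda p: finish[p], default=None)
        via[name] = pred
        finish[name] = (finish[pred] if pred is not None else 0.0) + task.time()
    node = max(order, key=lambda n: finish[n])
    length, path = finish[node], []
    while node is not None:
        path.append(node)
        node = via[node]
    return length, path[::-1]


def estimate(tasks, order, total_procs):
    """Assumed makespan estimate: max of critical path and average area."""
    length, _ = critical_path(tasks, order)
    area = sum(t.procs * t.time() for t in tasks.values())
    return max(length, area / total_procs)


def allocate(tasks, order, total_procs, eps=1e-9):
    """Greedily add one processor at a time to a critical-path task,
    keeping the change only while the estimate strictly decreases."""
    current = estimate(tasks, order, total_procs)
    while True:
        _, path = critical_path(tasks, order)
        best_name, best_value = None, current
        for name in path:
            task = tasks[name]
            if task.procs >= total_procs:
                continue
            task.procs += 1                      # try a wider allocation
            trial = estimate(tasks, order, total_procs)
            task.procs -= 1                      # undo the trial
            if trial < best_value - eps:
                best_name, best_value = name, trial
        if best_name is None:
            return current
        tasks[best_name].procs += 1
        current = best_value


# Hypothetical four-task graph: A precedes B and C, which both precede D.
tasks = {
    "A": Task(work=10.0, serial_fraction=0.1),
    "B": Task(work=40.0, serial_fraction=0.05, preds=["A"]),
    "C": Task(work=8.0, serial_fraction=0.5, preds=["A"]),
    "D": Task(work=10.0, serial_fraction=0.1, preds=["B", "C"]),
}
final_estimate = allocate(tasks, ["A", "B", "C", "D"], total_procs=8)
print(final_estimate, {name: t.procs for name, t in tasks.items()})
```

In this toy setting the well-scaling task B tends to receive extra processors first, because shortening it shrinks the critical path the most per added processor. A real scheduler would also have to place tasks on specific processors and account for data redistribution between them.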
