Locality-aware task scheduling for homogeneous parallel computing systems

In systems with complex many-core cache hierarchies, exploiting data locality can significantly reduce the execution time and energy consumption of parallel applications. Locality can be exploited at various hardware and software layers. For instance, by implementing private and shared caches in a multi-level fashion, recent hardware designs are already optimized for locality. However, these hardware optimizations are of little use if software scheduling does not arrange execution in a manner that promotes the locality available in the programs themselves. Since programs for parallel systems consist of tasks executed simultaneously, task scheduling becomes crucial for performance in multi-level cache architectures. This paper presents a heuristic algorithm for homogeneous multi-core systems called locality-aware task scheduling (LeTS). The LeTS heuristic is a work-conserving algorithm that takes into account both locality and load balancing in order to reduce the execution time of target applications. LeTS works in two distinct phases: the working task group formation phase (WTG-FP) and the working task group ordering phase (WTG-OP). The WTG-FP forms groups of tasks in order to capture data reuse across tasks, while the WTG-OP determines an optimal order of execution for task groups that minimizes the reuse distance of shared data between tasks. We have performed experiments using randomly generated task graphs by varying three major performance parameters: (1) the communication-to-computation ratio (CCR), between 0.1 and 1.0; (2) the application size, i.e., task graphs comprising 50, 100, and 300 tasks per graph; and (3) the number of cores, with 2-, 4-, 8-, and 16-core execution scenarios. We have also performed experiments using selected real-world applications. The LeTS heuristic reduces the overall execution time of applications by exploiting inter-task data locality. Results show that LeTS outperforms state-of-the-art algorithms in amortizing inter-task communication cost.
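
The abstract does not give the rules LeTS uses inside WTG-FP and WTG-OP, so the following Python sketch is only a rough illustration of the two-phase idea and not the authors' algorithm. It assumes each task is described by the set of data items it touches; the names `form_groups`, `order_groups`, and `max_group_size`, as well as the greedy merge and greedy chaining rules, are hypothetical choices made for this example. Load balancing and task-graph precedence constraints are ignored here.

```python
from itertools import combinations


def form_groups(task_data, max_group_size):
    """Phase 1 (hypothetical stand-in for group formation).

    Repeatedly merge the two groups that share the most data items,
    as long as the merged group stays within max_group_size tasks.
    Each group is a pair (set of task names, set of data items).
    """
    groups = [({task}, set(data)) for task, data in task_data.items()]
    while True:
        best = None  # (shared item count, index i, index j)
        for (i, (ta, da)), (j, (tb, db)) in combinations(enumerate(groups), 2):
            shared = len(da & db)
            if shared > 0 and len(ta) + len(tb) <= max_group_size:
                if best is None or shared > best[0]:
                    best = (shared, i, j)
        if best is None:
            return groups  # no pair left that is worth merging
        _, i, j = best
        merged = (groups[i][0] | groups[j][0], groups[i][1] | groups[j][1])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append(merged)


def order_groups(groups):
    """Phase 2 (hypothetical stand-in for group ordering).

    Greedy chaining: start with the first group, then always run next
    the remaining group that shares the most data with the group just
    placed, so shared data is reused soon after it was last touched.
    This is a heuristic and does not guarantee a minimal reuse distance.
    """
    if not groups:
        return []
    remaining = list(groups)
    ordered = [remaining.pop(0)]
    while remaining:
        last_data = ordered[-1][1]
        nxt = max(remaining, key=lambda g: len(g[1] & last_data))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered
```

For example, calling `form_groups({"t1": {"A", "B"}, "t2": {"B", "C"}, "t3": {"X"}}, 2)` places `t1` and `t2` in one group because they share item `B`, and leaves `t3` on its own. A real scheduler would additionally have to respect dependencies between tasks and balance the groups across cores, which the abstract names as a goal of LeTS but which this sketch does not attempt.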
