Static task partitioning techniques for parallel applications on heterogeneous processors

A critical factor in the efficient execution of parallel applications on parallel computing resources is the method chosen to map and schedule the application's tasks. The problem of statically mapping and scheduling a weighted directed acyclic graph (DAG) onto a set of heterogeneous processors to minimize the completion time (makespan), often referred to as the DAG scheduling problem, has been studied extensively. It remains intractable even when severe assumptions are applied to the task and machine models. This thesis tackles two challenges faced by scheduling algorithms when mapping and scheduling tasks onto heterogeneous processors: (1) the execution time of a task on heterogeneous processors is not well represented in conventional application DAGs, and (2) the critical path is poorly defined in the presence of communication and such heterogeneity. We address the first challenge by adopting a better representation model for the application DAGs and the resource graphs, one that enables processors to be selectively faster for certain kinds of tasks. We propose, design and evaluate a simulated annealing based task mapping algorithm that exploits this representation and maps applications with task and data level parallelism onto a set of heterogeneous processors. The novelty of this algorithm lies in guiding the random search using one of the systemic parameters, called temperature. We observe significant improvements in solution quality for real-world benchmarks when compared against other well-established task mapping algorithms. As an additional contribution, we show that similar meta-heuristic techniques are effective for partitioning road networks for distributed simulation. For the second, related challenge of finding critical paths on a set of heterogeneous processors, existing solutions calculate the critical path using mean values of computation and communication. In the presence of heterogeneity and communication costs, these methods of calculating the critical path are rendered ...
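
To make the mapping problem concrete, the following is a minimal Python sketch of a generic simulated annealing mapper. It is not the algorithm proposed in the thesis: the four-task DAG, the per-processor execution times, the communication costs, the geometric cooling schedule and all names (`EXEC_TIME`, `COMM`, `makespan`, `anneal`) are assumptions made for illustration. It only shows how a per-task, per-processor cost table lets a processor be faster for some tasks and slower for others, and how the temperature controls the acceptance of worse mappings.

```python
import math
import random

# Hypothetical example data, not taken from the thesis.
# EXEC_TIME[task][proc] is the execution time of a task on each processor,
# so a processor can be selectively faster for certain tasks.
EXEC_TIME = {
    "a": [4.0, 8.0],
    "b": [6.0, 2.0],
    "c": [5.0, 5.0],
    "d": [3.0, 9.0],
}
# COMM[(pred, succ)] is paid only when the two tasks sit on different processors.
COMM = {("a", "b"): 2.0, ("a", "c"): 1.0, ("b", "d"): 3.0, ("c", "d"): 1.0}
ORDER = ["a", "b", "c", "d"]  # one topological order of the DAG
NUM_PROCS = 2


def makespan(mapping):
    """Completion time when tasks run in ORDER on their mapped processors."""
    finish = {}
    proc_free = [0.0] * NUM_PROCS
    for task in ORDER:
        proc = mapping[task]
        ready = 0.0
        for (pred, succ), cost in COMM.items():
            if succ == task:
                delay = cost if mapping[pred] != proc else 0.0
                ready = max(ready, finish[pred] + delay)
        start = max(ready, proc_free[proc])
        finish[task] = start + EXEC_TIME[task][proc]
        proc_free[proc] = finish[task]
    return max(finish.values())


def anneal(seed=0, t_start=10.0, t_end=0.01, alpha=0.95, moves_per_temp=20):
    """Generic simulated annealing over task-to-processor mappings."""
    rng = random.Random(seed)
    current = {task: 0 for task in ORDER}  # start with everything on processor 0
    cur_cost = makespan(current)
    best, best_cost = dict(current), cur_cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            # Neighbour move: reassign one randomly chosen task.
            cand = dict(current)
            cand[rng.choice(ORDER)] = rng.randrange(NUM_PROCS)
            cost = makespan(cand)
            delta = cost - cur_cost
            # Always accept improvements; accept worse mappings with a
            # probability that shrinks as the temperature falls.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, cur_cost = cand, cost
                if cur_cost < best_cost:
                    best, best_cost = dict(current), cur_cost
        temp *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost
```

Calling `anneal()` returns the best mapping found and its makespan; comparing that value with `makespan({t: 0 for t in ORDER})` shows the gain over placing every task on one processor in this toy instance.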
