A survey of pipelined workflow scheduling: Models and algorithms

A large class of applications needs to execute the same workflow on different datasets of identical size. Executing such applications efficiently requires an intelligent distribution of the application components and tasks across a parallel machine, and the execution can be orchestrated by exploiting task, data, pipelined, and/or replicated parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied over the last decade. Many models and algorithms have been proposed to address various programming paradigms, constraints, machine behaviors, and optimization goals. This article surveys the field by summarizing and structuring known results and approaches.
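To make pipelined and replicated parallelism concrete, here is a minimal sketch under a deliberately simplified model. The model is an assumption for illustration, not the full setting the survey covers. It assumes a linear chain of stages with known work, identical processors, no communication costs, each processor (or group of replicas) handling one consecutive interval of stages, and replicas of an interval serving datasets round-robin. Under these assumptions the period (the time between two consecutive datasets, i.e. the inverse of throughput) is set by the most loaded interval, while the latency of one dataset is the sum of all interval loads. The function names `pipeline_metrics` and `best_period` are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def pipeline_metrics(
    weights: Sequence[float],
    cuts: List[int],
    replicas: Optional[List[int]] = None,
) -> Tuple[float, float]:
    """Return (period, latency) for a chain of stages split into intervals.

    Hypothetical simplified model: identical unit-speed processors, no
    communication cost. `cuts` are the stage indices where a new interval
    starts (index 0 excluded); `replicas[i]` processors serve interval i
    in round-robin fashion.
    """
    bounds = [0] + sorted(cuts) + [len(weights)]
    loads = [sum(weights[a:b]) for a, b in zip(bounds, bounds[1:])]
    if replicas is None:
        replicas = [1] * len(loads)
    # Replicating an interval on r processors lets it accept a new dataset
    # every load / r time units, so it shortens the period ...
    period = max(load / r for load, r in zip(loads, replicas))
    # ... but each individual dataset still traverses every interval.
    latency = sum(loads)
    return period, latency


def best_period(weights: Sequence[float], processors: int) -> Tuple[float, tuple]:
    """Brute-force the interval mapping with the smallest period.

    Enumerates every way to cut the chain into min(processors, n) intervals;
    with non-negative weights, using more intervals never increases the period.
    Exponential in general: an illustration, not an efficient algorithm.
    """
    n = len(weights)
    k = min(processors, n) - 1  # number of cut points
    best: Optional[Tuple[float, tuple]] = None
    for cuts in combinations(range(1, n), k):
        period, _ = pipeline_metrics(weights, list(cuts))
        if best is None or period < best[0]:
            best = (period, cuts)
    assert best is not None
    return best


if __name__ == "__main__":
    stages = [4, 2, 3, 5]
    print(pipeline_metrics(stages, []))         # one processor: period = latency
    print(pipeline_metrics(stages, [2]))        # two-interval pipeline
    print(pipeline_metrics(stages, [2], [1, 2]))  # second interval replicated twice
    print(best_period(stages, 3))
```

With the sample chain, splitting it into two intervals lowers the period from 14 to 8 while the latency stays 14, and replicating the heavier interval lowers the period further to 6. This illustrates why throughput and latency are usually treated as distinct, sometimes competing, optimization criteria.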
