Optimizing the Reliability of Pipelined Applications under Throughput Constraints

Mapping a pipelined application onto a distributed, parallel platform is a challenging problem. It becomes even harder when several optimization criteria are involved and when the target resources (processors and communication links) are heterogeneous and subject to failures. This paper investigates the problem of mapping pipelined applications, each consisting of a linear chain of stages executed in a pipelined fashion, onto such platforms. The objective is to optimize reliability under a performance constraint, i.e., while guaranteeing a threshold throughput. To increase reliability, we replicate the execution of stages on multiple processors. We present complexity results proving that this bi-criteria optimization problem is NP-hard. We then propose several heuristics and discuss extensive experiments evaluating their performance.
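To make the two criteria concrete, the following is a minimal sketch of how a replicated mapping could be scored. It is an illustration under simplifying assumptions that are not taken from the paper: each stage is mapped to its own set of replica processors, processors fail independently with a fixed probability, communication costs and link failures are ignored, and the slowest replica of a stage determines that stage's processing time. All function and parameter names are hypothetical.

```python
from math import prod


def mapping_reliability(replica_fail_probs):
    """Probability that every stage keeps at least one working replica.

    replica_fail_probs[i] lists the (assumed independent) failure
    probabilities of the processors replicating stage i.
    """
    # A stage is lost only if all of its replicas fail.
    return prod(1.0 - prod(fails) for fails in replica_fail_probs)


def mapping_period(stage_work, replica_speeds):
    """Time between two consecutive outputs of the pipeline.

    Assumption: the slowest replica of a stage bounds that stage's time,
    and the slowest stage bounds the whole pipeline.
    """
    return max(w / min(speeds) for w, speeds in zip(stage_work, replica_speeds))


def meets_throughput(stage_work, replica_speeds, min_throughput):
    """Check the performance constraint: throughput = 1 / period."""
    return 1.0 / mapping_period(stage_work, replica_speeds) >= min_throughput


if __name__ == "__main__":
    # Hypothetical two-stage pipeline: stage 0 on one processor,
    # stage 1 replicated on two processors.
    work = [4.0, 6.0]
    speeds = [[2.0], [2.0, 3.0]]
    fails = [[0.1], [0.2, 0.5]]
    print("reliability:", mapping_reliability(fails))
    print("period:", mapping_period(work, speeds))
    print("feasible:", meets_throughput(work, speeds, min_throughput=0.3))
```

Under these assumptions the tension between the criteria is visible directly: adding a replica to a stage can only raise the reliability product, but a slow replica can lengthen the period and violate the throughput threshold.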
