An approach for optimizing latency under throughput constraints for application workflows on clus

In many application domains, it is desirable to meet a user-defined performance requirement while minimizing resource usage and optimizing additional performance parameters. For example, application workflows with real-time constraints may have strict throughput requirements and also benefit from low latency (response time). The structure of these workflows can be represented as directed acyclic graphs of coarse-grained application tasks with data dependences. In this paper, we develop a novel mapping and scheduling algorithm that minimizes the latency of workflows operating on a stream of input data while satisfying throughput requirements. The algorithm employs pipelined parallelism together with intelligent clustering and replication of tasks to meet throughput requirements. Latency is minimized by exploiting task parallelism and reducing communication overheads. Evaluation using synthetic benchmarks and application task graphs shows that our algorithm 1) consistently meets throughput requirements, even when existing schemes fail, 2) produces lower-latency schedules, and 3) uses fewer resources.
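
The interplay of replication, clustering, throughput, and latency described above can be illustrated with a toy model. The sketch below is not the paper's algorithm. It assumes a simple linear chain of tasks rather than a general task graph, round-robin replication in which each copy of a task handles every r-th input item, and a fixed communication cost between consecutive tasks placed on different processors. All function names and numbers are hypothetical.

```python
import math

def replicas_needed(stage_times, required_throughput):
    """Copies of each stage so that copies / time >= required throughput.

    Assumption: replicas of a stage process alternate items independently.
    The small epsilon guards against floating-point round-up.
    """
    return [max(1, math.ceil(t * required_throughput - 1e-9))
            for t in stage_times]

def pipeline_throughput(stage_times, replicas):
    """Items per time unit; the slowest (bottleneck) stage sets the rate."""
    return min(r / t for t, r in zip(stage_times, replicas))

def pipeline_latency(stage_times, comm_times):
    """Time for one item to traverse the chain: compute plus communication."""
    return sum(stage_times) + sum(comm_times)

# Hypothetical three-task chain: times per item, and transfer costs
# between consecutive tasks.
times = [2.0, 5.0, 1.0]
comm = [0.5, 0.5]
target = 0.5  # required items per time unit

reps = replicas_needed(times, target)            # [1, 3, 1]
rate = pipeline_throughput(times, reps)          # 0.5
latency = pipeline_latency(times, comm)          # 9.0

# Clustering the last two tasks onto the same processors merges their
# times and removes the transfer between them.
clustered_times = [2.0, 6.0]
clustered_comm = [0.5]
clustered_reps = replicas_needed(clustered_times, target)           # [1, 3]
clustered_rate = pipeline_throughput(clustered_times, clustered_reps)
clustered_latency = pipeline_latency(clustered_times, clustered_comm)  # 8.5
```

In this toy instance, both configurations meet the target rate of 0.5 items per time unit, but the clustered one uses four processors instead of five and has a latency of 8.5 instead of 9.0, because one inter-task transfer disappears.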
