Optimizing end-to-end performance of data-intensive computing pipelines in heterogeneous network environments

Supporting high-performance data-intensive computing pipelines in wide-area networks is crucial for enabling large-scale distributed scientific applications, which require minimizing end-to-end delay in the single-input case or maximizing frame rate in the streaming case. We formulate the data-intensive computing pipeline mapping problems and categorize them into six classes defined by two optimization objectives, i.e., minimum end-to-end delay and maximum frame rate, and three network constraints, i.e., no node reuse, contiguous node reuse, and arbitrary node reuse. We design a dynamic programming-based optimal solution to the problem of minimum end-to-end delay with arbitrary node reuse and prove the NP-completeness of the remaining five problems, for each of which we propose a heuristic algorithm based on a similar optimization procedure. These heuristics are implemented and tested on a large set of simulated pipelines and networks of various scales, and extensive simulation results illustrate their performance advantage over existing methods.
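
To make the dynamic programming idea concrete, the following is a minimal sketch of one way the minimum end-to-end delay problem with arbitrary node reuse could be solved. It is not taken from the paper: the cost model (computing time = module workload / node processing power, transfer time = data size / link bandwidth, zero transfer cost when consecutive modules share a node) and all names (`min_end_to_end_delay`, `work`, `data`, `power`, `bandwidth`) are assumptions made for illustration. The first module is pinned to a source node and the last to a destination node.

```python
import math

def min_end_to_end_delay(work, data, power, bandwidth, src, dst):
    """Hypothetical DP sketch (assumed cost model, not the paper's exact one).

    work[j]           : computing workload of module j
    data[j]           : size of the data sent from module j to module j + 1
    power[v]          : processing power of node v
    bandwidth[(u, v)] : bandwidth of the directed link u -> v
    src, dst          : nodes hosting the first and the last module
    """
    n = len(power)
    # best[v] = minimum delay of modules 0..j when module j runs on node v
    best = [math.inf] * n
    best[src] = work[0] / power[src]

    for j in range(1, len(work)):
        nxt = [math.inf] * n
        for v in range(n):
            # Option 1: module j stays on the node that ran module j - 1
            # (node reuse, assumed to incur no transfer cost).
            cand = best[v]
            # Option 2: module j - 1 ran on a neighbor u and ships its
            # output over the link u -> v.
            for (u, w), bw in bandwidth.items():
                if w == v:
                    cand = min(cand, best[u] + data[j - 1] / bw)
            nxt[v] = cand + work[j] / power[v]
        best = nxt

    # math.inf means no feasible mapping ends at dst.
    return best[dst]

# Toy instance: three modules mapped onto three nodes.
work = [2.0, 8.0, 2.0]
data = [4.0, 4.0]
power = [2.0, 4.0, 2.0]
bandwidth = {(0, 1): 4.0, (1, 2): 4.0, (0, 2): 1.0}
print(min_end_to_end_delay(work, data, power, bandwidth, src=0, dst=2))  # prints 6.0
```

In this toy instance the best mapping places the middle module on the fast intermediate node 1 rather than sending data over the slow direct link. Because a node may host any number of modules, the state only needs the current module and its node, which is what keeps this variant polynomial under the assumed model; the no-reuse and contiguous-reuse variants would need to track which nodes were already used.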
