Mapping filtering streaming applications with communication costs

In this paper, we explore the problem of mapping filtering streaming applications onto large-scale homogeneous platforms, with a particular emphasis on communication models and their impact. Filtering applications are streaming applications in which each node also has a selectivity, which either increases or decreases the size of its input data set. This selectivity makes scheduling these applications more challenging than the more widely studied problem of scheduling "non-filtering" streaming workflows. We identify three significant, realistic communication models. For each of them, we address the complexity of two important problems. (i) Given an execution graph, how can one compute the period and the latency? A solution to this problem is an operation list, which provides the time-steps at which each computation and each communication occurs in the system. (ii) Given a filtering workflow problem, how can one compute the schedule that minimizes the period or the latency? A solution to this problem requires generating both the execution graph and the associated operation list. Altogether, with three models, two problems, and two objectives, we present 12 complexity results, thereby providing solid theoretical foundations for the study of filtering streaming applications.
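To make the role of selectivity concrete, the following minimal sketch shows how it rescales the work of downstream nodes in a linear chain. It rests on assumptions that are not stated in the abstract: the initial data set has unit size, each node's computation time is its cost multiplied by the product of the selectivities of its predecessors, each node runs on its own processor, and communication costs are ignored entirely (which is precisely the simplification the paper's three communication models remove). The function name `chain_costs` and the numeric values are hypothetical.

```python
def chain_costs(costs, selectivities):
    """Per-node computation times for a linear chain of filtering nodes.

    Assumption: node i processes data whose size is the product of the
    selectivities of all nodes before it, starting from a unit-size input.
    """
    times = []
    scale = 1.0  # size of the data reaching the current node
    for cost, selectivity in zip(costs, selectivities):
        times.append(cost * scale)
        scale *= selectivity  # this node shrinks or expands the data
    return times


# Hypothetical three-node chain: the first node halves the data,
# the second doubles it, the third keeps a quarter of it.
costs = [4.0, 2.0, 8.0]
selectivities = [0.5, 2.0, 0.25]

times = chain_costs(costs, selectivities)

# With one node per processor and no communication costs (assumptions),
# the period is bounded by the slowest node and the latency is the sum.
period = max(times)
latency = sum(times)
```

Under these assumptions, reordering the nodes changes the computation times, which is one reason the ordering of filters affects both objectives.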
