Adaptive input admission and management for parallel stream processing

In this paper, we propose a framework for adaptive admission control and management of a large number of dynamic input streams in parallel stream processing engines. The framework takes as input any available information about input stream behavior and the requirements of the query processing layer, and adaptively decides how to adjust the points at which streams enter the system. Because optimization decisions propagate early from the input management layer to the query processing layer, the size of the cluster is minimized, load balance is maintained, and the latency bounds of queries are met in a more effective and timely manner. Declarative integration of external metadata about data sources makes the system more robust and resource-efficient. Additionally, exploiting knowledge about queries moves data partitioning to the input management layer, where better load balance for query processing can be achieved. We implemented these techniques as part of the Borealis stream processing system and conducted experiments showing the performance benefits of our framework.
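As a rough illustration of the cluster-size goal stated above, assigning streams to entry nodes can be viewed as a bin-packing problem. The sketch below uses the standard first-fit decreasing heuristic. It is not the framework's actual algorithm: the names `Node` and `assign_streams`, the per-node `capacity`, and the assumption that each stream has a single predicted rate are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical entry node with a fixed input-rate capacity."""
    capacity: int
    streams: list = field(default_factory=list)
    load: int = 0


def assign_streams(rates, capacity):
    """Assign streams to as few nodes as the heuristic finds.

    rates: dict mapping stream id -> predicted input rate (assumed known).
    capacity: maximum total rate a single node may receive (assumed uniform).
    Uses first-fit decreasing: place the heaviest streams first, each on the
    first node that still has room, opening a new node only when none fits.
    """
    nodes = []
    for stream_id, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        if rate > capacity:
            raise ValueError(f"stream {stream_id!r} exceeds node capacity")
        for node in nodes:
            if node.load + rate <= capacity:
                node.streams.append(stream_id)
                node.load += rate
                break
        else:
            # No existing node can take this stream; grow the cluster by one.
            nodes.append(Node(capacity, [stream_id], rate))
    return nodes


if __name__ == "__main__":
    predicted = {"a": 60, "b": 50, "c": 40, "d": 30, "e": 20}
    for i, node in enumerate(assign_streams(predicted, capacity=100)):
        print(i, node.streams, node.load)
```

In an adaptive setting, such an assignment would presumably be recomputed as predicted rates change. A real system would also need to account for the cost of moving a stream between nodes, which this sketch ignores.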
