Evaluating CP Techniques to Plan Dynamic Resource Provisioning in Distributed Stream Processing

A growing number of applications require continuous processing of high-throughput data streams, e.g., financial analysis, network traffic monitoring, or big data analytics. Performing these analyses using Distributed Stream Processing Systems (DSPSs) in large clusters is emerging as a promising solution to the scalability challenges posed by these kinds of scenarios. Yet the high time-variability of stream characteristics makes it very inefficient to statically allocate the data-center resources needed to guarantee application Service Level Agreements (SLAs), and calls for original, dynamic, and adaptive resource allocation strategies. In this paper, we analyze the problem of planning adaptive replication strategies for DSPS applications under the challenging assumption of minimal statistical knowledge of input characteristics. We investigate and evaluate how different Constraint Programming (CP) techniques can be employed, and show quantitatively, through experimental results over realistic testbeds, how the alternatives offer different trade-offs between problem solution time and stream processing runtime cost.
